Source: intellipaat.com
PYTHON FOR DATA SCIENCE CHEAT SHEET
Python Scikit-Learn

Introduction

Scikit-learn ("sklearn") is a machine learning library for the Python programming language. It is a simple and efficient tool for data mining, data analysis, and machine learning.

Importing Convention - import sklearn

Preprocessing

Data Loading

• Using NumPy:
>>> import numpy as np
>>> a = np.array([(1,2,3,4),(7,8,9,10)], dtype=int)
>>> data = np.loadtxt('file_name.csv', delimiter=',')

• Using Pandas:
>>> import pandas as pd
>>> df = pd.read_csv('file_name.csv', header=0)

Train-Test Data

>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

Data Preparation

• Standardization
>>> from sklearn.preprocessing import StandardScaler
>>> get_names = df.columns
>>> scaler = StandardScaler()
>>> scaled_df = scaler.fit_transform(df)
>>> scaled_df = pd.DataFrame(scaled_df, columns=get_names)

• Normalization
>>> from sklearn.preprocessing import normalize
>>> df = pd.read_csv("File_name.csv")
>>> x_array = np.array(df['Column1'])  # Normalize Column1
>>> normalized_X = normalize([x_array])

Working On Model

Model Choosing

Supervised Learning Estimator:

• Linear Regression:
>>> from sklearn.linear_model import LinearRegression
>>> # normalize=True was removed in scikit-learn 1.2; scale features beforehand instead
>>> new_lr = LinearRegression()

• Support Vector Machine:
>>> from sklearn.svm import SVC
>>> new_svc = SVC(kernel='linear')

• Naive Bayes:
>>> from sklearn.naive_bayes import GaussianNB
>>> new_gnb = GaussianNB()

• KNN:
>>> from sklearn import neighbors
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=1)

Unsupervised Learning Estimator:

• Principal Component Analysis (PCA):
>>> from sklearn.decomposition import PCA
>>> new_pca = PCA(n_components=0.95)

• K Means:
>>> from sklearn.cluster import KMeans
>>> k_means = KMeans(n_clusters=5, random_state=0)

Train-Test Data

Supervised:
>>> new_lr.fit(X, y)
>>> knn.fit(X_train, y_train)
>>> new_svc.fit(X_train, y_train)

Unsupervised:
>>> k_means.fit(X_train)
>>> pca_model_fit = new_pca.fit_transform(X_train)

Post-Processing

Prediction

Supervised:
>>> y_predict = new_svc.predict(np.random.random((3,5)))
>>> y_predict = new_lr.predict(X_test)
>>> y_predict = knn.predict_proba(X_test)

Unsupervised:
>>> y_pred = k_means.predict(X_test)

Model Tuning

Grid Search:
>>> from sklearn.model_selection import GridSearchCV
>>> params = {"n_neighbors": np.arange(1,3), "metric": ["euclidean", "cityblock"]}
>>> grid = GridSearchCV(estimator=knn, param_grid=params)
>>> grid.fit(X_train, y_train)
>>> print(grid.best_score_)
>>> print(grid.best_estimator_.n_neighbors)

Randomized Parameter Optimization:
>>> from sklearn.model_selection import RandomizedSearchCV
>>> params = {"n_neighbors": range(1,5), "weights": ["uniform", "distance"]}
>>> rsearch = RandomizedSearchCV(estimator=knn, param_distributions=params, cv=4, n_iter=8, random_state=5)
>>> rsearch.fit(X_train, y_train)
>>> print(rsearch.best_score_)

Evaluate Performance

Classification:

1. Confusion Matrix:
>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(y_test, y_pred))

2. Accuracy Score:
>>> knn.score(X_test, y_test)
>>> from sklearn.metrics import accuracy_score
>>> accuracy_score(y_test, y_pred)

Regression:

1. Mean Absolute Error:
>>> from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2]
>>> mean_absolute_error(y_true, y_predict)

2. Mean Squared Error:
>>> from sklearn.metrics import mean_squared_error
>>> mean_squared_error(y_test, y_predict)

3. R² Score:
>>> from sklearn.metrics import r2_score
>>> r2_score(y_true, y_predict)

Clustering:

1. Homogeneity:
>>> from sklearn.metrics import homogeneity_score
>>> homogeneity_score(y_true, y_predict)

2. V-measure:
>>> from sklearn.metrics import v_measure_score
>>> v_measure_score(y_true, y_predict)

Cross-validation:
>>> from sklearn.model_selection import cross_val_score
>>> print(cross_val_score(knn, X_train, y_train, cv=4))
>>> print(cross_val_score(new_lr, X, y, cv=2))

FURTHERMORE: Python for Data Science Certification Training Course
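The individual snippets in this cheat sheet (split, standardize, fit, predict, evaluate, tune) can be chained into one short script. The sketch below is only an illustration under stated assumptions: the cheat sheet never defines X and y, so scikit-learn's bundled iris dataset stands in for them, and the variable names X_train_scaled, X_test_scaled, accuracy, cm and cv_scores are introduced here for readability. It targets a current scikit-learn release, where the search and cross-validation helpers live in sklearn.model_selection.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: the iris dataset stands in for the reader's own X and y.
X, y = load_iris(return_X_y=True)

# Train-test split (default test_size of 0.25).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardization: fit the scaler on the training data only,
# then apply the same transformation to the test data.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Model choosing and fitting: KNN with a single neighbor, as in the cheat sheet.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train_scaled, y_train)

# Prediction and evaluation on the held-out test set.
y_pred = knn.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

# Model tuning: grid search over the same parameter grid as the cheat sheet.
params = {"n_neighbors": np.arange(1, 3), "metric": ["euclidean", "cityblock"]}
grid = GridSearchCV(estimator=knn, param_grid=params, cv=4)
grid.fit(X_train_scaled, y_train)

# Cross-validation of the untuned model on the training data.
cv_scores = cross_val_score(knn, X_train_scaled, y_train, cv=4)

print("Test accuracy:", accuracy)
print("Confusion matrix:")
print(cm)
print("Best grid-search parameters:", grid.best_params_)
print("Cross-validation scores:", cv_scores)
```

Because the scaler here is fitted once on the whole training set before cross-validation, each validation fold has already influenced the scaling statistics. Wrapping the scaler and the estimator in sklearn.pipeline.Pipeline and passing that pipeline to GridSearchCV or cross_val_score would refit the scaler inside each fold.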