2.1.13.2. KNN with Python
1. Import the basic libraries
pandas, numpy, matplotlib, seaborn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Embed plots directly in the notebook:
%matplotlib inline
2. Read the data and inspect it
df = pd.read_csv('Classified Data', index_col = 0)
df.head()
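The "Classified Data" file is not included here. The sketch below lets you try the later steps without it. It builds a synthetic stand-in with the same layout: anonymous numeric feature columns followed by a final 0/1 TARGET CLASS column. The column names, row count and values are all made up. The inspection calls (`info`, `describe`, `value_counts`) are standard pandas.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification

# Hypothetical stand-in for 'Classified Data': 1000 rows, 10 numeric features, binary target
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=101)
feature_names = [f"FEAT_{i}" for i in range(X.shape[1])]   # hypothetical column names
# Multiply each column by a different factor so the features are on different scales
df = pd.DataFrame(X * np.arange(1, 11) * 10, columns=feature_names)
df["TARGET CLASS"] = y                                     # label column last, as in the original file

df.info()                                  # column dtypes and non-null counts
print(df.describe())                       # per-column mean, std, min, max, quartiles
print(df["TARGET CLASS"].value_counts())   # how balanced the two classes are
```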

3. Standardize the data
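The scale of the variables usually has a large effect on the result. KNN classifies an observation using the observations closest to it, so a feature measured on a large scale dominates the distance. When using a KNN classifier, we therefore usually bring all features to a common scale.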
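The effect of scale can be seen with four made-up points whose first feature lies in [0, 1] and whose second is in the thousands. In the sketch below, the nearest neighbour of a query point changes once both features are standardized. The points and the `nearest` helper are hypothetical and only illustrate the idea.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training points: feature 0 lies in [0, 1], feature 1 is in the thousands
X_train = np.array([[0.0, 1010.0],   # A
                    [1.0, 1000.0],   # B
                    [0.5, 2000.0],   # C
                    [0.5,    0.0]])  # D
query = np.array([[0.0, 1000.0]])
names = ["A", "B", "C", "D"]

def nearest(X, q):
    """Hypothetical helper: name of the training point closest to q (Euclidean distance)."""
    return names[int(np.argmin(np.linalg.norm(X - q, axis=1)))]

print(nearest(X_train, query))   # B: raw distances are dominated by feature 1

scaler = StandardScaler().fit(X_train)
print(nearest(scaler.transform(X_train), scaler.transform(query)))   # A: both features now count
```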
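Use StandardScaler.
StandardScaler subtracts each feature's mean and divides by its standard deviation: (X - mean) / std.
Drop the TARGET CLASS column from the original data, fit the scaler on the remaining features, then transform them.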
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Fit on the features only; the label column is excluded
scaler.fit(df.drop("TARGET CLASS", axis=1))
scaled_features = scaler.transform(df.drop("TARGET CLASS", axis=1))
Convert the standardized data back into a DataFrame:
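# Name the columns after every feature except TARGET CLASS and keep the original index
df_feat = pd.DataFrame(scaled_features, columns=df.columns.drop("TARGET CLASS"), index=df.index)
df_feat.head()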
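To confirm the formula, compare StandardScaler with (X - mean) / std computed by hand on a small made-up matrix. The sketch assumes the population standard deviation (NumPy's default `ddof=0`), which is what StandardScaler uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small made-up feature matrix: two columns on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0],
              [4.0, 700.0]])

scaled = StandardScaler().fit_transform(X)

# Same formula by hand: (X - mean) / std, using the population std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))   # True
print(scaled.mean(axis=0))           # each column now has mean ~0
print(scaled.std(axis=0))            # and standard deviation 1
```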
4. Use the scikit-learn library
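First, train_test_split randomly splits the data into a training set and a test set. Here test_size=0.4 sends 40% of the rows to the test set. random_state fixes the shuffle so the split is reproducible.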
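# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split lives in sklearn.model_selection
from sklearn.model_selection import train_test_split

X = df_feat
y = df["TARGET CLASS"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)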
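The sketch below shows three properties of the split on 100 made-up rows. The test set gets 40% of the rows. The same random_state reproduces the same split. Features and labels are shuffled with the same permutation, so each row keeps its label. The data is invented so the label can be read back from the features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)   # 100 made-up samples; row i is [2i, 2i + 1]
y = np.arange(100) % 2               # made-up label for row i is i % 2

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
print(len(X_train), len(X_test))     # 60 40

# Same random_state, same split
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.4, random_state=101)
print(np.array_equal(X_test, X_test2))   # True

# Rows stay aligned with their labels: recover i from the first feature and check i % 2
print(np.all(X_train[:, 0] // 2 % 2 == y_train))   # True
```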
5. Predict with the KNN classifier
Set n_neighbors (K) to 1:
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)
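What KNeighborsClassifier does with its default settings can be sketched from scratch. It computes the Euclidean distance from each test point to every training point, takes the K closest, and predicts their majority label. The `knn_predict` function and the random data are hypothetical, for illustration only. With K = 1 (and any odd K on two classes), there are no vote ties, so the result should match scikit-learn on this continuous data.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

def knn_predict(X_train, y_train, X_test, k):
    """Hypothetical minimal KNN: Euclidean distance plus majority vote."""
    preds = []
    for x in X_test:
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))   # distance to every training point
        nearest = np.argsort(dists)[:k]                       # indices of the k closest points
        votes = Counter(y_train[nearest])                     # count labels among them
        preds.append(votes.most_common(1)[0][0])              # majority label
    return np.array(preds)

# Made-up data: the label depends on the sign of the first feature
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(20, 3))

ours = knn_predict(X_train, y_train, X_test, k=1)
sk = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train).predict(X_test)
print(np.array_equal(ours, sk))   # True
```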
6. Evaluate the model's accuracy
Use classification_report, which lists precision, recall, F1-score and support for each class:
from sklearn.metrics import classification_report

print(classification_report(y_test, predictions))
Use confusion_matrix (rows are the true classes, columns the predicted classes):
from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_test, predictions))
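The numbers in the report can be read off the confusion matrix. For a binary problem, `ravel()` returns (TN, FP, FN, TP), and precision, recall and the error rate follow directly. The sketch below uses ten made-up labels, not the tutorial's data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])

cm = confusion_matrix(y_true, y_pred)   # rows = true class, columns = predicted class
tn, fp, fn, tp = cm.ravel()
print(cm)   # [[3 1]
            #  [1 5]]

precision = tp / (tp + fp)          # of the samples predicted 1, the share that really are 1
recall = tp / (tp + fn)             # of the samples that really are 1, the share that were found
error_rate = (fp + fn) / cm.sum()   # off-diagonal count divided by total, i.e. 1 - accuracy

print(precision, recall, error_rate)
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))   # same values
```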
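7. Tune the K value
Train a model for every K from 1 to 39, record its error rate on the test set, and plot error rate against K: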
error_rate = []

# Try K = 1 .. 39 and record the fraction of misclassified test samples
for i in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))

plt.figure(figsize=(10, 6))
plt.plot(range(1, 40), error_rate, color='blue', linestyle='dashed', marker='o',
         markerfacecolor='red', markersize=10)
plt.title("Error Rate vs K value")
plt.xlabel('K')
plt.ylabel('Error Rate')
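The loop above picks K by looking at the test-set error, so the test set also influences model selection. A common alternative, swapped in here and not part of the original walkthrough, is k-fold cross-validation on the training set only: choose the K with the lowest cross-validated error, then evaluate once on the test set. The synthetic data below stands in for the tutorial's X_train and y_train.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical synthetic data; replace with the X_train / y_train from step 4
X, y = make_classification(n_samples=500, n_features=10, random_state=101)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)

k_values = range(1, 40)
cv_error = []
for k in k_values:
    # 5-fold cross-validated accuracy, computed on the training set only
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5)
    cv_error.append(1 - scores.mean())

best_k = k_values[int(np.argmin(cv_error))]   # K with the lowest cross-validated error
print("best K:", best_k)

# The test set is used once, only for the final estimate
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("test error:", np.mean(final.predict(X_test) != y_test))
```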
Retrain with the K chosen from the plot (K = 37 here) and evaluate again:
knn = KNeighborsClassifier(n_neighbors=37)
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)

print(classification_report(y_test, predictions))
print(confusion_matrix(y_test, predictions))
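In step 3 the scaler is fit on the whole dataset before the split, so the test rows contribute to the mean and standard deviation used for training. The sketch below avoids this with a scikit-learn Pipeline of StandardScaler and KNeighborsClassifier: the scaler is fit on the training rows only and reused on the test rows. The data is synthetic and K = 37 is simply reused from above; this is an assumed variant, not the original author's code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for the unscaled features and TARGET CLASS
X, y = make_classification(n_samples=500, n_features=10, random_state=101)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)

model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=37))
model.fit(X_train, y_train)           # scaler mean/std are computed from X_train only
predictions = model.predict(X_test)   # X_test is scaled with the training statistics

print("test accuracy:", model.score(X_test, y_test))
print(np.allclose(model.named_steps["standardscaler"].mean_, X_train.mean(axis=0)))   # True
```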