2.1.14.2. Decision Trees and Random Forests with Python
1. Import the basic libraries
pandas, numpy, matplotlib, seaborn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Embed plots directly in the notebook
%matplotlib inline
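2. Load and explore the data
Load the data
df.head() shows that the features are Age, Number, and Start, and that the target is Kyphosis.
df = pd.read_csv('kyphosis.csv')
df.head()
Next, check the type of each column. df.info() shows that the data has 4 columns: 3 are int64 and 1 is object.
df.info()
Visualize the data to see how the features relate to one another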
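3. Use the scikit-learn library
Start with train_test_split, a function that randomly splits the data into a training set and a test set.
from sklearn.model_selection import train_test_split
# Features are every column except the target
X = df.drop('Kyphosis', axis=1)
y = df['Kyphosis']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)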
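To make the effect of `test_size` and `random_state` concrete, here is a small sketch on a toy array (the names `X_demo` and `y_demo` are illustrative and not part of the tutorial's data). With 10 samples and `test_size=0.4`, 4 rows go to the test set and 6 to the training set, and repeating the call with the same `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each (illustrative only).
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=101
)
# Same seed, so the rows land in the same sets again.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=101
)

print(X_tr.shape, X_te.shape)        # (6, 2) (4, 2)
print(np.array_equal(X_te, X_te2))   # True
```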
4. Use the decision tree classifier
Import the decision tree classifier
from sklearn.tree import DecisionTreeClassifier
dtree = DecisionTreeClassifier()
Train the model
dtree.fit(X_train, y_train)
Predict
pred = dtree.predict(X_test)
Evaluate the model's accuracy
confusion_matrix, classification_report
from sklearn.metrics import classification_report, confusion_matrix
print(classification_report(y_test, pred))
print(confusion_matrix(y_test, pred))
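As a reading aid for the confusion matrix, the sketch below uses five hand-written labels (not results from the kyphosis data). In scikit-learn's layout, rows are the true classes and columns are the predicted classes, in the order given by `labels`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-written labels for illustration only.
y_true = ["absent", "absent", "absent", "present", "present"]
y_hat = ["absent", "absent", "present", "present", "absent"]

cm = confusion_matrix(y_true, y_hat, labels=["absent", "present"])
print(cm)
# [[2 1]
#  [1 1]]

# Correct predictions sit on the diagonal: (2 + 1) / 5 = 0.6
print(accuracy_score(y_true, y_hat))
```

The top-right cell counts "absent" cases predicted as "present", and the bottom-left cell counts "present" cases predicted as "absent".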
5. Use the random forest
Import the random forest classifier
from sklearn.ensemble import RandomForestClassifier
Train the model
rfc = RandomForestClassifier(n_estimators=200)
rfc.fit(X_train, y_train)
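The `n_estimators` argument sets how many decision trees the forest builds. A minimal sketch on synthetic data (generated with `make_classification`, an assumption standing in for the real training set) shows that the fitted forest holds that many individual trees:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data for illustration only; not the kyphosis dataset.
X_demo, y_demo = make_classification(
    n_samples=100, n_features=3, n_informative=3, n_redundant=0, random_state=0
)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_demo, y_demo)

# The fitted forest keeps one DecisionTreeClassifier per estimator.
print(len(forest.estimators_))               # 200
print(type(forest.estimators_[0]).__name__)  # DecisionTreeClassifier
```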
Predict
rfc_pred = rfc.predict(X_test)
Evaluate the model's accuracy
print(classification_report(y_test, rfc_pred))
print(confusion_matrix(y_test, rfc_pred))
6. Visualize the decision tree
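The page does not show how the tree was drawn. One possible approach, not necessarily the one the author used, is scikit-learn's `plot_tree`. The sketch below fits a shallow tree on made-up data that borrows the kyphosis column names (`X_demo`, `y_demo`, and `tree_demo` are hypothetical); with the real data you would pass the fitted `dtree` and `X.columns` instead.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Hypothetical stand-in for the kyphosis features and target.
rng = np.random.default_rng(0)
start = rng.integers(1, 19, size=60)
X_demo = pd.DataFrame({
    "Age": rng.integers(1, 200, size=60),
    "Number": rng.integers(2, 10, size=60),
    "Start": start,
})
y_demo = np.where(start < 9, "present", "absent")

# A shallow tree keeps the drawing readable.
tree_demo = DecisionTreeClassifier(max_depth=3, random_state=0)
tree_demo.fit(X_demo, y_demo)

fig, ax = plt.subplots(figsize=(8, 5))
artists = plot_tree(
    tree_demo,
    feature_names=list(X_demo.columns),
    class_names=[str(c) for c in tree_demo.classes_],
    filled=True,  # color each node by its majority class
    ax=ax,
)
```

Each box in the figure shows the split rule, the number of samples that reach the node, and the majority class at that node.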