2.1.14.2.Decision trees and Random Forests with Python

1. Import the basic libraries

  import pandas as pd
  import numpy as np
  import matplotlib.pyplot as plt
  import seaborn as sns

Embed plots directly in the notebook
```
  %matplotlib inline
```

2. Read and explore the data

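Read the data
- df.head() shows that the predictors are Age, Number and Start, and the outcome is Kyphosis
  df = pd.read_csv('kyphosis.csv')
  df.head()
First check the data type of each column: df.info() shows that the data has 4 columns, 3 of type int64 and 1 of type object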
```
 df.info()
```
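Besides `df.info()`, the column types and the class balance of the target can be checked directly. The sketch below uses a small hypothetical DataFrame with the same column names as kyphosis.csv; the values are made up so the snippet runs without the file, and the labels "absent"/"present" are assumed.

```python
import pandas as pd

# Hypothetical toy data with the kyphosis.csv column names (values are made up)
df = pd.DataFrame({
    "Kyphosis": ["absent", "absent", "present", "absent", "present"],
    "Age": [10, 25, 40, 55, 70],
    "Number": [2, 3, 4, 5, 3],
    "Start": [12, 8, 5, 14, 3],
})

# Column types: three integer columns and one text column holding the labels
print(df.dtypes)

# Number of rows in each class of the target
print(df["Kyphosis"].value_counts())
```

On the real data, a strongly uneven `value_counts()` result is worth keeping in mind when reading the evaluation reports later.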
Visualize the data to see how the features relate to one another
- seaborn pairplot
  sns.pairplot(data = df, hue='Kyphosis')

3. Use the Scikit-learn library

First, train_test_split: this function randomly splits the data into a training set and a test set

  from sklearn.model_selection import train_test_split
  # Features: every column except the label; target: the Kyphosis column
  X = df.drop('Kyphosis', axis=1)
  y = df['Kyphosis']
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
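To see what `test_size=0.4` does, one can split a small dataset and count the rows. The sketch below uses hypothetical toy data; the `stratify=y` argument is an addition not used in the code above, and keeps the class ratio roughly the same in both parts, which may help when one class is rare.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 rows, 6 "absent" and 4 "present"
df = pd.DataFrame({
    "Age": [5, 12, 30, 44, 60, 71, 88, 102, 130, 150],
    "Number": [2, 3, 3, 4, 5, 3, 6, 4, 7, 2],
    "Kyphosis": ["absent"] * 6 + ["present"] * 4,
})

X = df.drop("Kyphosis", axis=1)  # features
y = df["Kyphosis"]               # target

# test_size=0.4 sends 40% of the 10 rows (4 rows) to the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101, stratify=y
)

print(len(X_train), len(X_test))        # 6 4
print(y_test.value_counts().to_dict())  # class counts in the test set
```

Fixing `random_state` makes the split reproducible, so the two models below are evaluated on the same rows.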

4. Use a decision tree classifier

Import the decision tree classifier

  from sklearn.tree import DecisionTreeClassifier
  dtree = DecisionTreeClassifier()

Train the model
```
 dtree.fit(X_train, y_train)
```
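`fit` grows the tree by repeatedly choosing the split that most reduces impurity; scikit-learn's `DecisionTreeClassifier` uses Gini impurity by default (`criterion='gini'`). The sketch below only illustrates that measure on hypothetical labels: it scores one node and one candidate split, whereas the real implementation searches over all features and thresholds.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 - sum over classes of p_k^2."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(left, right):
    """Impurity of a split: each child's Gini weighted by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical parent node: 4 "absent" and 4 "present"
parent = ["absent"] * 4 + ["present"] * 4
print(gini(parent))  # 0.5

# A candidate split that separates the classes fairly well
left = ["absent"] * 3 + ["present"]
right = ["absent"] + ["present"] * 3
print(split_impurity(left, right))  # 0.375
```

The drop from 0.5 to 0.375 is the impurity decrease this split would achieve; the tree keeps whichever candidate gives the largest decrease.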
Predict
```
  pred = dtree.predict(X_test)
```

Evaluate the model's accuracy

Use confusion_matrix and classification_report

  from sklearn.metrics import classification_report, confusion_matrix
  print(classification_report(y_test, pred))
  print(confusion_matrix(y_test, pred))
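To read the two printouts, it can help to compute precision and recall by hand from the confusion matrix. In scikit-learn's `confusion_matrix`, rows are the true classes and columns are the predicted classes, in sorted label order. The labels below are hypothetical.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true labels and predictions
y_true = ["absent", "absent", "absent", "present", "present", "absent", "present", "absent"]
y_pred = ["absent", "present", "absent", "present", "absent", "absent", "present", "absent"]

# Rows = true class, columns = predicted class; labels sorted as ["absent", "present"]
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[4 1]
#  [1 2]]

# Treat "present" as the positive class
tn, fp, fn, tp = cm.ravel()
precision = tp / (tp + fp)  # share of predicted "present" that are truly "present"
recall = tp / (tp + fn)     # share of true "present" that the model found
print(precision, recall)    # both 2/3 for these labels

# scikit-learn's scores for the "present" class give the same values
print(precision_score(y_true, y_pred, pos_label="present"),
      recall_score(y_true, y_pred, pos_label="present"))
```

These per-class values are what `classification_report` lists in its `present` row.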

5. Use a random forest

Import the random forest classifier

  from sklearn.ensemble import RandomForestClassifier

Train the model

  rfc = RandomForestClassifier(n_estimators = 200)
  rfc.fit(X_train, y_train)
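A random forest trains many decision trees (here `n_estimators=200`), each on a bootstrap sample of the training rows and looking at only a random subset of features at each split, then combines their predictions. The sketch below imitates this with a hand-written loop over `DecisionTreeClassifier` and a simple majority vote. It is a simplified illustration, not how `RandomForestClassifier` works internally (scikit-learn averages the trees' predicted probabilities rather than counting votes), and the data are synthetic from `make_classification`.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for X_train / y_train
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

rng = np.random.default_rng(0)
trees = []
for i in range(25):
    # Bootstrap sample: draw row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # max_features="sqrt": each split considers a random subset of the features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X[idx], y[idx]))

def forest_predict(trees, X):
    """Majority vote over the individual trees' predictions."""
    votes = np.array([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])

pred = forest_predict(trees, X[:5])
print(pred, y[:5])
```

Because each tree sees different rows and features, their errors are partly independent, and combining them tends to reduce the variance of a single deep tree.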

Predict
```
  rfc_pred = rfc.predict(X_test)
```

Evaluate the model's accuracy

  print(classification_report(y_test, rfc_pred))
  print(confusion_matrix(y_test, rfc_pred))

6. Visualize the decision tree
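This step has no code on the page. One possible approach, assuming scikit-learn 0.21 or later, is `sklearn.tree.plot_tree`, which draws a fitted tree with matplotlib, together with `sklearn.tree.export_text`, which prints the same rules as text. The sketch fits a shallow tree on a hypothetical toy DataFrame with the kyphosis column names so it runs on its own; with the real data, the `dtree` and `X_train` from step 4 would be used instead.

```python
import matplotlib
matplotlib.use("Agg")  # lets the script run without a display; leave this out in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree

# Hypothetical toy data with the kyphosis column names (values are made up)
df = pd.DataFrame({
    "Age": [5, 12, 30, 44, 60, 71, 88, 102, 130, 150],
    "Number": [2, 3, 3, 4, 5, 3, 6, 4, 7, 2],
    "Start": [14, 12, 15, 13, 5, 3, 4, 2, 6, 16],
    "Kyphosis": ["absent"] * 6 + ["present"] * 4,
})
X = df.drop("Kyphosis", axis=1)
y = df["Kyphosis"]

dtree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Text version of the learned rules
rules = export_text(dtree, feature_names=list(X.columns))
print(rules)

# Graphical version: one box per node with its split rule, impurity and class counts
fig, ax = plt.subplots(figsize=(8, 5))
annotations = plot_tree(
    dtree,
    feature_names=list(X.columns),
    class_names=list(dtree.classes_),
    filled=True,
    ax=ax,
)
```

Limiting `max_depth` keeps the drawing readable; an unrestricted tree on the full data may be too large to inspect comfortably.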
