# 2.1.14.2. Decision Trees and Random Forests with Python

## 1. Import the basic libraries

* [pandas](https://jenhsuan.gitbooks.io/python/content/chapter-2courses/21python-for-data-science-and-machine-learning-bootcamp/211jupyter-overview/214python-for-data-analysis-pandas.html), [numpy](https://jenhsuan.gitbooks.io/python/content/chapter-2courses/21python-for-data-science-and-machine-learning-bootcamp/211jupyter-overview/213python-for-data-analysis-numpy.html), [matplotlib](https://jenhsuan.gitbooks.io/python/content/test.html), [seaborn](https://jenhsuan.gitbooks.io/python/content/217python-for-data-visualization-seaborn.html)

  ```
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
  ```
* Embed plots directly in the notebook

  ```
    %matplotlib inline
  ```

## 2. Read and explore the data

* Read the data
  * df.head() shows that the features are Age, Number, and Start, and the outcome is Kyphosis

    ```
     df = pd.read_csv('kyphosis.csv')
     df.head()
    ```

    ![](/files/-M4M0TJ-4nQgBKRDkAIN)
* Check the column dtypes first. df.info() shows that the dataset has 4 columns: 3 are int64 and 1 is object

  ```
   df.info()
  ```

  ![](/files/-M4M0TJ1ftfEE2agnj9Y)
* Visualize the data to see how the features relate to each other
  * [seaborn pairplot](https://jenhsuan.gitbooks.io/python/content/217python-for-data-visualization-seaborn/2172distribution-plot.html)

    ```
    sns.pairplot(data = df, hue='Kyphosis')
    ```

    ![](/files/-M4M0TJ3-JbQGgLMcdOz)

## 3. Use the scikit-learn library

* Start with train\_test\_split, a function that randomly splits the data into a training set and a test set

  ```
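    # sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
    from sklearn.model_selection import train_test_split
    X = df.drop('Kyphosis', axis=1)  # features: Age, Number, Start
    y = df['Kyphosis']               # target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)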
  ```
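
As a rough illustration of what the split produces, the sketch below runs train\_test\_split on a small synthetic table. The column names mirror the kyphosis data, but every value is invented for the example and none is taken from kyphosis.csv.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for kyphosis.csv: values are invented for illustration only
rng = np.random.default_rng(0)
n_rows = 50
toy = pd.DataFrame({
    "Age": rng.integers(1, 200, size=n_rows),
    "Number": rng.integers(2, 10, size=n_rows),
    "Start": rng.integers(1, 19, size=n_rows),
    "Kyphosis": rng.choice(["absent", "present"], size=n_rows),
})

X = toy.drop("Kyphosis", axis=1)
y = toy["Kyphosis"]

# test_size=0.4 holds out 40% of the rows; random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
print(X_train.shape, X_test.shape)

# Calling again with the same random_state reproduces the same split
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.4, random_state=101)
print(X_test.index.equals(X_test2.index))
```

Fixing random\_state is what makes later results comparable between runs; without it, each call shuffles the rows differently.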

## 4. Use a decision tree classifier

* Import the decision tree classifier

  ```
    from sklearn.tree import DecisionTreeClassifier
    dtree = DecisionTreeClassifier()
  ```
* Train the model

  ```
   dtree.fit(X_train, y_train)
  ```
* Predict

  ```
    pred = dtree.predict(X_test)
  ```
* Evaluate the model's accuracy
  * confusion\_matrix, classification\_report

    ```
    from sklearn.metrics import classification_report, confusion_matrix
    print(classification_report(y_test, pred))
    print(confusion_matrix(y_test, pred))
    ```

    ![](/files/-M4M0TJ5s0zfvMfvVGh1)
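
To make the two reports easier to read, the sketch below counts the confusion-matrix cells by hand and derives precision, recall, and F1 for one class. The label lists are hypothetical, chosen only for illustration, and "present" is treated as the positive class.

```python
# Hypothetical true labels and predictions, invented for illustration only
y_true = ["absent", "absent", "present", "absent", "present", "absent", "present", "absent"]
y_pred = ["absent", "present", "present", "absent", "absent", "absent", "present", "absent"]

positive = "present"
pairs = list(zip(y_true, y_pred))

# Count each cell of the 2x2 confusion matrix
tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives

precision = tp / (tp + fp)  # of the rows predicted positive, how many were right
recall = tp / (tp + fn)     # of the truly positive rows, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

scikit-learn's confusion\_matrix puts true labels on rows and predicted labels on columns, with labels in sorted order, so for these two classes it prints the counts as [[tn, fp], [fn, tp]].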

## 5. Use a random forest classifier

* Import the random forest classifier

  ```
    from sklearn.ensemble import RandomForestClassifier
  ```
* Train the model

  ```
    rfc = RandomForestClassifier(n_estimators = 200)
    rfc.fit(X_train, y_train)
  ```
* Predict

  ```
    rfc_pred = rfc.predict(X_test)
  ```
* Evaluate the model's accuracy

  ```
    print(classification_report(y_test, rfc_pred))
    print(confusion_matrix(y_test, rfc_pred))
  ```

  ![](/files/-M4M0TJ70-Hidwc7-A9J)
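
For a side-by-side comparison, the sketch below fits both classifiers on the same split of a synthetic dataset from make\_classification. The dataset, its size, and the random seeds are assumptions made for the example, so the scores say nothing about the kyphosis data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, generated only for this illustration
X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, random_state=101)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)

# Fit a single tree and a 200-tree forest on the same training rows
dtree = DecisionTreeClassifier(random_state=101).fit(X_train, y_train)
rfc = RandomForestClassifier(n_estimators=200, random_state=101).fit(X_train, y_train)

tree_acc = accuracy_score(y_test, dtree.predict(X_test))
forest_acc = accuracy_score(y_test, rfc.predict(X_test))
print(f"single tree:   {tree_acc:.3f}")
print(f"random forest: {forest_acc:.3f}")

# The forest is an ensemble: it holds one fitted tree per estimator
print(len(rfc.estimators_))
```

A forest combines many trees trained on bootstrap samples, which usually reduces the variance of a single tree, but how large the gap is depends on the data.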

## 6. Visualize the decision tree
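
One possible way to inspect a fitted tree is with scikit-learn's export\_text and plot\_tree helpers, sketched below; this is not necessarily the approach the original course used. The data is synthetic (the label is derived from Start purely so the tree has something to learn), and max\_depth=3 is an arbitrary choice to keep the figure readable.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree

# Synthetic stand-in for the kyphosis features; values are invented for illustration
rng = np.random.default_rng(101)
n_rows = 60
X = pd.DataFrame({
    "Age": rng.integers(1, 200, size=n_rows),
    "Number": rng.integers(2, 10, size=n_rows),
    "Start": rng.integers(1, 19, size=n_rows),
})
# Hypothetical labelling rule, used only so the example tree has a clear split
y = np.where(X["Start"] < 9, "present", "absent")

dtree = DecisionTreeClassifier(max_depth=3, random_state=101).fit(X, y)

# Plain-text view of the learned rules
tree_text = export_text(dtree, feature_names=list(X.columns))
print(tree_text)

# Graphical view; in a notebook with %matplotlib inline the figure renders in the cell
fig, ax = plt.subplots(figsize=(8, 5))
annotations = plot_tree(dtree, feature_names=list(X.columns),
                        class_names=list(dtree.classes_), filled=True, ax=ax)
```

With the real data, the same two calls can be pointed at the dtree fitted in section 4, passing the feature column names from X.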

