2.1.14.2.Decision trees and Random Forests with Python

Last updated 5 years ago

1. Import the basic libraries

  • pandas, numpy, matplotlib, seaborn

      import pandas as pd
      import numpy as np
      import matplotlib.pyplot as plt
      import seaborn as sns
  • Embed plots directly in the Notebook

      %matplotlib inline

2. Read the data and get to know it

  • Read the data

    • df.head() shows that the features are Age, Number, and Start, and that the outcome is Kyphosis

      df = pd.read_csv('kyphosis.csv')
      df.head()

  • Check the data type of each column first. df.info() shows that this dataset has 4 columns: 3 are int64 and 1 is object

      df.info()

  • Visualize the data to see how the variables relate to one another

      sns.pairplot(data=df, hue='Kyphosis')

3. Use the scikit-learn library

  • Start with train_test_split, a function that randomly splits the data into a training set and a test set

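      # sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
      from sklearn.model_selection import train_test_split
      X = df.drop('Kyphosis', axis=1)   # features: Age, Number, Start
      y = df['Kyphosis']                # target column
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)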
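
To make the effect of test_size concrete, here is a minimal sketch on a small made-up table. The ten rows below are hypothetical stand-ins that only reuse the column names described above; they are not values from kyphosis.csv. The sketch assumes a scikit-learn version that provides sklearn.model_selection.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical rows that reuse the dataset's column names (not real data).
toy = pd.DataFrame({
    "Kyphosis": ["absent", "present"] * 5,
    "Age": [71, 158, 128, 2, 1, 61, 37, 113, 59, 82],
    "Number": [3, 3, 4, 5, 4, 2, 3, 2, 6, 5],
    "Start": [5, 14, 5, 1, 15, 17, 16, 16, 12, 14],
})

X = toy.drop("Kyphosis", axis=1)  # feature columns
y = toy["Kyphosis"]               # target column

# test_size=0.4 holds out 40% of the rows; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101
)
print(len(X_train), len(X_test))

# Splitting again with the same random_state selects the same rows.
X_train2, _, _, _ = train_test_split(X, y, test_size=0.4, random_state=101)
print(X_train.index.equals(X_train2.index))
```

Fixing random_state keeps the split, and therefore the evaluation numbers, comparable between runs.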

4. Use the decision tree classifier

  • Import the decision tree classifier

      from sklearn.tree import DecisionTreeClassifier
      dtree = DecisionTreeClassifier()
  • Train the model

      dtree.fit(X_train, y_train)
  • Predict

      pred = dtree.predict(X_test)
  • Evaluate the model's accuracy

    • confusion_matrix, classification_report

      from sklearn.metrics import classification_report, confusion_matrix
      print(classification_report(y_test, pred))
      print(confusion_matrix(y_test, pred))
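
As a reading aid for the confusion matrix, the following sketch uses a handful of hand-written labels (hypothetical, not model output from this dataset). In scikit-learn's layout, rows are the true classes and columns are the predicted classes, so the diagonal holds the correct predictions.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels, written by hand for illustration.
y_true = ["absent", "absent", "absent", "present", "present", "absent"]
y_pred = ["absent", "present", "absent", "present", "absent", "absent"]

labels = ["absent", "present"]  # fixes the row and column order
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Accuracy is the share of samples on the diagonal.
accuracy = cm.trace() / cm.sum()

# Recall for "present": correct "present" predictions over all true "present" rows.
recall_present = cm[1, 1] / cm[1].sum()
print(accuracy, recall_present)
```

Passing labels explicitly avoids guessing which row belongs to which class when the classes are strings.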

5. Use a random forest

  • Import the random forest classifier

      from sklearn.ensemble import RandomForestClassifier
  • Train the model

      rfc = RandomForestClassifier(n_estimators = 200)
      rfc.fit(X_train, y_train)
  • Predict

      rfc_pred = rfc.predict(X_test)
  • Evaluate the model's accuracy

      print(classification_report(y_test, rfc_pred))
      print(confusion_matrix(y_test, rfc_pred))
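
A random forest draws random samples and random feature subsets, so its results can change from run to run. The sketch below is an illustration on synthetic data from make_classification (an assumption for the example, not the kyphosis data). It shows that fixing random_state makes two forests agree, and that feature_importances_ gives one weight per feature.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 samples, 3 features (illustration only).
X, y = make_classification(
    n_samples=200, n_features=3, n_informative=2, n_redundant=1,
    random_state=101,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101
)

# Two forests built with the same seed on the same training data.
rfc_a = RandomForestClassifier(n_estimators=200, random_state=101).fit(X_train, y_train)
rfc_b = RandomForestClassifier(n_estimators=200, random_state=101).fit(X_train, y_train)

same_predictions = (rfc_a.predict(X_test) == rfc_b.predict(X_test)).all()
print(same_predictions)

# One importance value per feature; the values are normalized to sum to 1.
importances = rfc_a.feature_importances_
print(importances)
```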

6. Visualize the decision tree
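
One way to inspect a fitted tree without extra dependencies is sklearn.tree.export_text, which prints the split rules as indented text. This is a minimal sketch and assumes scikit-learn 0.21 or later. The eight rows are hypothetical and were chosen so that Start alone separates the two classes; they are not taken from kyphosis.csv.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical rows: every "present" case has a low Start value.
toy = pd.DataFrame({
    "Age": [10, 80, 30, 120, 60, 15, 100, 45],
    "Number": [3, 4, 2, 5, 4, 3, 2, 5],
    "Start": [5, 14, 3, 16, 6, 13, 2, 15],
    "Kyphosis": ["present", "absent"] * 4,
})

X = toy.drop("Kyphosis", axis=1)
y = toy["Kyphosis"]

dtree = DecisionTreeClassifier(random_state=101)
dtree.fit(X, y)

# Render the learned split rules as plain text.
rules = export_text(dtree, feature_names=list(X.columns))
print(rules)
```

In a notebook, sklearn.tree.plot_tree(dtree, feature_names=list(X.columns), filled=True) draws the same tree as a matplotlib figure.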
