Python
  • Introduction
  • Chapter 1.Notes from research
    • 1.Introduction of Python
    • 2. Build developer environment
      • 2.1.Sublime Text3
      • 2.2.Jupyter(IPython notebook)
        • 2.2.1.Introduction
        • 2.2.2.Basic usage
        • 2.2.3.some common operations
      • 2.3.Github
        • 2.3.1.Create Github account
        • 2.3.2.Create a new repository
        • 2.3.3.Basic operations: config, clone, push
      • 2.4.Install Python 3.4 in Windows
    • 3. Write Python code
      • 3.1.Hello Python
      • 3.2.Basic knowledges
      • 3.3.撰寫獨立python程式
      • 3.4.Arguments parser
      • 3.5.Class
      • 3.6.Sequence
    • 4. Web crawler
      • 4.1.Introduction
      • 4.2.requests
      • 4.3.beautifulSoup4
      • 3.4.a little web crawler
    • 5. Software testing
      • 5.1. Robot Framework
        • 1.1.Introduction
        • 1.2.What is test-automation framework?
        • 1.3.Robot Framework Architecture
        • 1.4.Robot Framework Library
        • 1.5.Reference
    • 6. encode/ decode
      • 6.1.編碼/解碼器的基本概念
      • 6.2.常見的編碼/ 解碼錯誤訊息與其意義
      • 6.3 .處理文字檔案
    • 7. module
      • 7.1.Write a module
      • 7.2.Common module
        • 7.2.1.sched
        • 7.2.2.threading
    • 8. Integrate IIS with django
      • 8.1.Integrate IIS with django
  • Chapter 2.Courses
    • 2.1.Python for Data Science and Machine Learning Bootcamp
      • 2.1.1.Virtual Environment
      • 2.1.2.Python crash course
      • 2.1.3.Python for Data Analysis - NumPy
        • 2.1.3.1.Numpy arrays
        • 2.1.3.2.Numpy Array Indexing
        • 2.1.3.3.Numpy Operations
      • 2.1.4.Python for Data Analysis - Pandas
        • 2.1.4.1.Introduction
        • 2.1.4.2.Series
        • 2.1.4.3.DataFrames
        • 2.1.4.4.Missing Data
        • 2.1.4.5.GroupBy
        • 2.1.4.6.Merging joining and Concatenating
        • 2.1.4.7.Data input and output
      • 2.1.5.Python for Data Visual Visualization - Pandas Built-in Data Visualization
      • 2.1.6.Python for Data Visualization - Matplotlib
        • 2.1.6.1.Introduction of Matplotlib
        • 2.1.6.2.Matplotlib
      • 2.1.7.Python for Data Visualization - Seaborn
        • 2.1.7.1.Introduction to Seaborn
        • 2.1.7.2.Distribution Plots
        • 2.1.7.3.Categorical Plots
        • 2.1.7.4.Matrix Plots
        • 2.1.7.5.Grids
        • 2.1.7.6.Regression Plots
      • 2.1.8. Python for Data Visualization - Plotly and Cufflinks
        • 2.1.8.1.Introduction to Plotly and Cufflinks
        • 2.1.8.2.Plotly and Cufflinks
      • 2.1.9. Python for Data Visualization - Geographical plotting
        • 2.1.9.1.Choropleth Maps - USA
        • 2.1.9.2.Choropleth Maps - World
      • 2.1.10.Combine data analysis and visualization to tackle real world data sets
        • 911 calls capstone project
      • 2.1.11.Linear regression
        • 2.1.11.1.Introduction to Scikit-learn
        • 2.1.11.2.Linear regression with Python
      • 2.1.12.Logistic regression
        • 2.1.12.1.Logistic regression Theory
        • 2.1.12.2.Logistic regression with Python
      • 2.1.13.K Nearest Neighbors
        • 2.1.13.1.KNN Theory
        • 2.1.13.2.KNN with Python
      • 2.1.14.Decision trees and random forests
        • 2.1.14.1.Introduction of tree methods
        • 2.1.14.2.Decision trees and Random Forests with Python
      • 2.1.15.Support Vector Machines
      • 2.1.16.K means clustering
      • 2.1.17.Principal Component Analysis
    • 2.2. Machine Learning Crash Course Jam
Powered by GitBook
On this page
  • 1.Introduction
  • 2.分割背後的直覺
  • 3.參考資料

Was this helpful?

  1. Chapter 2.Courses
  2. 2.1.Python for Data Science and Machine Learning Bootcamp
  3. 2.1.14.Decision trees and random forests

2.1.14.1.Introduction of tree methods

Previous2.1.14.Decision trees and random forestsNext2.1.14.2.Decision trees and Random Forests with Python

Last updated 5 years ago

Was this helpful?

1.Introduction

  • 會使用決策樹的例子

    • 有一位打球的朋友, 有時會出現而有時不會, 導致他不會出現的因素有很多, 例如天氣, 溫度, 濕度等

    • 於是我們想找出這些因素與最後對方是否會出現

    • 利用這些因素的資料來預測對方是否會出現, 方法之一就是使用決策樹

  • 決策樹

    • Nodes

      • 因為某Attribute的值而分割 (split)

    • Edges

      • Next node的輸出值

    • Root

      • 第一個split的node

    • Leaves

      • 做預測的Terminal node

2.分割背後的直覺

  • Best split

    • 在資料中最能夠左右結果的因子

  • Entropy, information gain

    • 可以協助找出best split的數學指標

  • Random forest

    • Decision tree的缺點是無法趨向最佳的預測精度 (predictive accuracy)

    • 使用一群隨機在每個split所找出的tree, 特性是變異度高

    • Random force

      • 避免單一具領導性的feature造成樹之間的高度相關性

3.參考資料

  • Refer to Chapter 8. Introduction to Statistical Learning

是一個常用來降低機器學習算法的變異數的方式, 通常是在所有的隨機樹(p)中挑選m顆

Bagging