2.1.13.1. KNN Theory

1. K Nearest Neighbor

  • Find the K neighbors closest to the new data point; the class that most of those neighbors belong to becomes the class of the new point (the choice of K affects which class a new point is assigned to)

  • Training algorithm

    • Store all the data

  • Prediction algorithm

    1. Calculate the distance from x to all points in your data

    2. Sort the points in your data by increasing distance from x

    3. Predict the majority label of the "k" closest points
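The training and prediction steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `knn_predict`, the toy data, and the choice of Euclidean distance are all assumptions made for the example.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Step 1: distance from x to every stored point (Euclidean, assumed here).
    distances = [(math.dist(p, x), label)
                 for p, label in zip(train_points, train_labels)]
    # Step 2: sort by increasing distance from x.
    distances.sort(key=lambda pair: pair[0])
    # Step 3: majority label among the k closest points.
    # On a tied vote, Counter returns the label that was counted first.
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# "Training" is nothing more than storing the data (toy values, made up).
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
                (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
train_labels = ["A", "A", "A", "B", "B", "B"]

prediction = knn_predict(train_points, train_labels, (2.0, 2.0), k=3)
print(prediction)
```

The query point (2.0, 2.0) sits next to the three "A" points, so all three of its nearest neighbors vote "A".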

2. Pros and cons

  • Pros

    1. Very simple

    2. Training is trivial

    3. Works with any number of classes

    4. Easy to add more data

    5. Few parameters

      • K

      • Distance metric
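Both parameters can change the prediction. The sketch below uses made-up points and a hypothetical `knn_predict` helper that takes the metric as an argument; it shows one case where changing K flips the result and one where changing the distance metric does.

```python
from collections import Counter

def euclidean(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def knn_predict(points, labels, x, k, metric):
    # Rank stored points by distance to x, then take a majority vote of the top k.
    ranked = sorted(zip(points, labels), key=lambda pl: metric(pl[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

query = (0.0, 0.0)

# Effect of K: the single closest point is "B", but most nearby points are "A".
points = [(0.5, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (3.0, 3.0)]
labels = ["B", "A", "A", "A", "B"]
pred_k1 = knn_predict(points, labels, query, k=1, metric=euclidean)
pred_k3 = knn_predict(points, labels, query, k=3, metric=euclidean)

# Effect of the metric: (2, 2) is closer in Euclidean distance (about 2.83 vs 3.5),
# while (0, 3.5) is closer in Manhattan distance (3.5 vs 4).
points2 = [(2.0, 2.0), (0.0, 3.5)]
labels2 = ["A", "B"]
pred_euclidean = knn_predict(points2, labels2, query, k=1, metric=euclidean)
pred_manhattan = knn_predict(points2, labels2, query, k=1, metric=manhattan)

print(pred_k1, pred_k3, pred_euclidean, pred_manhattan)
```

With K = 1 the lone nearby "B" decides the outcome, while K = 3 lets the surrounding "A" points outvote it.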

  • Cons

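    1. High prediction cost: every prediction computes the distance to all stored points, so the cost grows with the size of the data set

    2. Performs poorly on high-dimensional data, where distances between points become less informative

    3. Does not work well with categorical features, for which a meaningful distance is hard to define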
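One way to see the high-dimensional problem is to compare the nearest and farthest distances from a query point to a set of random points. The sketch below is only an illustration under assumed settings (uniform random data, 200 points, an arbitrary seed, and a hypothetical helper `nearest_to_farthest_ratio`): as the number of dimensions grows, the ratio moves toward 1, meaning the "nearest" neighbor is barely closer than the farthest one.

```python
import math
import random

def nearest_to_farthest_ratio(n_points, n_dims, rng):
    """Ratio of the smallest to the largest distance from one random query point."""
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    return min(dists) / max(dists)

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
ratio_low = nearest_to_farthest_ratio(200, 2, rng)
ratio_high = nearest_to_farthest_ratio(200, 500, rng)

print(f"2 dimensions:   nearest/farthest = {ratio_low:.3f}")
print(f"500 dimensions: nearest/farthest = {ratio_high:.3f}")
```

The same loop also reflects the prediction cost: each query touches every stored point, so the work scales with both the number of points and the number of dimensions.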
