K Nearest Neighbors (KNN)
K Nearest Neighbors (KNN) is one of the most popular and intuitive supervised machine learning algorithms. It is available in Excel using the XLSTAT software.
What is K Nearest Neighbors (KNN) machine learning?
The K Nearest Neighbors method (KNN) aims to categorize query points whose class is unknown given their respective distances to points in a learning set (i.e. whose class is known a priori). It is one of the most popular supervised machine learning tools.
A simple version of KNN can be regarded as an extension of the nearest neighbor method (NN method is a special case of KNN, k = 1).
The KNN classification approach assumes that each example in the learning set is a random vector in Rn. Each point is described as x =< a1(x), a2(x), a3(x),.., an(x) > where ar(x) denotes the value I of the rth attribute. ar(x) can be either a quantitative or a qualitative variable.
To determine the class of the query point xq, each of the k nearest points x1,…,xk to xq proceed to voting. The class of xq corresponds to the majority class.
K Nearest Neighbors in XLSTAT: options
Distances: Several distance metrics can be used in XLSTAT to compute similarities in the K Nearest Neighbors algorithm. Options vary according to the type of variables characterizing the observations (qualitative or quantitative).
- Distances available for quantitative data (metrics): Euclidian, Minkowski, Manhatan, Tchebychev, Canberra
- Distances available for quantitative data (kernels): linear, sigmoid, logarithmic, power, Gaussian, Laplacian
- Distances available for qualitative data: Overlap Metric (OM), Value Difference Metric (VDM)
Validation: XLSTAT proposes a K-fold cross validation technique to quantify the quality of the classifier. Data is partitioned into k equally sub samples of equal size. Among the k subsamples, a single subsample is retained as the validation data to test the model, and the remaining k − 1 subsamples are used as training data.
Other options available in the XLSTAT K Nearest Neighbors feature include observation tracking as well as vote weighing.
K Nearest Neighbors in XLSTAT: results
The K Nearest Neighbors feature in XLSTAT includes displaying results by class or by object (observation).