K Nearest Neighbors (KNN)

K Nearest Neighbors (KNN) is one of the most popular and intuitive supervised machine learning algorithms. It is available in Excel using the XLSTAT software.

What is K Nearest Neighbors (KNN) machine learning?

The K Nearest Neighbors (KNN) method classifies query points whose class is unknown based on their distances to the points of a learning set, whose classes are known a priori.

KNN can be regarded as an extension of the nearest neighbor (NN) method: NN is the special case of KNN with k = 1.

The KNN classification approach assumes that each example in the learning set is a random vector in R^n. Each point is described as x = <a_1(x), a_2(x), a_3(x), ..., a_n(x)>, where a_r(x) denotes the value of the r-th attribute. a_r(x) can be either a quantitative or a qualitative variable.

To determine the class of a query point x_q, each of its k nearest points x_1, …, x_k in the learning set casts a vote for its own class. The class assigned to x_q is the majority class among these votes.
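The voting rule above can be illustrated with a minimal sketch. It assumes quantitative attributes and Euclidean distance; the function name `knn_classify` and the toy data are hypothetical and are not part of XLSTAT.

```python
import math
from collections import Counter

def knn_classify(query, points, labels, k=3):
    """Classify `query` by majority vote among its k nearest learning points."""
    # Euclidean distance from the query to every point of the learning set.
    distances = [math.dist(query, p) for p in points]
    # Indices of the k closest learning points.
    nearest = sorted(range(len(points)), key=lambda i: distances[i])[:k]
    # Each neighbor votes for its own class; the most frequent class wins.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy learning set (illustrative values only): two well-separated classes.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
labels = ["A", "A", "A", "B", "B"]

predicted = knn_classify((1.1, 1.0), points, labels, k=3)
```

With k = 1 the same function reduces to the nearest neighbor method. Ties between classes are not handled specially in this sketch; choosing an odd k for two-class problems is one simple way to avoid them.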

K Nearest Neighbors in XLSTAT: options

Distances: Several distance metrics can be used in XLSTAT to compute similarities in the K Nearest Neighbors algorithm. Options vary according to the type of variables characterizing the observations (qualitative or quantitative).

  • Distances available for quantitative data (metrics): Euclidean, Minkowski, Manhattan, Chebyshev, Canberra
  • Distances available for quantitative data (kernels): linear, sigmoid, logarithmic, power, Gaussian, Laplacian
  • Distances available for qualitative data: Overlap Metric (OM), Value Difference Metric (VDM)
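As a rough illustration of how some of these distances differ, the sketch below uses their common textbook definitions. These are assumptions about the general form of each metric, not a description of XLSTAT's internal implementation; the kernel-based distances and VDM are left out because they require additional parameters or class statistics.

```python
def minkowski(x, y, p=2):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def euclidean(x, y):
    return minkowski(x, y, p=2)

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    """Largest absolute difference over all attributes."""
    return max(abs(a - b) for a, b in zip(x, y))

def canberra(x, y):
    """Sum of |a - b| / (|a| + |b|); terms where both values are 0 count as 0."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total

def overlap(x, y):
    """Overlap metric for qualitative data: number of attributes that differ."""
    return sum(a != b for a, b in zip(x, y))

x, y = (0.0, 0.0), (3.0, 4.0)
d_euclidean = euclidean(x, y)
d_manhattan = manhattan(x, y)
d_chebyshev = chebyshev(x, y)
d_canberra = canberra(x, y)
d_overlap = overlap(("red", "small"), ("red", "large"))
```

The choice of distance changes which points count as "nearest", so it can change the predicted class even when k and the learning set stay the same.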

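Validation: XLSTAT offers a K-fold cross-validation technique to quantify the quality of the classifier. The data are partitioned into k subsamples of equal size (here k is the number of folds, not the number of neighbors). Among the k subsamples, a single subsample is retained as the validation data to test the model, and the remaining k − 1 subsamples are used as training data. In standard K-fold cross-validation, this is repeated k times so that each subsample serves exactly once as the validation data.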
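The partitioning described above can be sketched as follows. This is a generic K-fold loop under stated assumptions, not XLSTAT's procedure: the helper names (`k_fold_splits`, `nearest_neighbor`, `cross_validated_accuracy`), the shuffling seed, and the use of a 1-nearest-neighbor classifier with Euclidean distance are all hypothetical choices made to keep the example short.

```python
import math
import random

def k_fold_splits(n_samples, n_folds, seed=0):
    """Yield (train_indices, validation_indices) once per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold f takes every n_folds-th shuffled index, so sizes differ by at most one.
    folds = [indices[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        validation = folds[f]
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, validation

def nearest_neighbor(query, points, labels):
    """1-NN classifier: return the label of the closest training point."""
    best = min(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return labels[best]

def cross_validated_accuracy(points, labels, n_folds):
    """Fraction of observations classified correctly when held out."""
    correct = 0
    for train, validation in k_fold_splits(len(points), n_folds):
        train_points = [points[i] for i in train]
        train_labels = [labels[i] for i in train]
        for i in validation:
            if nearest_neighbor(points[i], train_points, train_labels) == labels[i]:
                correct += 1
    return correct / len(points)

# Toy data (illustrative values only): two well-separated clusters.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.3, 5.3)]
labels = ["A"] * 4 + ["B"] * 4

accuracy = cross_validated_accuracy(points, labels, n_folds=4)
```

Because every observation is held out exactly once, the resulting accuracy is computed only on points the classifier did not see during training in that round.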

Other options available in the XLSTAT K Nearest Neighbors feature include observation tracking as well as vote weighting.
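The text does not say which weighting scheme XLSTAT applies. As one common possibility, the sketch below weights each neighbor's vote by the inverse of its distance to the query point; the function name `weighted_knn_classify`, the `eps` guard against division by zero, and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict

def weighted_knn_classify(query, points, labels, k=3, eps=1e-12):
    """KNN where each neighbor's vote is weighted by 1 / distance."""
    distances = [math.dist(query, p) for p in points]
    nearest = sorted(range(len(points)), key=lambda i: distances[i])[:k]
    scores = defaultdict(float)
    for i in nearest:
        # Closer neighbors contribute more; eps avoids dividing by zero.
        scores[labels[i]] += 1.0 / (distances[i] + eps)
    # The class with the largest total weight wins.
    return max(scores, key=scores.get)

# Toy data (illustrative values only): one close "A" point, two distant "B" points.
points = [(0.0, 0.0), (3.0, 0.0), (3.2, 0.0)]
labels = ["A", "B", "B"]

predicted = weighted_knn_classify((0.5, 0.0), points, labels, k=3)
```

In this example an unweighted majority vote with k = 3 would pick "B" (two votes against one), whereas the weighted vote favors the single, much closer "A" point.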

K Nearest Neighbors in XLSTAT: results

The K Nearest Neighbors feature in XLSTAT can display results by class or by object (observation).