# K Nearest Neighbors (KNN)

K Nearest Neighbors (KNN) is one of the most popular and intuitive supervised machine learning algorithms. It is available in Excel using the XLSTAT software.

## What is K Nearest Neighbors (KNN) machine learning?

The **K Nearest Neighbors method** (**KNN**) aims to categorize query points whose class is unknown given their respective distances to points in a learning set (i.e. whose class is known a priori). It is one of the most popular supervised **machine learning** tools.

A simple version of **KNN** can be regarded as an extension of the nearest neighbor method (NN method is a special case of KNN, k = 1).

The KNN classification approach assumes that each example in the learning set is a random vector in Rn. Each point is described as x =< a1(x), a2(x), a3(x),.., an(x) > where ar(x) denotes the value I of the rth attribute. ar(x) can be either a quantitative or a qualitative variable.

To determine the class of the query point xq, each of the k nearest points x1,…,xk to xq proceed to voting. The class of xq corresponds to the majority class.

## K Nearest Neighbors in XLSTAT: options

**Distances**: Several distance metrics can be used in XLSTAT to compute similarities in the **K Nearest Neighbors** algorithm. Options vary according to the type of variables characterizing the observations (qualitative or quantitative).

- Distances available for quantitative data (metrics): Euclidian, Minkowski, Manhatan, Tchebychev, Canberra
- Distances available for quantitative data (kernels): linear, sigmoid, logarithmic, power, Gaussian, Laplacian
- Distances available for qualitative data: Overlap Metric (OM), Value Difference Metric (VDM)

**Validation**: XLSTAT proposes a **K-fold cross validation** technique to quantify the quality of the classifier. Data is partitioned into k equally sub samples of equal size. Among the *k* subsamples, a single subsample is retained as the validation data to test the model, and the remaining *k* − 1 subsamples are used as training data.

Other options available in the XLSTAT **K Nearest Neighbors** feature include **observation tracking** as well as **vote weighing**.

## K Nearest Neighbors in XLSTAT: results

The **K Nearest Neighbors** feature in XLSTAT includes displaying results by class or by object (observation).