DMwR (version 0.4.1)

kNN: k-Nearest Neighbour Classification

Description

This function provides a formula interface to the existing knn() function of package class. On top of this convenient interface, the function also allows normalization of the given data.

Usage

kNN(form, train, test, norm = T, norm.stats = NULL, ...)

Arguments

form
An object of class formula describing the functional form of the classification model.
train
The data to be used as the training set.
test
The data set for which we want to obtain the k-NN classification, i.e. the test set.
norm
A boolean indicating whether the training data should be normalized before obtaining the k-NN predictions (defaults to TRUE).
norm.stats
This argument allows the user to supply the centrality and spread statistics that will drive the normalization. If not supplied, they default to the statistics used by the function scale(). If supplied, this should be a list with two components, each being a vector with as many positions as there are columns in the data set. The first vector should contain the centrality statistic of each column, while the second vector should contain the spread statistic values (a sketch illustrating this argument is given at the end of the Examples section below).
...
Any other parameters that will be forwarded to the knn() function of package class.

Value

The return value is the same as in the knn() function of package class. This is a factor of classifications of the test set cases.

Details

This function is essentially a convenience function that provides a formula-based interface to the already existing knn() function of package class. On top of this interface it also incorporates some facilities for normalizing the data before the k-nearest neighbour classification algorithm is applied. Because this algorithm is based on the distances between observations, it is known to be very sensitive to differences in the scale of the variables, hence the usefulness of normalization.
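
As a rough illustration of the points above, the sketch below (not the package source, and using assumed object names such as trPred, tsPred and preds) shows approximately what a call like kNN(Species ~ ., trainIris, testIris, norm = TRUE, k = 5) amounts to when carried out directly with knn() from package class: the predictors are separated from the target named in the formula, the training predictors are scaled, the same centrality and spread statistics are applied to the test predictors, and the classification itself is delegated to knn().

library(class)
data(iris)

set.seed(1234)   # illustrative seed, only to make the split reproducible
idxs <- sample(1:nrow(iris), as.integer(0.7 * nrow(iris)))
trainIris <- iris[idxs, ]
testIris  <- iris[-idxs, ]

## separate the predictors from the target named on the left-hand side of the formula
trPred <- trainIris[, names(trainIris) != "Species"]
tsPred <- testIris[, names(testIris) != "Species"]

## normalize the training predictors and apply the *same* centrality and
## spread statistics to the test predictors
trPred <- scale(trPred)
tsPred <- scale(tsPred,
                center = attr(trPred, "scaled:center"),
                scale  = attr(trPred, "scaled:scale"))

## the actual classification is delegated to knn() of package class
preds <- knn(trPred, tsPred, trainIris$Species, k = 5)
table(testIris$Species, preds)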

References

Torgo, L. (2010) Data Mining using R: learning with case studies, CRC Press (ISBN: 9781439810187).

http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR

See Also

knn, knn1, knn.cv

Examples

## A small example with the IRIS data set
library(DMwR)
data(iris)

## Split in train + test set
idxs <- sample(1:nrow(iris),as.integer(0.7*nrow(iris)))
trainIris <- iris[idxs,]
testIris <- iris[-idxs,]

## A 3-nearest neighbours model with no normalization
nn3 <- kNN(Species ~ .,trainIris,testIris,norm=FALSE,k=3)

## The resulting confusion matrix
table(testIris[,'Species'],nn3)

## Now a 5-nearest neighbours model with normalization
nn5 <- kNN(Species ~ .,trainIris,testIris,norm=TRUE,k=5)

## The resulting confusion matrix
table(testIris[,'Species'],nn5)
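
As an additional illustration (a sketch not included in the original examples), the snippet below shows one way to supply your own normalization statistics through norm.stats, reusing the train/test split above. The use of robust statistics (per-column medians and inter-quartile ranges of the training predictors) and the object names ctr, spr and nn5r are assumptions made for this example.

## Supplying robust centrality/spread statistics through norm.stats
ctr <- apply(trainIris[, -5], 2, median)   # per-column medians of the predictors
spr <- apply(trainIris[, -5], 2, IQR)      # per-column inter-quartile ranges
nn5r <- kNN(Species ~ ., trainIris, testIris,
            norm = TRUE, norm.stats = list(ctr, spr), k = 5)

## The resulting confusion matrix
table(testIris[, 'Species'], nn5r)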

