kNN is used to perform k-nearest neighbour classification for a test set using a training set. For each row of the test set, the k nearest training set vectors (based on Euclidean distance) are found. Then, the classification is decided by majority vote, with ties broken at random. This function provides a formula interface to the knn() function of the R package class. In addition, it allows normalization of the given data using the scaler function.
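The procedure described above can be sketched in a few lines of base R. This is only an illustrative sketch, not the package's implementation: knn_one is a hypothetical helper, the toy data are made up, and distance ties at the k-th neighbour are not handled specially.

```r
# Hypothetical helper: classify one test row by k-nearest-neighbour majority vote.
knn_one <- function(train_x, train_y, test_row, k = 1) {
  # Euclidean distance from the test row to every training row
  diffs <- sweep(as.matrix(train_x), 2, as.numeric(test_row))
  dists <- sqrt(rowSums(diffs^2))
  # Indices of the k closest training rows
  nearest <- order(dists)[seq_len(k)]
  votes <- table(train_y[nearest])
  # Majority vote; if several classes share the top count, pick one at random
  winners <- names(votes)[votes == max(votes)]
  if (length(winners) > 1) winners <- sample(winners, 1)
  winners
}

# Made-up toy data: two well-separated groups
train_x <- data.frame(income = c(1, 2, 8, 9), age = c(1, 2, 8, 9))
train_y <- factor(c("low", "low", "high", "high"))

pred <- knn_one(train_x, train_y, test_row = c(income = 1.5, age = 1.5), k = 3)
print(pred)
```

With k = 3 the three nearest rows carry two "low" labels and one "high" label, so the vote goes to "low".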
kNN(formula, train, test, k = 1, scaler = FALSE, type = "class", l = 0,
use.all = TRUE, na.rm = FALSE)
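As a rough idea of what a formula interface to class::knn() involves, the sketch below splits a formula into response and predictors and passes them on. knn_formula is a hypothetical wrapper written for illustration and the data are made up; the package's own code may work differently and also handles scaling and the other arguments.

```r
library(class)  # provides knn()

# Hypothetical wrapper, for illustration only; not the package's implementation.
knn_formula <- function(formula, train, test, k = 1) {
  # Build the model frame and pull out the response (class labels)
  mf <- model.frame(formula, data = train)
  cl <- model.response(mf)
  # Predictor names are the variables on the right-hand side of the formula
  predictors <- all.vars(formula)[-1]
  knn(train = train[, predictors, drop = FALSE],
      test = test[, predictors, drop = FALSE],
      cl = cl, k = k)
}

# Made-up toy data
train <- data.frame(income = c(1, 2, 8, 9), age = c(1, 2, 8, 9),
                    risk = factor(c("low", "low", "high", "high")))
test <- data.frame(income = 8.5, age = 8.2)

pred <- knn_formula(risk ~ income + age, train = train, test = test)
print(pred)
```

The test case does not need a response column, since only the predictor columns named in the formula are used for it.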
When type = "class" (default), a factor vector of predicted classes is returned, in which doubt is returned as NA.
When type = "prob", a matrix of confidence values is returned (one column per class).
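The doubt outcome can be seen directly with class::knn(), the function that kNN wraps, by raising the minimum vote l. The example below is a sketch on made-up one-dimensional data; it assumes the class package is installed, which is normally the case because it ships with standard R distributions.

```r
library(class)  # provides knn()

# Made-up data: the first test point has mixed neighbours, the second does not
train_x <- matrix(c(0, 1, 2, 10, 11, 12), ncol = 1)
train_y <- factor(c("A", "A", "B", "B", "B", "B"))
test_x <- matrix(c(1, 11), ncol = 1)

# l = 0 (default): a plain majority vote always yields a class
loose <- knn(train_x, test_x, cl = train_y, k = 3, l = 0)

# l = 3 with k = 3: all three neighbours must agree, otherwise doubt (NA)
strict <- knn(train_x, test_x, cl = train_y, k = 3, l = 3)

print(loose)
print(strict)
```

For the first test point the three nearest neighbours vote two to one, so the strict call returns NA for it, while the second test point has three agreeing neighbours and keeps its class.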
formula: a formula with a response but no interaction terms. If a data frame is supplied, it is taken as the model frame (see model.frame).
train: data frame or matrix of training set cases.
test: data frame or matrix of test set cases.
k: number of neighbours considered.
scaler: a character with options FALSE (default), "minmax", and "zscore". Option FALSE means no transformation. The other options let users run the kNN algorithm on normalized versions of the train and test sets.
type: either "class" (default) for the predicted class or "prob" for model confidence values.
l: minimum vote for definite decision, otherwise doubt. (More precisely, less than k-l dissenting votes are allowed, even if k is increased by ties.)
use.all: controls handling of ties. If true, all distances equal to the kth largest are included. If false, a random selection of distances equal to the kth is chosen to use exactly k neighbours.
na.rm: a logical value indicating whether NA values in the data should be stripped before the computation proceeds.
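To make the two normalization options of the scaler argument concrete, here is a minimal base R sketch of min-max and z-score scaling. The helper names are hypothetical, and taking the scaling statistics from the training values is an assumption made for this illustration; the package's scaler function may differ in detail.

```r
# Hypothetical helpers; the package's scaler function may differ in detail.
# ref supplies the values from which the scaling statistics are computed.
minmax_scale <- function(x, ref = x) (x - min(ref)) / (max(ref) - min(ref))
zscore_scale <- function(x, ref = x) (x - mean(ref)) / sd(ref)

# Made-up values for one predictor
train_income <- c(10, 20, 30)
test_income <- 25

# Assumption: the test value is scaled with statistics from the training values
mm <- minmax_scale(test_income, ref = train_income)
zs <- zscore_scale(test_income, ref = train_income)
print(c(minmax = mm, zscore = zs))
```

Scaling matters for kNN because Euclidean distance is otherwise dominated by whichever predictor has the largest numeric range.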
Reza Mohammadi a.mohammadi@uva.nl and Kevin Burke kevin.burke@ul.ie
Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.
kNN, scaler
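library(liver)  # provides kNN() and the risk data set
data(risk)
# Use the first 100 rows for training and row 101 as the single test case
train = risk[1:100, ]
test = risk[101, ]
# Classify the test case with the default k = 1
kNN(risk ~ income + age, train = train, test = test)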