scrime (version 1.3.5)

gknn: Generalized k Nearest Neighbors

Description

Predicts the classes of new observations with \(k\) Nearest Neighbors based on a user-specified distance measure.

Usage

gknn(data, cl, newdata, nn = 5, distance = NULL, use.weights = FALSE, ...)

Arguments

data

a numeric matrix in which each row represents an observation and each column a variable. If distance is "smc", "cohen" or "pcc", the values in data must be integers between 1 and \(n_{cat}\), where \(n_{cat}\) is the maximum number of levels one of the variables can take. Missing values are allowed.

cl

a numeric vector of length nrow(data) giving the class labels of the observations represented by the rows of data. cl must consist of integers between 1 and \(n_{cl}\), where \(n_{cl}\) is the number of groups.

newdata

a numeric matrix in which each row represents a new observation for which the class label should be predicted and each column represents the same variable as the corresponding column of data.

nn

an integer specifying the number of nearest neighbors used to classify the new observations.

distance

character vector naming the distance measure used to identify the nn nearest neighbors. Must be one of "smc", "cohen", "pcc", "euclidean", "maximum", "manhattan", "canberra", or "minkowski". If NULL, it is checked in an ad hoc way whether the data seem to be categorical. If so, distance is set to "smc". Otherwise, it is set to "euclidean".

use.weights

should the votes of the nearest neighbors be weighted by the reciprocal of their distances to the new observation when its class is predicted?
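The effect of reciprocal-distance weighting can be sketched in base R. The helper `vote_sketch` below is hypothetical and not part of scrime; it only illustrates the voting rule described above for a single new observation, assuming the distances to all training observations have already been computed. How gknn itself treats ties or zero distances is not stated here, so the sketch makes its own simple choices.

```r
# Hypothetical helper, not part of scrime: vote among the nn nearest
# neighbors of ONE new observation.
#   d  : distances from the new observation to all training observations
#   cl : integer class labels (1, ..., n_cl) of the training observations
vote_sketch <- function(d, cl, nn = 5, use.weights = FALSE) {
  idx <- order(d)[seq_len(nn)]                 # indices of the nn smallest distances
  # Each neighbor counts once, or with weight 1 / distance.
  # Assumption: a zero distance simply yields an infinite weight.
  w <- if (use.weights) 1 / d[idx] else rep(1, nn)
  votes <- tapply(w, factor(cl[idx], levels = seq_len(max(cl))), sum)
  votes[is.na(votes)] <- 0                     # classes without a neighbor get no votes
  unname(which.max(votes))                     # ties go to the smallest class label
}

d  <- c(0.1, 1.0, 1.1, 1.2, 5.0)
cl <- c(1, 2, 2, 2, 1)

# Three of the four nearest neighbors belong to class 2.
unweighted <- vote_sketch(d, cl, nn = 4)
# With weights, the single very close class-1 neighbor dominates the vote.
weighted <- vote_sketch(d, cl, nn = 4, use.weights = TRUE)
```

In this toy example the two rules disagree, which is the situation in which use.weights = TRUE can change a prediction.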

...

further arguments passed to the distance measure. For example, if distance = "minkowski", p can also be specified (see dist); if distance = "pcc", version can also be specified (see pcc).
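To illustrate what p controls for the Minkowski distance, here is a small base-R sketch that calls stats::dist directly rather than gknn. The vectors x and y are made-up values for illustration only.

```r
# Minkowski distance between two observations: (sum(|x - y|^p))^(1/p)
x <- c(1, 2, 3)
y <- c(4, 6, 3)
xy <- rbind(x, y)

# p = 1 reduces to the Manhattan distance: 3 + 4 + 0
d_p1 <- as.numeric(dist(xy, method = "minkowski", p = 1))
# p = 2 reduces to the Euclidean distance: sqrt(9 + 16)
d_p2 <- as.numeric(dist(xy, method = "minkowski", p = 2))
# p = 3: (27 + 64)^(1/3)
d_p3 <- as.numeric(dist(xy, method = "minkowski", p = 3))
```

Larger values of p put more weight on the variable with the largest absolute difference.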

Value

The predicted classes of the new observations.

References

Schwender, H. (2007). Statistical Analysis of Genotype and Gene Expression Data. Dissertation, Department of Statistics, University of Dortmund.

See Also

knncatimpute, smc, pcc

Examples

# NOT RUN {
# Using the example from the function knn.

library(class)
data(iris3)
train <- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])
test <- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])
cl <- c(rep(2, 25), rep(1, 25), rep(1, 25))

knn.out <- knn(train, test, as.factor(cl), k = 3, use.all = FALSE)
gknn.out <- gknn(train, cl, test, nn = 3)

# Both applications lead to the same predictions.

knn.out == gknn.out

# But gknn also supports distance measures other than the Euclidean
# distance, e.g., the Manhattan distance.

gknn(train, cl, test, nn = 3, distance = "manhattan")

# }
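As a rough illustration of what switching the distance measure means, the following base-R sketch performs an unweighted nearest-neighbor vote on top of stats::dist. The function `knn_sketch` is hypothetical and is not the scrime implementation; it assumes complete numeric data and breaks ties in favor of the smallest class label.

```r
# Hypothetical base-R sketch, not the scrime implementation.
knn_sketch <- function(data, cl, newdata, nn = 5, method = "euclidean") {
  n <- nrow(data)
  # Compute all pairwise distances, then keep the block holding the
  # distances from each new observation (rows) to each training row (columns).
  d <- as.matrix(dist(rbind(data, newdata), method = method))
  d <- d[n + seq_len(nrow(newdata)), seq_len(n), drop = FALSE]
  pred <- apply(d, 1, function(di) {
    idx <- order(di)[seq_len(nn)]              # nn nearest training observations
    as.integer(names(which.max(table(cl[idx]))))  # majority vote
  })
  unname(pred)
}

# Same split and class labels as in the example above (iris3 ships with R).
train <- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])
test <- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])
cl <- c(rep(2, 25), rep(1, 25), rep(1, 25))

pred <- knn_sketch(train, cl, test, nn = 3, method = "manhattan")
```

Any method accepted by dist can be passed as method here; the categorical measures "smc", "cohen" and "pcc" are specific to scrime and are not covered by this sketch.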
