gknn: Generalized k-Nearest Neighbors Classification or Regression

Description

gknn is an implementation of the k-nearest neighbours algorithm making use of general distance measures. A formula interface is provided.

Usage

# S3 method for formula
gknn(formula, data = NULL, ..., subset, na.action = na.pass, scale = TRUE)
# S3 method for default
gknn(x, y, k = 1, method = NULL,
     scale = TRUE, use_all = TRUE,
     FUN = mean, ...)
# S3 method for gknn
predict(object, newdata,
        type = c("class", "votes", "prob"),
        ...,
        na.action = na.pass)

Value

For gknn(), an object of class "gknn" containing the data and the specified parameters. For predict.gknn(), a vector of predictions, or a matrix with votes for all classes. In case of an overall class tie, the predicted class is chosen at random.
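
How the three return types relate can be sketched in base R. This is an illustration only: the neighbour labels are hypothetical, and the code is not gknn's internal implementation.

```r
# Hypothetical labels of the k = 5 nearest neighbours of one query point
neighbour_labels <- factor(
  c("setosa", "virginica", "virginica", "setosa", "virginica"),
  levels = c("setosa", "versicolor", "virginica")
)

votes <- table(neighbour_labels)   # "votes": raw counts per class
prob  <- votes / sum(votes)        # "prob": class distribution among the neighbours

# "class": the level with the most votes; an overall tie is broken at random
winners <- names(votes)[votes == max(votes)]
pred <- if (length(winners) > 1) sample(winners, 1) else winners
```

With these labels, "virginica" holds three of the five votes, so no random tie-breaking takes place.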

Arguments

formula: a symbolic description of the model to be fit.
data: an optional data frame containing the variables in the model. By default the variables are taken from the environment which ‘gknn’ is called from.
x: a data matrix.
y: a response vector with one label for each row/component of x. Can be either a factor (for classification tasks) or a numeric vector (for regression).
k: number of neighbours considered.
scale: a logical vector indicating the variables to be scaled. If scale is of length 1, the value is recycled as many times as needed. By default, numeric matrices are scaled to zero mean and unit variance. The center and scale values are returned and used for later predictions. Note that the default metric for data frames is the Gower metric which standardizes the values to the unit interval.
method: argument passed to dist() from the proxy package to select the distance measure: either a function or a mnemonic string naming the measure. Defaults to "Euclidean" for numeric matrices, to "Jaccard" for logical matrices and to "Gower" for data frames.
use_all: controls handling of ties. If true, all neighbours whose distance equals that of the kth nearest neighbour are included. If false, a random selection among the neighbours tied at the kth distance is made so that exactly k neighbours are used.
FUN: function used to aggregate the k nearest target values in case of regression.
object: object of class gknn.
newdata: matrix or data frame with new instances.
type: character string specifying the return type in case of class predictions: for "class", the class labels; for "prob", the class distribution among the k neighbours considered; for "votes", the raw counts.
...: additional parameters passed to dist()
subset: An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)
na.action: A function to specify the action to be taken if NAs are found. The default action is na.pass. (NOTE: If given, this argument must be named.)
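
The tie handling described for use_all can be sketched in base R. The distances below are hypothetical and the code only illustrates the documented behaviour; it is not taken from the gknn source.

```r
# Hypothetical distances from one query point to five training rows
dists <- c(0.5, 1.0, 1.0, 1.0, 2.0)
k <- 2

# Distance of the k-th nearest neighbour
kth <- sort(dists)[k]

# use_all = TRUE: keep every row whose distance does not exceed the k-th
# distance, so rows tied with the k-th neighbour are all included
idx_all <- which(dists <= kth)

# use_all = FALSE: keep the strictly closer rows, then draw at random
# among the tied rows until exactly k neighbours are selected
closer <- which(dists < kth)
tied   <- which(dists == kth)
idx_k  <- c(closer, tied[sample.int(length(tied), k - length(closer))])
```

Here three rows share the second-smallest distance, so the first variant uses four neighbours, while the second uses row 1 plus one randomly drawn row from the tie.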

Author

David Meyer (David.Meyer@R-project.org)

Examples

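library(e1071)  # gknn() is provided by the e1071 package
data(iris)

# 1-nearest-neighbour classifier fitted on the full data set
model <- gknn(Species ~ ., data = iris)
predict(model, iris[c(1, 51, 101),])

# hold out six rows per species as a test set
test <- c(45:50, 95:100, 145:150)

# 3 nearest neighbours with Manhattan distance; return raw vote counts
model <- gknn(Species ~ ., data = iris[-test,], k = 3, method = "Manhattan")
predict(model, iris[test,], type = "votes")

# same model, returning class proportions instead
model <- gknn(Species ~ ., data = iris[-test,], k = 3, method = "Manhattan")
predict(model, iris[test,], type = "prob")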
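
The examples above cover classification only. A regression call might look like the following sketch, which assumes that a numeric response and the FUN argument behave as described under Arguments. The choice of predictors, k = 5 and median as the aggregation function are illustrative.

```r
library(e1071)
data(iris)

# hold out six rows per species, as in the classification examples
test <- c(45:50, 95:100, 145:150)

# numeric response: each prediction aggregates the responses of the
# 5 nearest training rows with median() instead of the default mean()
reg_model <- gknn(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                  data = iris[-test, ], k = 5, FUN = median)
reg_pred <- predict(reg_model, iris[test, ])

# compare predictions with the held-out values
cbind(observed = iris$Sepal.Length[test], predicted = reg_pred)
```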
