Compute the k-nearest-neighbor classification given a matrix of cross-distances and a factor of class values. For each row, the majority class among the k nearest neighbors is determined, with ties broken at random (default). If there are ties for the kth nearest neighbor, all candidates are included in the vote (default).
gknn(x, y, k = 1, l = 0, break.ties = TRUE, use.all = TRUE,
prob = FALSE)
Returns a factor of class values (for the rows of x) which may be NA in the case of doubt (no definite decision), ties, or missing neighborhood information. The proportions of winning votes are returned as attribute prob (if option prob was used).
x: a cross-distances matrix.
y: a factor of class values of the columns of x.
k: number of nearest neighbors to consider.
l: minimum number of votes for a definite decision.
break.ties: option to break ties at random.
use.all: option to consider all neighbors that are tied with the kth neighbor.
prob: optionally return the proportions of winning votes.
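The voting scheme described above can be sketched in base R. This is an illustrative toy, not the cba implementation; the function name knn_vote and the toy data are hypothetical:

```r
## Minimal sketch of k-NN majority voting from a cross-distance matrix.
## Rows of d are test samples, columns are training samples (as in gknn).
knn_vote <- function(d, y, k = 1) {
  stopifnot(is.matrix(d), is.factor(y), length(y) == ncol(d))
  pred <- apply(d, 1, function(row) {
    nn <- order(row)[seq_len(k)]              # indices of the k nearest columns
    votes <- table(y[nn])                     # votes per class
    winners <- names(votes)[votes == max(votes)]
    sample(winners, 1)                        # break ties at random
  })
  factor(pred, levels = levels(y))
}

## Toy cross-distances: 3 test rows vs. 4 training columns
d <- matrix(c(0.1, 0.9, 0.8, 0.7,
              0.9, 0.2, 0.1, 0.8,
              0.5, 0.4, 0.6, 0.1), nrow = 3, byrow = TRUE)
y <- factor(c("a", "b", "b", "a"))
knn_vote(d, y, k = 1)   # with k = 1 this is deterministic: a b a
```

Each test row simply inherits the class of its nearest training column when k = 1; larger k triggers the vote and, possibly, the random tie break.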
Christian Buchta
The rows of the cross-distances matrix are interpreted as referencing the test samples and the columns as referencing the training samples.
The options are fashioned after knn in package class but are extended for tie-breaking of votes, e.g. if only definite (majority) votes are of interest.
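For comparison, knn in package class (which ships with R) exposes the same l semantics on raw data rather than cross-distances: the result is NA unless the winning class has at least l of the k votes. The toy data below are hypothetical:

```r
library(class)

## Two well-separated clusters with two training points each.
train <- matrix(c(0, 0,
                  0, 1,
                  5, 5,
                  5, 6), ncol = 2, byrow = TRUE)
cl <- factor(c("a", "a", "b", "b"))
test <- matrix(c(0, 0.5,
                 5, 5.5), ncol = 2, byrow = TRUE)

## k = 3, l = 3: a definite decision needs all 3 neighbors to agree.
## Each test point has only 2 same-class neighbors, so both results are NA.
knn(train, test, cl, k = 3, l = 3)
```

Dropping l back to its default of 0 yields the plain majority vote ("a", "b") instead of NA.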
Missing class values are not allowed because that would collide with a missing classification result.
Missing distance values are ignored, with the possible consequence of missing classification results; note that this depends on the option settings. See dist for efficient computation of cross-distances.
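The dist referred to here accepts two matrices and returns their cross-distances. As a base-R illustration (not the cba implementation), Euclidean cross-distances can be computed directly; cross_dist and the toy matrices are hypothetical names:

```r
## n x p test matrix a, m x p training matrix b  ->  n x m distance matrix,
## using ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 <a_i, b_j>.
cross_dist <- function(a, b) {
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  sqrt(pmax(sq, 0))   # clamp tiny negatives from floating-point error
}

a <- matrix(c(0, 0,
              1, 1), ncol = 2, byrow = TRUE)
b <- matrix(c(0, 0,
              3, 4), ncol = 2, byrow = TRUE)
cross_dist(a, b)   # first row: 0 and 5 (the 3-4-5 triangle)
```

The resulting matrix has test samples as rows and training samples as columns, which is exactly the orientation gknn expects.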
if (FALSE) {
### extend Rock example
example("rockCluster")                          # provides x and rf
k <- sample(nrow(x), 100)                       # sample 100 training rows
y <- rf$cl[k]
levels(y)[3:4] <- 0                             # merge levels 3 and 4 into "0"
gk <- gknn(dist(x, x[k,], method = "binary"), y, k = 3)
attr(gk, "levels")[3] <- levels(rf$cl)[4]
table(cl = rf$cl, gk)
}