Compute the k-nearest-neighbor classification given a matrix of cross-distances and a factor of class values. For each row, the majority class among the k nearest neighbors is determined, with ties broken at random (default). If there are ties for the kth nearest neighbor, all candidates are included in the vote (default).
gknn(x, y, k = 1, l = 0, break.ties = TRUE, use.all = TRUE,
prob = FALSE)
Returns a factor of class values (for the rows of x) which may be NA in the case of doubt (no definite decision), ties, or missing neighborhood information. The proportions of winning votes are returned as attribute prob (if option prob was used).
x: a cross-distances matrix.
y: a factor of class values of the columns of x.
k: number of nearest neighbors to consider.
l: minimum number of votes for a definite decision.
break.ties: option to break ties at random.
use.all: option to consider all neighbors that are tied with the kth neighbor.
prob: optionally return the proportions of winning votes.
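The voting scheme described above can be sketched in base R. This is an illustrative toy, not the cba implementation; the function name knn_vote and the toy data are hypothetical:

```r
## Minimal sketch of k-NN majority voting from a cross-distance matrix.
## Rows of d are test samples, columns are training samples (as in gknn).
knn_vote <- function(d, y, k = 1) {
  stopifnot(is.matrix(d), is.factor(y), length(y) == ncol(d))
  pred <- apply(d, 1, function(row) {
    nn <- order(row)[seq_len(k)]              # indices of the k nearest columns
    votes <- table(y[nn])                     # votes per class
    winners <- names(votes)[votes == max(votes)]
    sample(winners, 1)                        # break ties at random
  })
  factor(pred, levels = levels(y))
}

## Toy cross-distances: 3 test rows vs. 4 training columns
d <- matrix(c(0.1, 0.9, 0.8, 0.7,
              0.9, 0.2, 0.1, 0.8,
              0.5, 0.4, 0.6, 0.1), nrow = 3, byrow = TRUE)
y <- factor(c("a", "b", "b", "a"))
knn_vote(d, y, k = 1)   # with k = 1 this is deterministic: a b a
```

Each test row simply inherits the class of its nearest training column when k = 1; larger k triggers the vote and, possibly, the random tie break.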
Christian Buchta
The rows of the cross-distances matrix are interpreted as referencing the test samples and the columns as referencing the training samples.
The options are fashioned after knn in package class but are extended for tie-breaking of votes, e.g. if only definite (majority) votes are of interest.
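For comparison, knn in package class (which ships with R) exposes the same l semantics on raw data rather than cross-distances: the result is NA unless the winning class has at least l of the k votes. The toy data below are hypothetical:

```r
library(class)

## Two well-separated clusters with two training points each.
train <- matrix(c(0, 0,
                  0, 1,
                  5, 5,
                  5, 6), ncol = 2, byrow = TRUE)
cl <- factor(c("a", "a", "b", "b"))
test <- matrix(c(0, 0.5,
                 5, 5.5), ncol = 2, byrow = TRUE)

## k = 3, l = 3: a definite decision needs all 3 neighbors to agree.
## Each test point has only 2 same-class neighbors, so both results are NA.
knn(train, test, cl, k = 3, l = 3)
```

Dropping l back to its default of 0 yields the plain majority vote ("a", "b") instead of NA.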
Missing class values are not allowed because that would collide with a missing classification result.
Missing distance values are ignored, with the possible consequence of missing classification results; note that this depends on the option settings. See dist for efficient computation of cross-distances.
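The dist referred to here accepts two matrices and returns their cross-distances. As a base-R illustration (not the cba implementation), Euclidean cross-distances can be computed directly; cross_dist and the toy matrices are hypothetical names:

```r
## n x p test matrix a, m x p training matrix b  ->  n x m distance matrix,
## using ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 <a_i, b_j>.
cross_dist <- function(a, b) {
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  sqrt(pmax(sq, 0))   # clamp tiny negatives from floating-point error
}

a <- matrix(c(0, 0,
              1, 1), ncol = 2, byrow = TRUE)
b <- matrix(c(0, 0,
              3, 4), ncol = 2, byrow = TRUE)
cross_dist(a, b)   # first row: 0 and 5 (the 3-4-5 triangle)
```

The resulting matrix has test samples as rows and training samples as columns, which is exactly the orientation gknn expects.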
if (FALSE) {
### extend Rock example
example("rockCluster")                          # provides x and rf
k <- sample(nrow(x), 100)                       # sample 100 training rows
y <- rf$cl[k]
levels(y)[3:4] <- 0                             # merge levels 3 and 4 into "0"
gk <- gknn(dist(x, x[k,], method = "binary"), y, k = 3)
attr(gk, "levels")[3] <- levels(rf$cl)[4]
table(cl = rf$cl, gk)
}