
dbscan (version 1.0-0)

kNN: Find the k Nearest Neighbors

Description

This function uses a kd-tree to quickly find the k nearest neighbors (including their distances) for every point in a data matrix.

Usage

kNN(x, k, sort = TRUE, search = "kdtree", bucketSize = 10, splitRule = "suggest", approx = 0)

Arguments

x
a data matrix or a dist object.
k
number of neighbors to find.
sort
sort the neighbors by distance? Note that sorting is expensive and sort = FALSE is much faster.
search
nearest neighbor search strategy: one of "kdtree", "linear", or "dist".
bucketSize
max size of the kd-tree leafs.
splitRule
rule to split the kd-tree. One of "STD", "MIDPT", "FAIR", "SL_MIDPT", "SL_FAIR" or "SUGGEST" (SL stands for sliding). "SUGGEST" uses ANNs best guess.
approx
use approximate nearest neighbors. All NN up to a distance of a factor of (1 + approx) times eps may be used; some true NN may then be omitted, which can lead to spurious clusters and noise points. In return, the search is significantly faster.

Value

An object of class kNN containing a list with the following components:

dist
a matrix with the distances to the k nearest neighbors.
id
a matrix with the row indices (ids) of the k nearest neighbors.
k
the value of k that was used.
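As a quick sketch of inspecting the returned object (using the iris data from the Examples below; the 150 × 5 dimensions simply reflect nrow(iris) and k = 5):

```r
library(dbscan)

data(iris)
nn <- kNN(iris[, -5], k = 5)

# id: row indices of the 5 nearest neighbors of each of the 150 points
dim(nn$id)    # 150 5
# dist: the corresponding distances, in the same layout
dim(nn$dist)  # 150 5

# self-matches are removed, so no point lists itself as its own neighbor
any(nn$id[10, ] == 10)  # FALSE
```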

Details

search controls whether a kd-tree or a linear search is used (both are implemented in the ANN library; see Mount and Arya, 2010). Note that these implementations cannot handle NAs. search = "dist" precomputes Euclidean distances in R first; NAs in the data are handled, but the resulting distance matrix must not contain NAs. To use other distance measures, a precomputed dist object can be supplied as x (search is then ignored).
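A short sketch of the two distance-based paths described above (the Manhattan metric is an arbitrary choice for illustration):

```r
library(dbscan)
data(iris)

# search = "dist": Euclidean distances are precomputed in R, then searched
nn_dist <- kNN(iris[, -5], k = 5, search = "dist")

# alternatively, supply any precomputed dist object as x;
# the search argument is then ignored
d <- dist(iris[, -5], method = "manhattan")
nn_manhattan <- kNN(d, k = 5)
```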

bucketSize and splitRule influence how the kd-tree is built. approx uses the approximate nearest neighbor search implemented in ANN: all nearest neighbors up to a distance of eps/(1+approx) are guaranteed to be considered, all points at a distance greater than eps are not considered, and points in between might be considered. This means some actual nearest neighbors may be omitted, which can lead to spurious clusters and noise points; in return, the search is significantly faster. For more details see Mount and Arya (2010).
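A minimal sketch contrasting exact and approximate search (approx = 2 is an arbitrary illustrative value; how many exact neighbor ids the approximate search recovers depends on the data):

```r
library(dbscan)
data(iris)

nn_exact  <- kNN(iris[, -5], k = 5)              # exact kd-tree search
nn_approx <- kNN(iris[, -5], k = 5, approx = 2)  # approximate, faster

# fraction of exact neighbor ids also found (in order) by the
# approximate search; both results are sorted by distance by default
mean(nn_approx$id == nn_exact$id)
```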

Note: self-matches are removed!

References

David M. Mount and Sunil Arya (2010). ANN: A Library for Approximate Nearest Neighbor Searching, http://www.cs.umd.edu/~mount/ANN/.

See Also

frNN for fixed radius nearest neighbors.

Examples

library(dbscan)
data(iris)
# finding kNN directly in data (using a kd-tree)
nn <- kNN(iris[,-5], k=5)
nn

# explore neighborhood of point 10
i <- 10
nn$id[i,]
plot(iris[,-5], col = ifelse(1:nrow(iris) %in% nn$id[i,], "red", "black"))

# show the k nearest neighbors as a graph.
plotNNgraph <- function(x, nn, main = "kNN graph", ...) {
  plot(x, main = main, ...)
  # draw a line segment from each point to each of its k nearest neighbors
  for (i in seq_len(nrow(nn$id))) {
    for (j in seq_len(ncol(nn$id)))
      lines(x = c(x[i, 1], x[nn$id[i, j], 1]),
            y = c(x[i, 2], x[nn$id[i, j], 2]), ...)
  }
}

plotNNgraph(iris[, c(1,2)], nn)
