Fast k-nearest neighbor searching algorithms including a kd-tree, cover-tree and the algorithm implemented in class package.
get.knn(data, k=10, algorithm=c("kd_tree", "cover_tree", "CR", "brute"))
get.knnx(data, query, k=10, algorithm=c("kd_tree", "cover_tree", "CR", "brute"))
an input data matrix.
a query data matrix.
nearest neighbor searching algorithm.
the maximum number of nearest neighbors to search. The default value is set to 10.
a list contains:
an n x k matrix for the nearest neighbor indice.
an n x k matrix for the nearest neighbor Euclidean distances.
The cover tree is O(n) space data structure which allows us to answer queries in the same O(log(n)) time as kd tree given a fixed intrinsic dimensionality. Templated code from http://hunch.net/~jl/projects/cover_tree/cover_tree.html is used.
The kd tree algorithm is implemented in the Approximate Near Neighbor (ANN) C++ library (see http://www.cs.umd.edu/~mount/ANN/). The exact nearest neighbors are searched in this package.
The CR algorithm is the VR using distance 1-x'y assuming x
and y
are unit vectors.
The brute algorithm searches linearly. It is a naive method.
Bentley J.L. (1975), “Multidimensional binary search trees used for associative search,” Communication ACM, 18, 309-517.
Arya S. and Mount D.M. (1993), “Approximate nearest neighbor searching,” Proc. 4th Ann. ACM-SIAM Symposium on Discrete Algorithms (SODA'93), 271-280.
Arya S., Mount D.M., Netanyahu N.S., Silverman R. and Wu A.Y. (1998), “An optimal algorithm for approximate nearest neighbor searching,” Journal of the ACM, 45, 891-923.
Beygelzimer A., Kakade S. and Langford J. (2006), “Cover trees for nearest neighbor,” ACM Proc. 23rd international conference on Machine learning, 148, 97-104.
nn2
in RANN, ann
in yaImpute and knn
in class.
# NOT RUN {
data<- query<- cbind(1:10, 1:10)
get.knn(data, k=5)
get.knnx(data, query, k=5)
get.knnx(data, query, k=5, algo="kd_tree")
# }
Run the code above in your browser using DataCamp Workspace