dbscan - Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms - R package
This R package provides a fast C++ reimplementation of several density-based algorithms of the DBSCAN family for spatial data. It includes the DBSCAN (density-based spatial clustering of applications with noise) and OPTICS/OPTICSXi (ordering points to identify the clustering structure) clustering algorithms and the LOF (local outlier factor) algorithm. The implementations use the kd-tree data structure (from the ANN library) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided.
This implementation is typically faster than the native R implementation in package fpc, or the implementations in WEKA, ELKI and Python's scikit-learn.
Installation
Stable CRAN version: install from within R with
install.packages("dbscan")
Current development version: Download package from AppVeyor or install from GitHub (needs devtools).
devtools::install_github("mhahsler/dbscan")
Usage
Load the package and use the numeric variables in the iris dataset
library("dbscan")
data("iris")
x <- as.matrix(iris[, 1:4])
Run DBSCAN
db <- dbscan(x, eps = .4, minPts = 4)
db
DBSCAN clustering for 150 objects.
Parameters: eps = 0.4, minPts = 4
The clustering contains 4 cluster(s) and 25 noise points.
0 1 2 3 4
25 47 38 36 4
Available fields: cluster, eps, minPts
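The value eps = .4 used above is given without justification. A common heuristic for picking eps is to look for a knee in the sorted k-nearest-neighbor distances. The following is a minimal sketch of that heuristic using the package's kNNdist() and kNNdistplot() functions. The choice k = 3 (minPts - 1) is an assumption, based on the convention that minPts counts the point itself, and the exact return shape of kNNdist() may differ between package versions.

```r
library(dbscan)

x <- as.matrix(iris[, 1:4])

# Distance from each point to its 3rd nearest neighbor
# (assumption: k = minPts - 1, because minPts includes the point itself)
d3 <- kNNdist(x, k = 3)
summary(as.vector(d3))

# Sorted k-NN distances; a knee in the curve suggests a candidate eps
kNNdistplot(x, k = 3)
abline(h = 0.4, lty = 2)  # the eps used above, drawn for reference only
```

If the dashed line sits well below or above the knee, a different eps may be worth trying.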
Visualize results (noise is shown in black)
pairs(x, col = db$cluster + 1L)
Calculate LOF (local outlier factor) and visualize (larger bubbles in the visualization have a larger LOF)
lof <- lof(x, k = 4)
pairs(x, cex = lof)
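Besides plotting, the LOF scores can be ranked to list the most outlying observations. This is a small sketch, not part of the original example. The neighborhood size is passed positionally because the argument name (k above) may differ between package versions, and the names lof_scores and top are introduced here only for illustration.

```r
library(dbscan)

x <- as.matrix(iris[, 1:4])

# Second argument is the neighborhood size (named k in the call shown above)
lof_scores <- lof(x, 4)

# Row indices of the five observations with the largest LOF
top <- order(lof_scores, decreasing = TRUE)[1:5]
data.frame(row = top, lof = round(lof_scores[top], 2))
```

Scores close to 1 indicate points whose local density is similar to that of their neighbors; markedly larger scores indicate potential outliers.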
Run OPTICS
opt <- optics(x, eps = 1, minPts = 4)
opt
OPTICS clustering for 150 objects.
Parameters: minPts = 4, eps = 1, eps_cl = NA, xi = NA
Available fields: order, reachdist, coredist, predecessor, minPts, eps, eps_cl, xi
Extract DBSCAN-like clustering from OPTICS and create a reachability plot (extracted DBSCAN clusters at eps_cl=.4 are colored)
opt <- extractDBSCAN(opt, eps_cl = .4)
plot(opt)
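To check how closely the clustering extracted from OPTICS matches the direct DBSCAN run, the two label vectors can be cross-tabulated. This is a sketch under the assumption that both objects store one label per observation, in data order, in their cluster field. Cluster numbering need not agree between the two methods, and border points may be assigned differently.

```r
library(dbscan)

x <- as.matrix(iris[, 1:4])

# Direct DBSCAN and DBSCAN-like extraction from OPTICS with the same threshold
db  <- dbscan(x, eps = 0.4, minPts = 4)
opt <- optics(x, eps = 1, minPts = 4)
opt <- extractDBSCAN(opt, eps_cl = 0.4)

# Rows: DBSCAN labels, columns: OPTICS-extracted labels (0 = noise)
table(dbscan = db$cluster, optics = opt$cluster)
```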
Extract a hierarchical clustering using the Xi method (captures clusters of varying density)
opt <- extractXi(opt, xi = .05)
opt
plot(opt)
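The kNN and fixed-radius NN search mentioned in the introduction can also be called directly. A brief sketch using kNN() and frNN() follows. The values k = 5 and eps = 0.4 are arbitrary choices for illustration.

```r
library(dbscan)

x <- as.matrix(iris[, 1:4])

# k-nearest neighbors: matrices of neighbor ids and distances, one row per point
nn <- kNN(x, k = 5)
head(nn$id, 3)
head(nn$dist, 3)

# Fixed-radius search: a list with one vector of neighbor ids per point
fr <- frNN(x, eps = 0.4)
lengths(fr$id)[1:5]  # number of neighbors found within eps for the first five points
```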
License
The dbscan package is licensed under the GNU General Public License (GPL) Version 3. The OPTICSXi R implementation was directly ported from the ELKI framework's Java implementation (GNU AGPLv3), with explicit permission granted by the original author, Erich Schubert.
Further Information
- Development version of dbscan on GitHub.
- List of changes from NEWS.md
- dbscan reference manual
Maintainer: Michael Hahsler