dbscan - Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms - R package
This R package provides a fast C++ reimplementation of several density-based algorithms of the DBSCAN family for spatial data. The package includes:
- DBSCAN: Density-based spatial clustering of applications with noise.
- OPTICS/OPTICSXi: Ordering points to identify the clustering structure (clustering algorithms).
- HDBSCAN: Hierarchical DBSCAN with simplified hierarchy extraction.
- LOF: Local outlier factor algorithm.
- GLOSH: Global-Local Outlier Score from Hierarchies algorithm.
The implementations use the kd-tree data structure (from the ANN library) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is provided, along with Jarvis-Patrick clustering and Shared Nearest Neighbor clustering. Additionally, a fast implementation of the Framework for Optimal Selection of Clusters (FOSC) is available; it supports unsupervised and semi-supervised clustering on a hierarchical cluster tree ('hclust' object) and works with any linkage criterion.
The implementations are typically faster than the native R implementations (e.g., dbscan in package fpc), or the implementations in WEKA, ELKI and Python's scikit-learn.
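The nearest-neighbor interface mentioned above can be sketched as follows. This is a minimal example, assuming the package's `kNN()` and `frNN()` functions with their `k` and `eps` arguments; the iris data and the values `k = 4` and `eps = 0.4` are chosen only for illustration.

```r
library("dbscan")

# Use the four numeric columns of iris as example data.
data("iris")
x <- as.matrix(iris[, 1:4])

# k-nearest neighbor search: for every point, the ids of and
# distances to its 4 nearest neighbors (one row per point).
nn <- kNN(x, k = 4)
head(nn$id, 3)
head(nn$dist, 3)

# Fixed-radius search: for every point, all neighbors within eps.
# The result holds one vector of neighbor ids per point.
fr <- frNN(x, eps = 0.4)

# Number of neighbors within eps for the first few points.
head(lengths(fr$id))
```

The kNN result stores neighbors in matrices because every point has exactly `k` neighbors, while the fixed-radius result uses lists because the number of neighbors within `eps` differs from point to point.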
Installation
Stable CRAN version: install from within R with
install.packages("dbscan")
Current development version: download the package from AppVeyor or install it from GitHub (requires the devtools package).
devtools::install_github("mhahsler/dbscan")
Usage
Load the package and use the numeric variables in the iris dataset
library("dbscan")
data("iris")
x <- as.matrix(iris[, 1:4])
Run DBSCAN
db <- dbscan(x, eps = .4, minPts = 4)
db
DBSCAN clustering for 150 objects.
Parameters: eps = 0.4, minPts = 4
The clustering contains 4 cluster(s) and 25 noise points.
 0  1  2  3  4
25 47 38 36  4
Available fields: cluster, eps, minPts
Visualize results (noise is shown in black)
pairs(x, col = db$cluster + 1L)
Calculate LOF (local outlier factor) and visualize (larger bubbles in the visualization have a larger LOF)
lof <- lof(x, k = 4)
pairs(x, cex = lof)
Run OPTICS
opt <- optics(x, eps = 1, minPts = 4)
opt
OPTICS clustering for 150 objects.
Parameters: minPts = 4, eps = 1, eps_cl = NA, xi = NA
Available fields: order, reachdist, coredist, predecessor, minPts, eps, eps_cl, xi
Extract DBSCAN-like clustering from OPTICS and create a reachability plot (extracted DBSCAN clusters at eps_cl=.4 are colored)
opt <- extractDBSCAN(opt, eps_cl = .4)
plot(opt)
Extract a hierarchical clustering using the Xi method (captures clusters of varying density)
opt <- extractXi(opt, xi = .05)
opt
plot(opt)
Run HDBSCAN (captures stable clusters)
hdb <- hdbscan(x, minPts = 4)
hdb
HDBSCAN clustering for 150 objects.
Parameters: minPts = 4
The clustering contains 2 cluster(s) and 0 noise points.
  1   2
100  50
Available fields: cluster, minPts, cluster_scores, membership_prob, outlier_scores, hc
Visualize the results as a simplified tree
plot(hdb, show_flat = TRUE)
See how well each point corresponds to the cluster it was assigned to (point opacity reflects the membership probability)
colors <- mapply(function(col, i) adjustcolor(col, alpha.f = hdb$membership_prob[i]),
  palette()[hdb$cluster + 1], seq_along(hdb$cluster))
plot(x, col = colors, pch = 20)
License
The dbscan package is licensed under the GNU General Public License (GPL) Version 3. The OPTICSXi R implementation was directly ported from the ELKI framework's Java implementation (GNU AGPLv3), with explicit permission granted by the original author, Erich Schubert.
Further Information
- Development version of dbscan on GitHub.
- List of changes from NEWS.md
- dbscan reference manual
Maintainer: Michael Hahsler