dbscan v1.1-1

Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms

A fast reimplementation of several density-based algorithms of the DBSCAN family for spatial data. Includes the DBSCAN (density-based spatial clustering of applications with noise) and OPTICS (ordering points to identify the clustering structure) clustering algorithms, HDBSCAN (hierarchical DBSCAN), and the LOF (local outlier factor) algorithm. The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided.

Readme

dbscan - Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms - R package

This R package provides a fast C++ reimplementation of several density-based algorithms of the DBSCAN family for spatial data. The package includes:

  • DBSCAN: Density-based spatial clustering of applications with noise.
  • OPTICS/OPTICSXi: Ordering points to identify the clustering structure.
  • HDBSCAN: Hierarchical DBSCAN with simplified hierarchy extraction.
  • LOF: Local outlier factor algorithm.
  • GLOSH: Global-Local Outlier Score from Hierarchies algorithm.

The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is provided along with Jarvis-Patrick clustering and Shared Nearest Neighbor clustering. Additionally, a fast implementation of the Framework for Optimal Selection of Clusters (FOSC) is available. It supports unsupervised and semi-supervised clustering of a hierarchical cluster tree ('hclust' object) built with any linkage criterion.
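As a minimal sketch of the nearest-neighbor interface mentioned above, the following calls kNN and frNN on the iris data. The values k = 5 and eps = 0.5 are arbitrary choices for illustration, and the result fields id and dist are used here as they appear in the package documentation for these functions.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

# k-nearest-neighbor search: one row per point, one column per neighbor
nn <- kNN(x, k = 5)
dim(nn$id)    # indices of the neighbors
dim(nn$dist)  # matching distances

# fixed-radius search: all neighbors within distance eps of each point
fr <- frNN(x, eps = 0.5)
lengths(fr$id)[1:5]  # neighborhood sizes of the first five points
```

Unlike kNN, the fixed-radius search returns lists, because the number of neighbors within eps varies from point to point.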

The implementations are typically faster than the native R implementations (e.g., dbscan in package fpc), or the implementations in WEKA, ELKI and Python's scikit-learn.

Installation

Stable CRAN version: install from within R with

install.packages("dbscan")

Current development version: download the package from AppVeyor or install it from GitHub (requires the devtools package).

devtools::install_github("mhahsler/dbscan")

Usage

Load the package and use the numeric variables in the iris dataset

library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

Run DBSCAN

db <- dbscan(x, eps = .4, minPts = 4)
db
DBSCAN clustering for 150 objects.
Parameters: eps = 0.4, minPts = 4
The clustering contains 4 cluster(s) and 25 noise points.

 0  1  2  3  4 
25 47 38 36  4 

Available fields: cluster, eps, minPts
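The eps value used above is given without derivation. One commonly used heuristic, sketched here under the assumption that k is set to the minPts value from the example, is to plot the sorted k-nearest-neighbor distances and look for a knee in the curve. The dashed reference line at 0.4 is only there to compare against the eps chosen above.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

# k-nearest-neighbor distances; k = 4 mirrors minPts from the example (assumption)
d <- kNNdist(x, k = 4)
range(d)

# Sorted k-NN distance plot; a knee in the curve suggests a candidate eps
kNNdistplot(x, k = 4)
abline(h = 0.4, lty = 2)  # eps used in the DBSCAN call above, for reference
```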

Visualize results (noise is shown in black)

pairs(x, col = db$cluster + 1L)

Calculate LOF (local outlier factor) and visualize (larger bubbles in the visualization have a larger LOF)

lof <- lof(x, k = 4)
pairs(x, cex = lof)
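Besides the bubble plot, the scores can be ranked to list the most outlying observations. The sketch below repeats the lof call from above but stores the result as lof_scores (a name chosen here to avoid masking the lof function) and reports the five rows with the largest scores. The cutoff of five is arbitrary.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

# Same call as above; the argument name k follows this version of the package
lof_scores <- lof(x, k = 4)

# Rows with the five largest LOF scores
top <- order(lof_scores, decreasing = TRUE)[1:5]
data.frame(row = top, lof = round(lof_scores[top], 2))
```

Scores close to 1 indicate a point whose local density is similar to that of its neighbors, while clearly larger scores point to potential outliers.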

Run OPTICS

opt <- optics(x, eps = 1, minPts = 4)
opt
OPTICS clustering for 150 objects.
Parameters: minPts = 4, eps = 1, eps_cl = NA, xi = NA
Available fields: order, reachdist, coredist, predecessor, minPts, eps, eps_cl, xi

Extract DBSCAN-like clustering from OPTICS and create a reachability plot (extracted DBSCAN clusters at eps_cl=.4 are colored)

opt <- extractDBSCAN(opt, eps_cl = .4)
plot(opt)
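To check how closely the extracted clustering matches a direct DBSCAN run, the two label vectors can be cross-tabulated. This is a sketch reusing the parameters from the examples above. The clusterings are expected to be similar, although the assignment of border points may differ, so an exact match is not assumed here.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

# DBSCAN-like clustering extracted from the OPTICS ordering
opt <- optics(x, eps = 1, minPts = 4)
opt_db <- extractDBSCAN(opt, eps_cl = 0.4)

# Direct DBSCAN run with the same eps and minPts
db <- dbscan(x, eps = 0.4, minPts = 4)

# Rows: OPTICS-based labels, columns: DBSCAN labels (0 = noise)
table(optics = opt_db$cluster, dbscan = db$cluster)
```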

Extract a hierarchical clustering using the Xi method (captures clusters of varying density)

opt <- extractXi(opt, xi = .05)
opt
plot(opt)

Run HDBSCAN (captures stable clusters)

hdb <- hdbscan(x, minPts = 4)
hdb
HDBSCAN clustering for 150 objects.
Parameters: minPts = 4
The clustering contains 2 cluster(s) and 0 noise points.

  1   2 
100  50 

Available fields: cluster, minPts, cluster_scores, membership_prob, outlier_scores, hc
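The fields listed in the output can be inspected directly. As a small sketch, the following summarizes the membership probabilities and lists the five rows with the highest outlier scores. The cutoff of five is an arbitrary choice for illustration.

```r
library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

hdb <- hdbscan(x, minPts = 4)

# Cluster sizes and distribution of membership probabilities
table(hdb$cluster)
summary(hdb$membership_prob)

# Rows with the five highest outlier scores
head(order(hdb$outlier_scores, decreasing = TRUE), 5)
```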

Visualize the results as a simplified tree

plot(hdb, show_flat = TRUE)

Show how strongly each point belongs to its cluster (points with a lower membership probability are drawn more transparent)

colors <- mapply(function(col, i) adjustcolor(col, alpha.f = hdb$membership_prob[i]),
                 palette()[hdb$cluster + 1], seq_along(hdb$cluster))
plot(x, col = colors, pch = 20)

License

The dbscan package is licensed under the GNU General Public License (GPL) Version 3. The OPTICSXi R implementation was directly ported from the ELKI framework's Java implementation (GNU AGPLv3), with explicit permission granted by the original author, Erich Schubert.

Further Information

Maintainer: Michael Hahsler

Functions in dbscan

Name          Description
kNN           Find the k Nearest Neighbors
hdbscan       HDBSCAN
pointdensity  Calculate Local Density at Each Data Point
jpclust       Jarvis-Patrick Clustering
lof           Local Outlier Factor Score
kNNdist       Calculate and plot the k-Nearest Neighbor Distance
sNN           Shared Nearest Neighbors
sNNclust      Shared Nearest Neighbor Clustering
DS3           DS3: Spatial data with arbitrary shapes
NN            Nearest Neighbors Auxiliary Functions
dbscan        DBSCAN
extractFOSC   Framework for Optimal Selection of Clusters
moons         Moons Data
glosh         Global-Local Outlier Score from Hierarchies
optics        OPTICS
reachability  Density Reachability Structures
frNN          Find the Fixed Radius Nearest Neighbors
hullplot      Plot Convex Hulls of Clusters

Vignettes of dbscan

Name
figures/dbscan_a.pdf
figures/dbscan_b.pdf
figures/dbscan_benchmark.pdf
figures/optics_benchmark.pdf
dbscan.Rnw
dbscan.bib
hdbscan.Rmd

Details

Date                2017-03-19
LinkingTo           Rcpp
VignetteBuilder     knitr
BugReports          https://github.com/mhahsler/dbscan
License             GPL (>= 2)
Copyright           ANN library is copyright by University of Maryland, Sunil Arya and David Mount. All other code is copyright by Michael Hahsler and Matthew Piekenbrock.
SystemRequirements  C++11
NeedsCompilation    yes
Packaged            2017-03-19 17:57:07 UTC; hahsler
Repository          CRAN
Date/Publication    2017-03-19 23:26:00 UTC
