dbscan: DBSCAN

Description

Fast reimplementation of DBSCAN using a kd-tree.

Usage

dbscan(x, eps, minPts = 5, bucketSize = 10, splitRule = "suggest", approx = 0)

Arguments

Value

An integer vector with cluster assignments. Zero indicates noise points.
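As a minimal sketch of how such an assignment vector can be inspected, the snippet below works on a hand-written vector rather than on real output. The values in res are hypothetical and only stand in for what dbscan() would return; the names noise and n_clusters are likewise illustrative.

```r
# Hypothetical cluster assignments (assumption: same shape as the value
# described above, i.e. one integer label per point, 0 = noise).
res <- c(1L, 1L, 0L, 2L, 2L, 2L, 0L, 1L)

# Logical mask marking the noise points
noise <- res == 0L

# Cluster sizes; the entry labelled "0" counts the noise points
sizes <- table(res)
print(sizes)

# Number of clusters found, not counting noise
n_clusters <- length(unique(res[!noise]))

# Row indices of the noise points in the original data
noise_idx <- which(noise)
```

The examples below plot with col = res + 1 for a related reason: color index 0 in base R graphics is the background color, so shifting the labels by one keeps noise points visible.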

References

Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Institute for Computer Science, University of Munich. Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96).

Examples

data(iris)
iris <- as.matrix(iris[,1:4])

res <- dbscan(iris, .4, 4)
pairs(iris, col=res+1)

## compare with dbscan from package fpc
res2 <- fpc::dbscan(iris, .4, 4)
res2 <- res2$cluster
pairs(iris, col=res2+1)

## make sure both versions produce the same results
all(res == res2)

## find suitable eps parameter (look at knee)
kNNdistplot(iris, k=4)


## example data from fpc
set.seed(665544)
n <- 600
## 10 random centers per coordinate, recycled across the n points, plus Gaussian noise
x <- cbind(runif(10, 0, 10) + rnorm(n, sd = 0.2),
           runif(10, 0, 10) + rnorm(n, sd = 0.2))

res <- dbscan::dbscan(x, .2, 4)
plot(x, col=res+1)

## compare speed against fpc version
t_dbscan <- microbenchmark::microbenchmark(
  dbscan::dbscan(x, .2, 4), times = 10, unit="ms")
t_fpc <- microbenchmark::microbenchmark(
  fpc::dbscan(x, .2, 4), times = 10, unit="ms")
boxplot(rbind(t_dbscan, t_fpc))

## speedup
median(t_fpc$time)/median(t_dbscan$time)
