Generates a density-based clustering of arbitrary shape, as introduced in Ester et al. (1996).

```
dbscan(data, eps, MinPts = 5, scale = FALSE,
       method = c("hybrid", "raw", "dist"), seeds = TRUE,
       showplot = FALSE, countmode = NULL)
# S3 method for dbscan
print(x, ...)
# S3 method for dbscan
plot(x, data, ...)
# S3 method for dbscan
predict(object, data, newdata = NULL,
        predict.max = 1000, ...)
```

data

data matrix, data.frame, dissimilarity matrix or `dist` object. Specify `method="dist"` if the data should be interpreted as a dissimilarity matrix or `dist` object. Otherwise Euclidean distances will be used.

eps

Reachability distance, see Ester et al. (1996).

MinPts

Reachability minimum number of points, see Ester et al. (1996).

scale

Scale the data if `TRUE`.

method

"dist" treats data as distance matrix (relatively fast but memory expensive), "raw" treats data as raw data and avoids calculating a distance matrix (saves memory but may be slow), "hybrid" expects also raw data, but calculates partial distance matrices (very fast with moderate memory requirements).

seeds

`FALSE` to not include the `isseed` vector in the `dbscan` object.

showplot

0 = no plot, 1 = plot per iteration, 2 = plot per subiteration.

countmode

NULL or vector of point numbers at which to report progress.

x

object of class `dbscan`.

object

object of class `dbscan`.

newdata

matrix or data.frame with raw data to predict.

predict.max

Maximum batch size for predictions.

...

Further arguments transferred to plot methods.
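To make the memory remark for `method="dist"` concrete, the following rough sketch estimates the size of the distance data alone. It assumes that a `dist` object stores the lower triangle as 8-byte doubles and ignores any object overhead; `dist_bytes` is a hypothetical helper name, not part of the package.

```r
# dist_bytes() is a hypothetical helper for illustration only.
# A dist object holds the lower triangle: n * (n - 1) / 2 doubles of 8 bytes each.
dist_bytes <- function(n) n * (n - 1) / 2 * 8

sizes <- data.frame(n = c(1e3, 1e4, 1e5))
sizes$GB <- dist_bytes(sizes$n) / 1e9   # decimal gigabytes, distances only
print(sizes)
```

Under these assumptions the storage grows quadratically with the number of points, which is why `"raw"` or `"hybrid"` may be preferable for large data sets.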

`predict.dbscan` returns a vector of predicted clusters for the points in `newdata`.
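As an illustration of what predicting cluster membership for new points can look like, here is a minimal base R sketch. The rule it uses (assign a new point to the cluster of its nearest seed if that seed lies within `eps`, otherwise mark it as noise with 0) is an assumption chosen for illustration and is not confirmed to be what `predict.dbscan` does internally. `assign_new_points` is a hypothetical helper name.

```r
# assign_new_points() is a hypothetical helper, not part of the package.
# Assumed rule: nearest seed within eps decides the cluster; otherwise noise (0).
assign_new_points <- function(data, cluster, isseed, eps, newdata) {
  seeds <- as.matrix(data)[isseed, , drop = FALSE]
  seed_cluster <- cluster[isseed]
  newdata <- as.matrix(newdata)
  out <- integer(nrow(newdata))            # 0 = noise by default
  for (i in seq_len(nrow(newdata))) {
    # Euclidean distance from new point i to every seed
    d <- sqrt(rowSums(sweep(seeds, 2, newdata[i, ])^2))
    j <- which.min(d)
    if (length(j) == 1 && d[j] <= eps) out[i] <- seed_cluster[j]
  }
  out
}

# Toy input: two small clusters plus one noise point at (5, 5)
data <- rbind(c(0, 0), c(0.1, 0), c(0, 0.1), c(10, 10), c(10.1, 10), c(5, 5))
cluster <- c(1L, 1L, 1L, 2L, 2L, 0L)
isseed <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
newdata <- rbind(c(0.05, 0.05), c(10, 10.1), c(5, 5.1))
pred <- assign_new_points(data, cluster, isseed, eps = 0.3, newdata)
```

In this toy input the third new point is close to the noise observation only, so under the assumed rule it is not attached to any cluster.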

`dbscan` returns an object of class `dbscan`, which is a list with components

cluster

integer vector coding cluster membership, with noise observations (singletons) coded as 0

isseed

logical vector indicating whether a point is a seed (not border, not noise)

eps

parameter eps

MinPts

parameter MinPts

A cluster requires a minimum number of points (MinPts) within a maximum distance (eps) around one of its members (the seed). Any point within eps of a point that satisfies the seed condition is a cluster member, and this rule is applied recursively. Some points may not belong to any cluster (noise).
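The seed condition and the recursive expansion described above can be sketched in base R as follows. This is a simplified illustration built on a full distance matrix, not the package's implementation. `simple_dbscan` is a hypothetical name, and counting the point itself towards `MinPts` is an assumption.

```r
# simple_dbscan() is a hypothetical helper for illustration only.
simple_dbscan <- function(x, eps, MinPts = 5) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- as.matrix(dist(x))                  # full Euclidean distance matrix
  # Seed condition (assumption: the point itself counts towards MinPts)
  isseed <- unname(rowSums(d <= eps) >= MinPts)
  cluster <- integer(n)                    # 0 = noise or not yet assigned
  k <- 0L
  for (i in which(isseed)) {
    if (cluster[i] != 0L) next             # already absorbed by an earlier cluster
    k <- k + 1L
    cluster[i] <- k
    queue <- i
    while (length(queue) > 0) {
      p <- queue[1]
      queue <- queue[-1]
      nb <- which(d[p, ] <= eps & cluster == 0L)  # unassigned points within eps
      cluster[nb] <- k
      queue <- c(queue, nb[isseed[nb]])    # only seeds keep expanding the cluster
    }
  }
  list(cluster = cluster, isseed = isseed, eps = eps, MinPts = MinPts)
}

# Two tight groups of five points and one isolated point
a <- rbind(c(0, 0), c(0.1, 0), c(0, 0.1), c(0.1, 0.1), c(0.05, 0.05))
pts <- rbind(a, a + 10, c(5, 5))
res <- simple_dbscan(pts, eps = 0.3, MinPts = 4)
res$cluster
```

Border points receive a cluster label but are never placed on the queue, which is what stops a cluster from growing through sparse regions.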

We have clustered a 100,000 x 2 dataset in 40 minutes on a Pentium M 1600 MHz.

`print.dbscan` shows a statistic of the number of points belonging to the clusters that are seeds and border points.

`plot.dbscan` distinguishes between seed and border points by plot symbol.

Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Institute for Computer Science, University of Munich. Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96).

```
# NOT RUN {
set.seed(665544)
n <- 600
x <- cbind(runif(10, 0, 10) + rnorm(n, sd = 0.2),
           runif(10, 0, 10) + rnorm(n, sd = 0.2))
par(bg = "grey40")
ds <- dbscan(x, 0.2)
# run with showplot=1 to see how dbscan works.
ds
plot(ds, x)

x2 <- matrix(0, nrow = 4, ncol = 2)
x2[1,] <- c(5, 2)
x2[2,] <- c(8, 3)
x2[3,] <- c(4, 4)
x2[4,] <- c(9, 9)
predict(ds, x, x2)

n <- 600
x <- cbind((1:3) + rnorm(n, sd = 0.2), (1:3) + rnorm(n, sd = 0.2))

# Not run, but results from my machine are 0.105 - 0.068 - 0.255:
# system.time(ds <- dbscan(x, 0.3, countmode=NULL, method="raw"))[3]
# system.time(dsb <- dbscan(x, 0.3, countmode=NULL, method="hybrid"))[3]
# system.time(dsc <- dbscan(dist(x), 0.3, countmode=NULL,
#                           method="dist"))[3]
# }
```