Learn R Programming

dbscan (version 1.2.5)

optics: Ordering Points to Identify the Clustering Structure (OPTICS)

Description

Implementation of the OPTICS (Ordering points to identify the clustering structure) point ordering algorithm using a kd-tree.

Usage

optics(x, eps = Inf, minPts = 5, ...)

# S3 method for optics print(x, ...)

# S3 method for optics plot(x, cluster = TRUE, predecessor = FALSE, ...)

# S3 method for optics as.reachability(object, ...)

# S3 method for optics as.dendrogram(object, ...)

extractDBSCAN(object, eps_cl)

extractXi(object, xi, minimum = FALSE, correctPredecessors = TRUE)

# S3 method for optics predict(object, newdata, data, ...)

Value

An object of class optics with components:

eps

value of eps parameter.

minPts

value of minPts parameter.

order

optics order for the data points in x.

reachdist

reachability distance for each data point in x.

coredist

core distance for each data point in x.

For extractDBSCAN(), in addition the following components are available:

eps_cl

the value of the eps_cl parameter.

cluster

assigned cluster labels in the order of the data points in x.

For extractXi(), in addition the following components are available:

xi

Steepness thresholdx.

cluster

assigned cluster labels in the order of the data points in x.

clusters_xi

data.frame containing the start and end of each cluster found in the OPTICS ordering.

Arguments

x

a data matrix or a dist object.

eps

OPTICS uses a maximum epsilon neighborhood size of Inf. The upper limit of the size can be limited to improves performance. If set too low then many reachability values will erroneously become Inf shown as dashed lines in the reachability plot. eps should be increased.

minPts

the parameter is used to identify dense neighborhoods and the reachability distance is calculated as the distance to the minPts nearest neighbor. Controls the smoothness of the reachability distribution. Default is 5 points.

...

additional arguments are passed on to fixed-radius nearest neighbor search algorithm. See frNN() for details on how to control the search strategy.

cluster, predecessor

plot clusters and predecessors.

object

clustering object.

eps_cl

Threshold to identify clusters (eps_cl <= eps).

xi

Steepness threshold to identify clusters hierarchically using the Xi method.

minimum

logical, representing whether or not to extract the minimal (non-overlapping) clusters in the Xi clustering algorithm.

correctPredecessors

logical, correct a common artifact by pruning the steep up area for points that have predecessors not in the cluster--found by the ELKI framework, see details below.

newdata

new data points for which the cluster membership should be predicted.

data

the data set used to create the clustering object.

Author

Michael Hahsler and Matthew Piekenbrock

Details

The Algorithm

This implementation of OPTICS implements the original algorithm as described by Ankerst et al (1999). OPTICS is an ordering algorithm with methods to extract a clustering from the ordering. While using similar concepts as DBSCAN, minPts in OPTICS has a different effect then in DBSCAN. Since it is also used to calculate the reachability distance, larger values will make the reachability distance plot smoother. The parameter eps is optional and defaults to Inf. It only represents a an upper limit for the neighborhood size used to reduce computational complexity which is helpful for large data sets.

OPTICS linearly orders the data points such that points which are spatially closest become neighbors in the ordering. The closest analog to this ordering is dendrogram in single-link hierarchical clustering. The algorithm also calculates the reachability distance for each point. plot() (see reachability_plot) produces a reachability plot which shows each points reachability distance between two consecutive points where the points are sorted by OPTICS. Valleys represent clusters (the deeper the valley, the more dense the cluster) and high points indicate points between clusters.

Specifying the Data

If x is specified as a data matrix, then Euclidean distances and fast nearest neighbor lookup using a kd-tree are used. See kNN() for details on the parameters for the kd-tree.

Extracting a Clustering

Several methods to extract a clustering from the order returned by OPTICS are implemented:

  • extractDBSCAN() extracts a clustering from an OPTICS ordering that is similar to what DBSCAN would produce with an eps set to eps_cl (see Ankerst et al, 1999). The only difference to a DBSCAN clustering is that OPTICS is not able to assign some border points and reports them instead as noise.

  • extractXi() extract clusters hierarchically specified in Ankerst et al (1999) based on the steepness of the reachability plot. One interpretation of the xi parameter is that it classifies clusters by change in relative cluster density. The used algorithm was originally contributed by the ELKI framework and is explained in Schubert et al (2018), but contains a set of fixes.

Predict Cluster Memberships

predict() requires an extracted DBSCAN clustering with extractDBSCAN() and then uses predict for dbscan().

References

Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Joerg Sander (1999). OPTICS: Ordering Points To Identify the Clustering Structure. ACM SIGMOD international conference on Management of data. ACM Press. pp. tools:::Rd_expr_doi("10.1145/304181.304187")

Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based Clustering with R. Journal of Statistical Software, 91(1), 1-30. tools:::Rd_expr_doi("10.18637/jss.v091.i01")

Erich Schubert, Michael Gertz (2018). Improving the Cluster Structure Extracted from OPTICS Plots. In Lernen, Wissen, Daten, Analysen (LWDA 2018), pp. 318-329.

See Also

Density reachability.

Other clustering functions: dbscan(), extractFOSC(), hdbscan(), jpclust(), ncluster(), sNNclust()

Examples

Run this code
set.seed(2)
n <- 400

x <- cbind(
  x = runif(4, 0, 1) + rnorm(n, sd = 0.1),
  y = runif(4, 0, 1) + rnorm(n, sd = 0.1)
  )

plot(x, col=rep(1:4, times = 100))

### run OPTICS (Note: we use the default eps calculation)
res <- optics(x, minPts = 10)
res

### get order
res$order

### plot produces a reachability plot
plot(res)

### plot the order of points in the reachability plot
plot(x, col = "grey")
polygon(x[res$order, ])

### extract a DBSCAN clustering by cutting the reachability plot at eps_cl
res <- extractDBSCAN(res, eps_cl = .065)
res

plot(res)  ## black is noise
hullplot(x, res)

### re-cut at a higher eps threshold
res <- extractDBSCAN(res, eps_cl = .07)
res
plot(res)
hullplot(x, res)

### extract hierarchical clustering of varying density using the Xi method
res <- extractXi(res, xi = 0.01)
res

plot(res)
hullplot(x, res)

# Xi cluster structure
res$clusters_xi

### use OPTICS on a precomputed distance matrix
d <- dist(x)
res <- optics(d, minPts = 10)
plot(res)

Run the code above in your browser using DataLab