This function performs semi-hierarchical clustering on the basis of dissimilarity with the OPTICS algorithm (Ordering Points To Identify the Clustering Structure).
hclu_optics(
dissimilarity,
index = names(dissimilarity)[3],
minPts = NULL,
eps = NULL,
xi = 0.05,
minimum = FALSE,
show_hierarchy = FALSE,
algorithm_in_output = TRUE,
...
)
A list of class bioregion.clusters with five slots:
name: character containing the name of the algorithm
args: list of input arguments as provided by the user
inputs: list of characteristics of the clustering process
algorithm: list of all objects associated with the clustering procedure, such as original cluster objects
clusters: data.frame containing the clustering results
In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of optics.
dissimilarity: the output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the dissimilarity indices.
index: name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
minPts: a numeric value specifying the minPts argument of dbscan. minPts is the minimum number of points to form a dense region. By default, it is set to the natural logarithm of the number of sites in dissimilarity.
eps: a numeric value specifying the eps argument of optics. It is the upper limit of the size of the epsilon neighborhood. Limiting the neighborhood size improves performance and has little or no impact on the ordering as long as it is not set too low. If not specified (default behavior), the largest minPts-distance in the data set is used, which gives the same result as infinity.
xi: a numeric value specifying the steepness threshold used to identify clusters hierarchically with the Xi method (see optics).
minimum: a boolean specifying whether the hierarchy should be pruned from the output to keep only clusters at the "minimal" level, i.e. only leaf / non-overlapping clusters. If TRUE, then argument show_hierarchy should be FALSE.
show_hierarchy: a boolean specifying whether the hierarchy of clusters should be included in the output. By default, the hierarchy is not visible in the clusters obtained from OPTICS; it can only be seen by plotting the OPTICS object. If show_hierarchy = TRUE, then the output cluster data.frame will contain additional columns showing the hierarchy of clusters.
algorithm_in_output: a boolean indicating whether the original output of optics (from the dbscan package) should be returned in the output (TRUE by default, see Value).
...: further arguments to be passed to optics() (see optics).
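The defaults described above for index and minPts can be illustrated with a minimal base-R sketch. The toy table, its column names (Site1, Site2, Simpson) and the variable names are hypothetical and chosen only for illustration; whether hclu_optics() rounds the logarithm before passing it on is not stated here, so the raw value is shown.

```r
# Toy long-format dissimilarity table: two site columns, then one index column
# (column names and values are made up for illustration)
sites <- paste0("site", 1:5)
pairs <- t(combn(sites, 2))  # every unordered pair of sites, one pair per row
toy_dissim <- data.frame(
  Site1 = pairs[, 1],
  Site2 = pairs[, 2],
  Simpson = seq(0.1, 1, length.out = nrow(pairs))
)

# Default for `index`: the name of the third column
index <- names(toy_dissim)[3]

# Default for `minPts`, as documented: natural log of the number of sites
n_sites <- length(unique(c(toy_dissim$Site1, toy_dissim$Site2)))
minPts_default <- log(n_sites)

index
n_sites
minPts_default
```

With only a handful of sites the logarithm is small, so it may be worth setting minPts explicitly on small datasets.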
Boris Leroy (leroy.boris@gmail.com), Pierre Denelle (pierre.denelle@gmail.com) and Maxime Lenormand (maxime.lenormand@inrae.fr)
OPTICS (Ordering Points To Identify the Clustering Structure) is a semi-hierarchical clustering algorithm that orders the points in the dataset so that the closest points become neighbors in the ordering, and calculates a reachability distance for each point. Clusters can then be extracted hierarchically from these reachability distances by identifying changes in relative cluster density. To understand the clusters and their hierarchical nature, explore the reachability plot by calling plot on the algorithm slot of the output (available if algorithm_in_output = TRUE): plot(object$algorithm).
We recommend reading Hahsler et al. (2019) to understand how the algorithm works and what the clusters mean.
To extract the clusters, we use the extractXi function, which is based on the steepness of the reachability plot (see optics).
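The two steps described above (ordering with reachability distances, then Xi extraction) can be sketched by calling the dbscan package directly on a dist object. This is an illustrative sketch on made-up 2-D points, not the internal code of hclu_optics(); the minPts and xi values are arbitrary choices for the toy data.

```r
library(dbscan)

set.seed(1)
# Toy data: two well-separated groups of 20 points each (not from the package)
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)
d <- dist(pts)

# Step 1: order the points and compute their reachability distances
opt <- optics(d, minPts = 5)

# Step 2: extract clusters from steep changes in the reachability plot
opt_xi <- extractXi(opt, xi = 0.05)

head(opt_xi$order)      # processing order of the points
head(opt_xi$reachdist)  # reachability distance of each point
table(opt_xi$cluster)   # cluster membership per point

# Reachability plot with the extracted clusters
plot(opt_xi)
```

Lower xi values make the extraction more sensitive to small changes in density, so trying a few values against the reachability plot is usually informative.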
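Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based Clustering with R. Journal of Statistical Software, 91(1), 1-30.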
nhclu_dbscan
dissim <- dissimilarity(fishmat, metric = "all")
clust1 <- hclu_optics(dissim, index = "Simpson")
clust1
# Visualize the optics plot (the hierarchy of clusters is illustrated at the
# bottom)
plot(clust1$algorithm)
# Extract the hierarchy of clusters
clust1 <- hclu_optics(dissim, index = "Simpson", show_hierarchy = TRUE)
clust1