agglomerative_clustering: Agglomerative Hierarchical Clustering

Description

Perform a hierarchical agglomerative cluster analysis on a set of observations.

Usage

agglomerative_clustering(
  data,
  proximity = "single",
  distance_method = "euclidean",
  learn = FALSE,
  waiting = TRUE,
  ...
)

Value

A stats::hclust() object which describes the tree produced by the clustering process.
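As a rough illustration of what such an object contains, the sketch below builds one with base R's stats::hclust() as a stand-in for this function's return value. The data in `pts` is made up for the example, and the components shown (merge, height) are those of a standard hclust object.

```r
# Six 2-D points; each row is one observation (made-up data).
pts <- matrix(c(1, 1,
                1, 2,
                5, 5,
                5, 6,
                9, 1,
                9, 2), ncol = 2, byrow = TRUE)

# Build an hclust object with base R as a stand-in for the function's result.
tree <- stats::hclust(stats::dist(pts), method = "single")

class(tree)   # "hclust"
tree$merge    # (n - 1) x 2 matrix: which clusters were joined at each step
tree$height   # proximity at which each merge happened

# Cut the tree into 3 clusters; returns one cluster label per observation.
groups <- stats::cutree(tree, k = 3)
```

Because the result is an ordinary hclust object, the usual tools such as plot() and stats::cutree() should apply to it as well.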

Arguments

data

a set of observations, presented as a matrix-like object where every row is a new observation.

proximity

the proximity definition to be used. This should be one of "single" (minimum/single linkage), "complete" (maximum/complete linkage), or "average" (average linkage).

distance_method

the distance measure to use. Supported values are:

'euclidean': Standard Euclidean distance
'manhattan': Manhattan (city-block) distance
'canberra': Canberra distance
'chebyshev': Chebyshev (maximum) distance
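For reference, the four measures can be written out by hand for a pair of vectors. This is only a sketch using the textbook formulas in base R; it does not call proxy::dist(), and the vectors `x` and `y` are made up for the example.

```r
x <- c(1, 2, 3)
y <- c(4, 0, 3)

# Euclidean: square root of the sum of squared differences.
euclidean <- sqrt(sum((x - y)^2))

# Manhattan: sum of absolute differences.
manhattan <- sum(abs(x - y))

# Canberra: sum of absolute differences, each scaled by |x_i| + |y_i|.
canberra <- sum(abs(x - y) / (abs(x) + abs(y)))

# Chebyshev: largest absolute difference over all coordinates.
chebyshev <- max(abs(x - y))
```

Note that base R's stats::dist() names the Chebyshev distance "maximum" rather than "chebyshev".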

learn

a Boolean determining whether intermediate logs explaining how the algorithm works should be printed or not.

waiting

a Boolean determining whether the intermediate logs should be printed in chunks, pausing for user input before each subsequent chunk is printed.

...

additional arguments passed to proxy::dist().

Author

Eduardo Ruiz Sabajanes, eduardo.ruizs@edu.uah.es

Details

This function performs a hierarchical cluster analysis for the \(n\) objects being clustered. The definition of a set of clusters using this method follows an \(n\)-step process, which ends when a single cluster remains:

Initially, each object is assigned to its own cluster. The matrix of distances between clusters is computed.
The two clusters with the closest proximity are joined together and the proximity matrix is updated according to the specified proximity definition. This step is repeated until a single cluster remains.
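The two steps above can be sketched in a few lines of base R. This is a naive illustration under stated assumptions, not the package's implementation: `naive_agglomerative` is a hypothetical helper, it always uses Euclidean distance, it recomputes proximities from the original distance matrix instead of updating a proximity matrix, and it returns only the merge heights.

```r
# Hypothetical helper: returns the proximity at which each merge happens.
# `linkage` is min (single), max (complete) or mean (average).
naive_agglomerative <- function(data, linkage = min) {
  d <- as.matrix(stats::dist(data))         # pairwise Euclidean distances
  clusters <- as.list(seq_len(nrow(data)))  # step 1: one cluster per object
  heights <- numeric(0)
  while (length(clusters) > 1) {
    best <- c(1, 2)
    best_prox <- Inf
    # Step 2: find the pair of clusters with the smallest proximity.
    for (i in seq_len(length(clusters) - 1)) {
      for (j in (i + 1):length(clusters)) {
        prox <- linkage(d[clusters[[i]], clusters[[j]]])
        if (prox < best_prox) {
          best_prox <- prox
          best <- c(i, j)
        }
      }
    }
    # Join the pair and record the proximity at which they were merged.
    clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters[[best[2]]] <- NULL
    heights <- c(heights, best_prox)
  }
  heights
}

# Made-up data: three well-separated pairs of points.
pts <- matrix(c(1, 1,  1, 2,  5, 5,  5, 6,  9, 1,  9, 2),
              ncol = 2, byrow = TRUE)
single_heights <- naive_agglomerative(pts, min)
complete_heights <- naive_agglomerative(pts, max)
```

For \(n\) objects the loop performs \(n - 1\) merges, which together with the initial assignment gives the \(n\) steps mentioned above.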

The definitions of proximity considered by this function are:

single: \(\min\left\{d(x,y):x\in A,y\in B\right\}\). Defines the proximity between two clusters as the distance between the two closest objects, one from each cluster. It produces clusters where each object is closest to at least one other object in the same cluster. It is also known as SLINK, single-link or minimum-link.
complete: \(\max\left\{d(x,y):x\in A,y\in B\right\}\). Defines the proximity between two clusters as the distance between the two furthest objects, one from each cluster. It is also known as CLINK, complete-link or maximum-link.
average: \(\frac{1}{\left|A\right|\cdot\left|B\right|} \sum_{x\in A}\sum_{y\in B} d(x,y)\). Defines the proximity between two clusters as the average distance between every pair of objects, one from each cluster. It is also known as UPGMA or average-link.
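To make the three definitions concrete, the sketch below evaluates them for two small clusters. The clusters `A` and `B` are made-up data, and the Euclidean distance is assumed for \(d(x,y)\).

```r
# Two made-up clusters; each row is one object.
A <- matrix(c(0, 0,
              0, 1), ncol = 2, byrow = TRUE)
B <- matrix(c(3, 0,
              4, 0), ncol = 2, byrow = TRUE)

# Cross-distance matrix: entry [i, j] is the Euclidean d(A[i, ], B[j, ]).
cross <- outer(seq_len(nrow(A)), seq_len(nrow(B)),
               Vectorize(function(i, j) sqrt(sum((A[i, ] - B[j, ])^2))))

single_link   <- min(cross)   # distance between the closest pair
complete_link <- max(cross)   # distance between the furthest pair
average_link  <- mean(cross)  # sum over all pairs divided by |A| * |B|
```

Here the closest pair is (0, 0) and (3, 0), so the single-link proximity is 3, while the furthest pair is (0, 1) and (4, 0), giving a complete-link proximity of \(\sqrt{17}\).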

Examples

cl <- agglomerative_clustering(
  db5[1:6, ],
  'single',
  learn = TRUE,
  waiting = FALSE
)
