CrossClustering

CrossClustering is a partial clustering algorithm that combines Ward's minimum variance and complete linkage algorithms, providing automatic estimation of a suitable number of clusters and identification of outlier elements.
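
To make the two ingredients concrete, here is a minimal base R sketch that builds a Ward tree and a complete linkage tree on the same distances and cross-tabulates the resulting partitions. It is not the package's implementation: the data set (`USArrests`), the choice of `"ward.D"` among base R's Ward variants, and the fixed `k = 3` are assumptions made only for illustration.

```r
# Built-in data set, used here only for illustration.
d <- dist(scale(USArrests), method = "euclidean")

# Ward's minimum variance and complete linkage trees on the same distances.
tree_ward     <- hclust(d, method = "ward.D")
tree_complete <- hclust(d, method = "complete")

# Cut both trees into the same (arbitrarily chosen) number of clusters.
k <- 3
cl_ward     <- cutree(tree_ward, k = k)
cl_complete <- cutree(tree_complete, k = k)

# Cross-tabulate the two partitions to see where they agree.
agreement <- table(ward = cl_ward, complete = cl_complete)
agreement
```

Cells with large counts mark groups of elements that both algorithms place together. Looking for that kind of agreement is the intuition behind combining the two methods.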

Example

This is a basic example which shows how the main function, cc_crossclustering(), works:

## basic example code
library(CrossClustering)

#### method = "complete"
data(toy)

### toy is transposed as we want to cluster samples (columns of the original
### matrix)
d <- dist(t(toy), method = "euclidean")

### Run CrossClustering
toyres <- cc_crossclustering(
  d, k_w_min = 2, k_w_max = 5, k2_max = 6, out = TRUE
)
toyres
#> 
#>     CrossClustering with method complete.
#> 
#> Parameter used:
#>   - Interval for the number of cluster of Ward's algorithm: [2, 5].
#>   - Interval for the number of cluster of the complete algorithm: [2, 6].
#>   - Outliers are considered.
#> 
#> Number of clusters found: 3.
#> Leading to an avarage silhouette width of: 0.8405.
#> 
#> A total of 6 elements clustered out of 7 elements considered.
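
The transposition step in the example above matters because `dist()` computes distances between the rows of its input. The sketch below illustrates this on a small made-up matrix. The matrix `m` and its dimension names are hypothetical and are not part of the package.

```r
# Hypothetical 3 x 4 matrix: rows are features, columns are samples.
# matrix() fills column by column, so each triple below is one sample.
m <- matrix(
  c(1, 2, 3,
    1, 2, 4,
    9, 8, 7,
    9, 8, 6),
  nrow = 3,
  dimnames = list(paste0("feature", 1:3), paste0("sample", 1:4))
)

# dist() works row-wise, so this compares the 3 features.
d_features <- dist(m, method = "euclidean")

# Transposing first compares the 4 samples (columns of m).
d_samples <- dist(t(m), method = "euclidean")

attr(d_features, "Size")  # number of objects compared: the features
attr(d_samples, "Size")   # number of objects compared: the samples
```

Passing `d_samples` rather than `d_features` to a clustering function is what makes the samples the objects being clustered.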

Another useful function worth mentioning is ari():

clusters <- iris[-5] |>
  dist() |>
  hclust(method = 'ward.D') |>
  cutree(k = 3)

ground_truth <- iris[[5]] |>
  as.numeric()

table(ground_truth, clusters) |> 
  ari()
#>     Adjusted Rand Index (alpha = 0.05)
#> 
#> ARI                  = 0.76 (moderate recovery)
#> Confidence interval  = [0.74, 0.78]
#> 
#> p-values:
#>   * Qannari test     = < 0.001
#>   * Permutation test =   0.001
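
For readers who want to see what the index measures, the following base R sketch computes the adjusted Rand index from a contingency table using the standard Hubert and Arabie formula. The helper name `ari_by_hand` is hypothetical. It is not the package's `ari()` function, it computes neither the confidence interval nor the p-values, and it does not handle the degenerate case where the denominator is zero.

```r
# Hypothetical helper: adjusted Rand index from a contingency table
# (rows = first partition, columns = second partition).
ari_by_hand <- function(tab) {
  n <- sum(tab)

  # Pairs of elements placed together in both partitions.
  sum_cells <- sum(choose(tab, 2))
  # Pairs placed together within each partition separately.
  sum_rows <- sum(choose(rowSums(tab), 2))
  sum_cols <- sum(choose(colSums(tab), 2))

  # Value expected under random agreement, and the maximum attainable value.
  expected <- sum_rows * sum_cols / choose(n, 2)
  maximum  <- (sum_rows + sum_cols) / 2

  (sum_cells - expected) / (maximum - expected)
}

# Two partitions that are identical up to relabelling of the clusters.
same <- table(c(1, 1, 2, 2), c("b", "b", "a", "a"))
ari_by_hand(same)
```

Because the index depends only on which elements are grouped together, the relabelled partitions above reach the maximum value of 1.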

Install

CRAN version

The CrossClustering package is on CRAN; install it in the standard way:

install.packages('CrossClustering')

Development version

To install the develop branch of the CrossClustering package, use:

# install.packages('devtools')
devtools::install_github('CorradoLanera/CrossClustering', ref = 'develop')

Bug reports

If you encounter a bug, please file an issue with a reprex (minimal reproducible example) at https://github.com/CorradoLanera/CrossClustering/issues

References

Tellaroli P., Bazzi M., Donato M., Brazzale A. R., Draghici S. (2016). Cross-Clustering: A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters. PLoS ONE 11(3): e0152333. https://doi.org/10.1371/journal.pone.0152333

Tellaroli P., Bazzi M., Donato M., Brazzale A. R., Draghici S. (2017). E1829: Cross-Clustering: A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters. CMStatistics 2017, London, 16-18 December, Book of Abstracts (ISBN 978-9963-2227-4-2).

Version

4.1.2

License

GPL-3

Maintainer

Paola Tellaroli

Last Published

May 14th, 2024

Functions in CrossClustering (4.1.2)

cc_test_ari_permutation
A permutation test for the null hypothesis of random agreement (i.e., adjusted Rand Index equal to 0) between two partitions.

cc_crossclustering
A partial clustering algorithm with automatic estimation of the number of clusters and identification of outliers.

chain_effect
A toy dataset for illustrating the chain effect.

prune_zero_tail
Prune a tail made of zeros.

cc_test_ari
A test for the null hypothesis of random agreement (i.e., adjusted Rand Index equal to 0) between two partitions.

is_zero
Check for zero.

cc_get_cluster
Provides the vector of cluster IDs to which each element belongs.

ari
Computes the adjusted Rand index and its confidence interval, comparing two classifications from a contingency table.

consensus_cluster
Get the clusters which reach maximum consensus.

toy
A toy example matrix.

nb_data
RNA-Seq dataset example.

CrossClustering-package
CrossClustering: A Partial Clustering Algorithm.

worms
A famous shape data set containing two worm-shaped clusters and outliers.

twomoons
A famous shape data set containing two moon-shaped clusters and outliers.

reverse_table
Reverse the process of creating a contingency table.