utils_cluster_hclust_optimizer: Optimize the Silhouette Width of Hierarchical Clustering Solutions

Description

Performs a parallelized grid search to find the number of clusters maximizing the overall silhouette width of the clustering solution (see utils_cluster_silhouette()). When method = NULL, the optimization also includes all methods available in stats::hclust() in the grid search. This function supports parallelization via future::plan() and a progress bar generated by the progressr package (see Examples).
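The grid search described above can be sketched in base R. This is a minimal sequential illustration under stated assumptions, not the package's implementation: the helper names silhouette_mean() and hclust_grid_search(), the output column names, and the USArrests toy data are all hypothetical, and the silhouette width is computed by hand rather than through utils_cluster_silhouette().

```r
# Hypothetical helper (not part of the package): mean silhouette width
# of a cluster assignment, from a distance matrix and integer labels.
silhouette_mean <- function(d, groups) {
  d <- as.matrix(d)
  ids <- unique(groups)
  widths <- vapply(seq_along(groups), function(i) {
    own <- groups == groups[i]
    # singleton clusters get a silhouette width of 0 by convention
    if (sum(own) == 1) return(0)
    # a: mean distance to the other members of the same cluster
    # (d[i, i] is 0, so it does not affect the sum)
    a <- sum(d[i, own]) / (sum(own) - 1)
    # b: smallest mean distance to the members of any other cluster
    b <- min(vapply(
      ids[ids != groups[i]],
      function(g) mean(d[i, groups == g]),
      numeric(1)
    ))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(widths)
}

# Hypothetical helper: evaluate every combination of method and number
# of clusters, and return the grid sorted from best to worst.
hclust_grid_search <- function(
    d,
    methods = c(
      "ward.D", "ward.D2", "single", "complete",
      "average", "mcquitty", "median", "centroid"
    )
) {
  d <- as.matrix(d)
  grid <- expand.grid(
    clusters = 2:(nrow(d) - 1),
    method = methods,
    stringsAsFactors = FALSE
  )
  grid$silhouette_mean <- mapply(
    function(k, m) {
      tree <- stats::hclust(stats::as.dist(d), method = m)
      silhouette_mean(d, stats::cutree(tree, k = k))
    },
    grid$clusters,
    grid$method
  )
  grid[order(grid$silhouette_mean, decreasing = TRUE), ]
}

# toy distance matrix from a base R dataset (assumption for illustration)
d <- stats::dist(scale(USArrests[1:15, ]))
result <- hclust_grid_search(d)
head(result)
```

Sorting the grid by decreasing silhouette width mirrors the convention used in the Examples, where the best solution is read from the first row.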

Usage

utils_cluster_hclust_optimizer(d = NULL, method = NULL)

Value

Data frame with the grid search results. The best solution is in the first row.

Arguments

d

(required, matrix) distance matrix typically resulting from distantia_matrix(), but any other square matrix should work. Default: NULL

method

(optional, character string) Argument of stats::hclust() defining the agglomerative method. One of: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). Unambiguous abbreviations are accepted as well. If NULL, all of these methods are included in the grid search. Default: NULL


Examples



#weekly covid prevalence
#in 10 California counties
#aggregated by month
tsl <- tsl_initialize(
  x = covid_prevalence,
  name_column = "name",
  time_column = "time"
) |>
  tsl_subset(
    names = 1:10
  ) |>
  tsl_aggregate(
    new_time = "months",
    fun = max
  )

if(interactive()){
  #plotting first three time series
  tsl_plot(
    tsl = tsl_subset(
      tsl = tsl,
      names = 1:3
    ),
    guide_columns = 3
  )
}

#compute dissimilarity matrix
psi_matrix <- distantia(
  tsl = tsl,
  lock_step = TRUE
) |>
  distantia_matrix()

#optimize hierarchical clustering
hclust_optimization <- utils_cluster_hclust_optimizer(
  d = psi_matrix
)

#best solution in first row
head(hclust_optimization)
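The parallelization and progress bar support mentioned in the Description can be set up roughly as follows. This is a sketch rather than an official example: the two-worker multisession plan is an arbitrary choice, the USArrests distance matrix is a stand-in used only to keep the block self-contained, and for inputs this small a parallel backend is unlikely to be faster than sequential execution.

```r
library(distantia)

# stand-in square distance matrix from a base R dataset
# (assumption for illustration; any distantia_matrix() output would do)
d <- as.matrix(stats::dist(scale(USArrests[1:15, ])))

# parallel backend: two background R sessions
future::plan(
  future::multisession,
  workers = 2
)

# with_progress() displays the progress updates signaled
# by the function while the grid search runs
hclust_optimization <- progressr::with_progress(
  utils_cluster_hclust_optimizer(
    d = d
  )
)

# restore sequential execution
future::plan(future::sequential)

# best solution in first row
head(hclust_optimization)
```

Calling future::plan(future::sequential) at the end shuts down the background workers, so later code in the same session is not affected by the parallel setup.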
