MSCA (version 1.2.1)

fast_clara_jaccard: Fast CLARA-like clustering using Jaccard dissimilarity

Description

Implements a CLARA (Clustering Large Applications) strategy using Jaccard dissimilarity computed on individual patient state matrices. The algorithm repeatedly draws random samples of patients, runs PAM clustering on each sample, and keeps the set of medoids that minimises the total dissimilarity across the full dataset. Final assignments are made by mapping every patient to the nearest selected medoid.
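The strategy described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the names jaccard_cols, simple_kmedoids and clara_jaccard_sketch are hypothetical, patients are assumed to be stored in the columns of a binary matrix, and a simple alternating k-medoids routine stands in for PAM so that the example has no dependencies.

```r
# Jaccard dissimilarity between the columns of binary matrices a and b.
# Returns an ncol(a) x ncol(b) matrix; two all-zero columns get dissimilarity 0.
jaccard_cols <- function(a, b) {
  inter <- crossprod(a, b)                          # shared 1s per column pair
  uni <- outer(colSums(a), colSums(b), "+") - inter # size of the union
  d <- 1 - inter / uni
  d[uni == 0] <- 0
  d
}

# Stand-in for PAM (assumption): alternating k-medoids on a square
# dissimilarity matrix d. Returns the row indices of the k medoids.
simple_kmedoids <- function(d, k, max_iter = 50) {
  med <- sample(nrow(d), k)
  for (i in seq_len(max_iter)) {
    cl <- apply(d[, med, drop = FALSE], 1, which.min)  # nearest medoid per point
    new_med <- vapply(seq_len(k), function(j) {
      members <- which(cl == j)
      if (length(members) == 0L) return(med[j])        # keep medoid of empty cluster
      members[which.min(colSums(d[members, members, drop = FALSE]))]
    }, integer(1))
    if (setequal(new_med, med)) break
    med <- new_med
  }
  med
}

# CLARA-style loop: cluster each random sample, score its medoids on the
# full dataset, and keep the best-scoring set.
clara_jaccard_sketch <- function(data, k, samples = 20,
                                 samplesize = min(50 + 5 * k, ncol(data)),
                                 seed = 123) {
  set.seed(seed)
  best <- list(cost = Inf)
  for (s in seq_len(samples)) {
    idx <- sample(ncol(data), samplesize)              # sampled patients
    sub <- data[, idx, drop = FALSE]
    med <- idx[simple_kmedoids(jaccard_cols(sub, sub), k)]
    d_all <- jaccard_cols(data, data[, med, drop = FALSE])  # patients x medoids
    cost <- sum(apply(d_all, 1, min))
    if (cost < best$cost) {
      best <- list(clustering = apply(d_all, 1, which.min),
                   medoids = med, sample = idx, cost = cost)
    }
  }
  best
}

# Toy data (made up for illustration): 30 states x 200 patients.
set.seed(1)
toy <- matrix(rbinom(30 * 200, 1, 0.3), nrow = 30)
fit <- clara_jaccard_sketch(toy, k = 3, samples = 5, samplesize = 40)
table(fit$clustering)
```

Because each sample's medoids are scored against all patients, the cost of one iteration grows with the number of patients times k rather than with the square of the number of patients, which is the point of the CLARA approach.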

Usage

fast_clara_jaccard(
  data,
  k,
  samples = 20,
  samplesize = NULL,
  seed = 123,
  frac = 1
)

Value

A list containing the indices of the sampled patients, the medoid indices, the cluster assignments, and the cost.

clustering

An integer vector of cluster assignments for each patient.

medoids

Indices of the medoids associated with the lowest cost.

sample

Indices of the sampled columns used in clustering.

cost

Total cost (sum of dissimilarities to assigned medoids).

Arguments

data

A state matrix of censored time-to-event indicators as computed by the make_state_matrix function.

k

Number of clusters to return.

samples

Number of random samples drawn from the analysed population.

samplesize

Number of patients per sample (default: min(50 + 5 * k, ncol(data))).
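As a quick check of the documented default, the rule can be written out directly. The helper name default_samplesize is hypothetical and only mirrors the formula given above.

```r
# Documented default: min(50 + 5 * k, ncol(data)).
# n_patients plays the role of ncol(data).
default_samplesize <- function(k, n_patients) min(50 + 5 * k, n_patients)

default_samplesize(k = 4, n_patients = 1000)  # 70: the 50 + 5 * k rule applies
default_samplesize(k = 4, n_patients = 60)    # 60: capped at the population size
```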

seed

Random seed for reproducibility (default: 123).

frac

Fraction of the population to use for cost computation (default: 1).
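One plausible reading of frac is that the cost of a candidate set of medoids is estimated on a random subset of patients instead of on everyone. The sketch below illustrates that reading only; the helper subsampled_cost, its input d (a patients x medoids dissimilarity matrix) and the rounding rule are assumptions, not the package's confirmed behaviour.

```r
# d: patients x medoids dissimilarity matrix (hypothetical input).
# Assumption: frac selects a random subset of patients for the cost estimate.
subsampled_cost <- function(d, frac = 1) {
  n <- nrow(d)
  keep <- if (frac < 1) sample.int(n, max(1L, floor(frac * n))) else seq_len(n)
  # Each kept patient contributes its distance to the nearest medoid.
  sum(apply(d[keep, , drop = FALSE], 1, min))
}

set.seed(123)
d <- matrix(runif(100 * 3), nrow = 100)      # made-up dissimilarities
full_cost <- subsampled_cost(d, frac = 1)    # uses all 100 patients
half_cost <- subsampled_cost(d, frac = 0.5)  # uses 50 randomly chosen patients
```

Under this reading, values of frac below 1 trade accuracy of the cost estimate for speed, since fewer dissimilarities have to be evaluated per candidate set of medoids.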

Details

This implementation adapts the original CLARA method described by Kaufman and Rousseeuw (1990) in "Finding Groups in Data: An Introduction to Cluster Analysis".

References

Kaufman, L. & Rousseeuw, P. J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley.