SFclust.permute: Perform Permutation-Based Clustering Evaluation for SFclust

Description

Performs a permutation-based analysis to evaluate clustering results across different values of the \(\ell_1\) norm constraint (s). This function is designed to help determine the most appropriate \(\ell_1\) norm value by comparing the observed clustering outcome with those obtained under random permutations.

The function computes gap statistics for each \(\ell_1\) norm constraint value based on permuted versions of the input distance array, and identifies the optimal s as the one maximizing the gap statistic. Two ggplot objects are returned to visualize the gap patterns.
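
As a rough illustration of this selection rule, the base-R sketch below picks the best value from a set of observed and permuted criteria. The numbers are made up, the one-row-per-permutation layout is an assumption, and the log-difference form of the gap is borrowed from the usual gap statistic for sparse clustering. The exact formula used inside SFclust.permute is not stated here and may differ.

```r
# Hypothetical values for illustration only; not output from SFclust.permute.
l1b  <- c(1.5, 2, 3, 4)                 # candidate l1 norm constraint values
obs  <- c(10, 30, 45, 50)               # observed criterion, one per l1b value
perm <- rbind(c( 9, 20, 40, 48),        # assumed layout: one row per permutation,
              c(11, 22, 42, 49),        # one column per l1b value
              c(10, 21, 41, 50))

# Assumed gap definition: log observed value minus mean log permuted value.
gaps   <- log(obs) - colMeans(log(perm))

# Assumed spread measure: standard deviation of the log permuted values.
sdgaps <- apply(log(perm), 2, sd)

# The selected constraint is the one with the largest gap.
best <- l1b[which.max(gaps)]
```

Plotting gaps against l1b gives the same kind of picture that gapplot.l1b is described as showing.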

Usage

SFclust.permute(dist.ary, k, nperms, l1b)

Value

A list containing the following components:

totss: A numeric vector of total within-cluster sum of squared distances for each \(\ell_1\) norm value.
permtotss: A matrix of the total sum of squared distances obtained from the permuted data, for each permutation and each \(\ell_1\) norm value.
nnonzerowss: A numeric vector of the number of nonzero weights for each \(\ell_1\) norm value.
gaps: A numeric vector of gap statistics, one per \(\ell_1\) norm value, each measuring the difference between the observed and permuted clustering results.
sdgaps: A numeric vector of standard deviations of the gaps across permutations.
l1bounds: A vector of \(\ell_1\) norm constraint values that were successfully processed without error.
bestl1b: The \(\ell_1\) norm constraint value that yielded the largest gap.
failed_j: Indices of l1b values that caused errors during the clustering process.
failed_l1b: The actual \(\ell_1\) norm values that caused errors.
gapplot.l1b: A ggplot object showing the gap statistics plotted against \(\ell_1\) norm constraint values.
gapplot.nnz: A ggplot object showing the gap statistics plotted against the number of nonzero weights.

Arguments

dist.ary: A 3-dimensional distance array representing pairwise distances between trajectories across multiple variables. Follows the same format used in SFclust.
k: An integer specifying the number of clusters.
nperms: An integer specifying the number of permutations to perform.
l1b: A numeric vector of \(\ell_1\) norm constraint values to test during clustering. These values control the sparsity of the weights during clustering.
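
To make the shape of dist.ary concrete, here is a minimal sketch that builds a 3-dimensional distance array from simulated trajectories. It assumes an n x n x p layout (subjects by subjects by variables) with Euclidean distances between trajectories. The layout actually expected is the one documented for SFclust, so treat the dimensions and the distance measure here as assumptions.

```r
set.seed(1)

n      <- 6    # number of trajectories (subjects)
n_time <- 10   # number of time points per trajectory
p      <- 3    # number of variables

# Simulated data: traj[i, t, j] is the value of variable j for subject i at time t.
traj <- array(rnorm(n * n_time * p), dim = c(n, n_time, p))

# Assumed layout: dist.ary[i, i2, j] is the distance between subjects i and i2
# on variable j.
dist.ary <- array(0, dim = c(n, n, p))
for (j in seq_len(p)) {
  dist.ary[, , j] <- as.matrix(dist(traj[, , j]))  # Euclidean distance between rows
}

dim(dist.ary)
```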

Details

This function helps assess the robustness of the clustering structure and select an appropriate level of sparsity. If a clustering attempt fails for some value of l1b (e.g., due to convergence issues or weight update errors), that value is reported in failed_l1b and its index in failed_j.

The two ggplot objects, gapplot.l1b and gapplot.nnz, visualize the gap statistics. They are not printed automatically, so users can decide when and how to display them.

The function involves random sampling internally. For reproducible results, set the random seed with set.seed() before calling it.
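
The following sketch shows one way the points above might fit together in practice. The first part only demonstrates that seeding makes base-R permutations repeatable. The second part is a hypothetical call: the values chosen for k, nperms, and l1b are arbitrary, and the call is guarded so that it runs only when SFclust.permute and a suitable dist.ary are already available in the session.

```r
# Seeding makes random permutations repeatable.
set.seed(123); p1 <- sample(5)
set.seed(123); p2 <- sample(5)
identical(p1, p2)

# Hypothetical call; argument values are illustrative, not recommendations.
if (exists("SFclust.permute") && exists("dist.ary")) {
  set.seed(123)  # seed immediately before the call for reproducible results
  res <- SFclust.permute(dist.ary, k = 3, nperms = 20, l1b = c(1.5, 2, 3))

  res$bestl1b      # selected l1 norm constraint value
  res$failed_l1b   # l1b values that caused errors, if any

  # The plots are not printed automatically, so display them explicitly.
  print(res$gapplot.l1b)
  print(res$gapplot.nnz)
}
```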