
WeightedCluster (version 2.0)

rarcat: Robustness Assessment of Regressions using Cluster Analysis Typologies (RARCAT)

Description

rarcat is a wrapper for the functions regressboot and bootpool that performs the entire RARCAT procedure on all possible associations between a typology and the covariates of interest. See Roth et al. (2024) or the R tutorial available as a WeightedCluster vignette for full details on the corresponding methods and their utility.

Usage

rarcat(formula, data, diss,
       robust=TRUE, R=500,
       kmedoid=FALSE, hclust.method="ward.D",
       fixed=FALSE, ncluster=10, cqi="HC",
       parallel=FALSE, progressbar=FALSE,
       fisher.transform=FALSE,
       lmerCtrl=lme4::lmerControl())
# S3 method for rarcat
plot(x, what="AME", covar=x$factorName[1],
     pooled.ame=TRUE, naive.ame=TRUE,
     with.legend=TRUE, legend.prop=NA, rows=NA,
     cols=NA, main=NULL,
     xlab=paste(covar, "Average Marginal Effect"),
     xlim=NULL, conf.level=0.95, ...)
# S3 method for rarcat
print(x, conf.level=0.95, single.row = FALSE, digits = 3, ...)
# S3 method for rarcat
summary(object, ...)

Value

The output of rarcat contains the tables original.analysis and robust.analysis, described at the end of this section.

The output of bootpool is a list with the following components:

nobs

An integer with the number of observations (i.e., the number of AMEs estimated by the function regressboot) used to compute the robust estimates in the multilevel model. Because an individual contributes no AME to a bootstrap replication in which it is not drawn, nobs < m x B, where m < M is the number of individuals in the reference cluster, M is the total number of individuals, and B is the total number of bootstrap replications in regressboot.
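To see why nobs falls short of m x B, the base-R sketch below simulates B rounds of resampling M individuals with replacement and counts how often each member of a hypothetical reference cluster of size m is drawn at least once. It assumes plain case resampling, which may differ from the exact scheme used by regressboot. Under that assumption the expected fraction of available (individual, bootstrap) pairs is 1 - (1 - 1/M)^M, close to 1 - exp(-1) (about 0.632) for large M.

```r
# Sketch: how many (individual, bootstrap) pairs yield an AME when only
# individuals drawn in a bootstrap sample contribute one.
# ASSUMPTION: simple case resampling with replacement.
set.seed(1)
M <- 200                 # total number of individuals
B <- 500                 # number of bootstrap replications
ref_cluster <- 1:60      # hypothetical reference cluster (m = 60)
m <- length(ref_cluster)

nobs <- 0
for (b in seq_len(B)) {
  drawn <- unique(sample.int(M, M, replace = TRUE))
  # reference-cluster members appearing at least once in this bootstrap
  nobs <- nobs + sum(ref_cluster %in% drawn)
}

expected_fraction <- 1 - (1 - 1 / M)^M
cat("nobs =", nobs, "out of m x B =", m * B, "\n")
cat("observed fraction:", round(nobs / (m * B), 3),
    "expected:", round(expected_fraction, 3), "\n")
```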

pooled.ame

A numeric value indicating the pooled AME, which is the mean change in cluster membership probability for a change in the level of the covariate of interest over all bootstraps and all individuals belonging to the reference cluster in the original typology.

standard.error

Standard error of the pooled AME, which decreases asymptotically as the number of bootstrap replications increases.

bootstrap.stddev

The estimate for the standard deviation of the bootstrap random effect. This can be used to construct a prediction interval for the association of interest (see Roth et al. 2024 for details on how to compute this).
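As a rough illustration only, the sketch below builds a normal-approximation interval that combines the uncertainty of the pooled mean with the between-bootstrap variability, pooled.ame ± z × sqrt(standard.error² + bootstrap.stddev²). This formula and the helper prediction_interval() are assumptions made for the sketch, not necessarily the interval recommended in Roth et al. (2024); the input values are made up and are not rarcat output.

```r
# Hypothetical values standing in for bootpool components (NOT real results).
pooled.ame       <- 0.12
standard.error   <- 0.01
bootstrap.stddev <- 0.05

# ASSUMED normal-approximation prediction interval (hypothetical helper).
prediction_interval <- function(est, se, sd_boot, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  half_width <- z * sqrt(se^2 + sd_boot^2)
  c(lower = est - half_width, upper = est + half_width)
}

pi95 <- prediction_interval(pooled.ame, standard.error, bootstrap.stddev)
print(round(pi95, 3))
```

Because bootstrap.stddev usually dominates standard.error, such an interval is much wider than the confidence interval of the pooled AME, which is what makes it informative about reproducibility on a new sample.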

observation.stddev

The estimate for the standard deviation of the observation random effect.

bootstrap.ranef

A vector of size B containing the estimated random effects for each bootstrap.

observation.ranef

A vector of size m containing the estimated random effects for each observation in the reference cluster.

original.analysis

Average Marginal Effects (AMEs) estimated with multivariable logistic regressions and representing the expected change in the probability of belonging to a trajectory group (a reference cluster) for a change in the level of a variable (a covariate of interest), together with their confidence intervals.
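For a binary covariate, an AME from a logistic regression can be computed by predicting each individual's membership probability with the covariate set to 1 and then to 0 (other covariates kept at their observed values) and averaging the differences. The base-R sketch below does this for each reference cluster of a simulated typology. The data, the variable names (cluster, x, z) and the helper ame_binary() are hypothetical; this is not the internal code of rarcat.

```r
set.seed(42)
n <- 300
dat <- data.frame(
  x = rbinom(n, 1, 0.5),   # binary covariate of interest
  z = rnorm(n)             # additional control covariate
)
# Hypothetical 3-cluster typology in which cluster 1 is more likely when x = 1
dat$cluster <- factor(ifelse(runif(n) < 0.3 + 0.3 * dat$x, 1,
                             sample(2:3, n, replace = TRUE)))

# AME of the binary covariate x on membership in reference cluster k
ame_binary <- function(data, k) {
  data$member <- as.integer(data$cluster == k)   # reference cluster vs. rest
  fit <- glm(member ~ x + z, family = binomial, data = data)
  p1 <- predict(fit, newdata = transform(data, x = 1), type = "response")
  p0 <- predict(fit, newdata = transform(data, x = 0), type = "response")
  mean(p1 - p0)                                   # average marginal effect
}

# One AME per reference cluster, as in a typology-based association study
ames <- sapply(levels(dat$cluster), function(k) ame_binary(dat, k))
print(round(ames, 3))
```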

robust.analysis

Pooled AMEs from the bootstrap procedure and their prediction intervals, representing the range of expected values if the clustering and associated regressions were performed on a new sample from the same underlying distribution. This table provides robust estimates for a typology-based association study.

Arguments

formula

A formula object with the clustering solution on the left-hand side and the covariates of interest on the right-hand side.

data

The dataset (data frame) with column names corresponding to the information in formula. The number of individuals (row number) should match the dimension of diss.

diss

The numerical dissimilarity matrix used for clustering. Only a pre-computed matrix (i.e., where pairwise dissimilarities do not depend on the resample) is currently supported.

robust

Logical. TRUE (the default) indicates that RARCAT should be performed. FALSE makes the function run much faster but outputs only the original analysis, which is a standard regression analysis for all combinations of reference clusters and covariates.

R

Integer. The number of bootstrap replications. Set to 500 by default to attain satisfactory precision for the estimates, as the procedure involves multiple steps.

kmedoid

Logical. If TRUE, the PAM (k-medoids) algorithm is used for clustering (calling the function wcKMedRange). If FALSE (default), hierarchical clustering is used (calling the function fastcluster::hclust with the method given by hclust.method).

hclust.method

A character string passed as the method argument of hclust (used for hierarchical clustering), "ward.D" by default.

fixed

Logical. TRUE implies that the number of clusters is the same in every bootstrap. FALSE (default) implies that an optimal number of clusters is evaluated each time.

ncluster

Integer. Either the number of clusters in every bootstrap if fixed is TRUE or the maximum number of clusters (starting from 2) to be evaluated in each bootstrap if fixed is FALSE.

cqi

A character string with the cluster quality index to be evaluated for each new partition. Any column of as.clustrange is supported, "HC" (Hubert's C) by default. Also works with kmedoid=TRUE.
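When fixed=FALSE, a number of clusters is chosen in each bootstrap by evaluating the cqi for 2 to ncluster groups. The base-R sketch below illustrates that selection principle with the Calinski-Harabasz index ("CH"), computed from a dissimilarity matrix on simulated data, where larger values are better. It does not call as.clustrange, and the helper ch_index() is hypothetical; note that the default "HC" (Hubert's C) is minimized rather than maximized.

```r
set.seed(7)
# Three well-separated groups in the plane (simulated data)
centers <- rbind(c(0, 0), c(5, 0), c(0, 5))
pts <- do.call(rbind, lapply(1:3, function(i)
  cbind(rnorm(20, centers[i, 1], 0.3), rnorm(20, centers[i, 2], 0.3))))
d <- dist(pts)

# Calinski-Harabasz index from squared pairwise distances:
# a group's sum of squares equals sum_{i,j} d_ij^2 / (2 * n_group).
ch_index <- function(d2, cl) {
  n <- length(cl)
  k <- length(unique(cl))
  tss <- sum(d2) / (2 * n)
  wss <- sum(sapply(split(seq_len(n), cl), function(idx)
    sum(d2[idx, idx]) / (2 * length(idx))))
  ((tss - wss) / (k - 1)) / (wss / (n - k))
}

hc <- hclust(d, method = "ward.D")
d2 <- as.matrix(d)^2
ncluster <- 6
ch <- sapply(2:ncluster, function(k) ch_index(d2, cutree(hc, k = k)))
names(ch) <- 2:ncluster
best_k <- as.integer(names(which.max(ch)))   # CH: larger is better
print(round(ch, 1))
cat("selected number of clusters:", best_k, "\n")
```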

parallel

Logical. Whether to initialize parallel processing with the future package using the default multisession strategy. If FALSE (default), the current plan is used. If TRUE, a multisession plan is initialized with default values.

progressbar

Logical. Whether to initialize a progress bar using the future package. If FALSE (default), the current progress bar handlers are used. If TRUE, new global progress bar handlers are initialized.

fisher.transform

Logical. TRUE means that a Fisher transformation is applied in the multilevel model estimation step. This may be advisable for extreme associations (AMEs close to the -1 or 1 boundaries). FALSE by default.
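The Fisher transformation z = atanh(r) = 0.5 log((1 + r) / (1 - r)) maps the bounded interval (-1, 1) onto the whole real line, so estimates pooled on the z scale and back-transformed with tanh() cannot leave the valid range. The sketch below shows the transform itself on made-up AMEs near 1; a simple mean stands in for the multilevel model, which is an assumption made for illustration.

```r
# Hypothetical bootstrap AMEs close to the upper boundary of 1
ames <- c(0.97, 0.99, 0.95, 0.995, 0.98)

z <- atanh(ames)          # Fisher transform: 0.5 * log((1 + r) / (1 - r))
pooled_z <- mean(z)       # pool on the unbounded z scale (stand-in for the model)
pooled <- tanh(pooled_z)  # back-transform to the (-1, 1) scale

cat("mean on original scale:", round(mean(ames), 4), "\n")
cat("pooled after Fisher transform:", round(pooled, 4), "\n")
```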

lmerCtrl

Control parameters for lme4 (see lmerControl).
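The multilevel step can be pictured as a crossed random-effects model in which each estimated AME belongs both to a bootstrap replication and to an individual, matching the bootstrap.ranef and observation.ranef components described under Value. The sketch below fits such a model with lme4 to simulated AMEs and shows where a control object like lmerCtrl enters. The formula ame ~ 1 + (1 | boot) + (1 | id) is an assumption based on this page's description, not necessarily the exact specification used by bootpool.

```r
library(lme4)

set.seed(123)
B <- 50                      # bootstrap replications
m <- 40                      # individuals in the reference cluster
sim <- expand.grid(boot = factor(1:B), id = factor(1:m))
boot_eff <- rnorm(B, sd = 0.05)   # simulated bootstrap random effects
obs_eff  <- rnorm(m, sd = 0.03)   # simulated observation random effects
sim$ame <- 0.10 + boot_eff[as.integer(sim$boot)] +
  obs_eff[as.integer(sim$id)] + rnorm(nrow(sim), sd = 0.02)
# Drop pairs at random to mimic individuals absent from some bootstraps
sim <- sim[runif(nrow(sim)) < 0.632, ]

fit <- lmer(ame ~ 1 + (1 | boot) + (1 | id), data = sim,
            control = lmerControl())
pooled.ame <- fixef(fit)[["(Intercept)"]]   # pooled AME (fixed intercept)
vc <- as.data.frame(VarCorr(fit))           # bootstrap / observation SDs
print(vc[, c("grp", "sdcor")])
cat("pooled AME:", round(pooled.ame, 3), "\n")
```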

x

rarcat object to be printed or plotted.

object

rarcat object for summary (diagnostic tools).

conf.level

Confidence level for the confidence intervals. 0.95 by default.

digits

Number of significant digits to print (3 by default).

single.row

Logical. Whether to show the confidence interval on the same row as the estimate (TRUE) or on a separate line (FALSE, the default).

what

Character. Information to plot. With "AME" (default), the bootstrapped AMEs are shown. Set to "ranef" to view the distribution of observation-level random effects (useful to identify potentially influential, unstable observations).

covar

Character. The covariate of interest.

pooled.ame

Logical. Whether to add a vertical line and confidence interval for the pooled AME.

naive.ame

Logical. Whether to add a vertical line and confidence interval for the naive AME.

with.legend

Logical. If FALSE, the legend is not plotted.

legend.prop

Real in range [0,1]. Proportion of the graphic area devoted to the legend plot when with.legend=TRUE. The default value is set according to where the legend is plotted (bottom or right of the graphic area).

rows

Integer. Number of rows of the plot panel.

cols

Integer. Number of columns of the plot panel.

main

Character string. Title of the graphic.

xlab

x axis label.

xlim

Numeric vector of length two. Limits of the x-axis.

...

Additional parameters passed to/from methods.

Author

Leonard Roth

Details

The rarcat function runs a standard typology-based association study and evaluates the impact of sampling uncertainty on the results, thus assessing the reproducibility of the analysis.

References

Roth, L., Studer, M., Zuercher, E., & Peytremann-Bridevaux, I. (2024). Robustness assessment of regressions using cluster analysis typologies: a bootstrap procedure with application in state sequence analysis. BMC Medical Research Methodology, 24(1), 303. https://doi.org/10.1186/s12874-024-02435-8

Examples

## Loading the packages and the data (mvad is shipped with TraMineR)
library(TraMineR)
library(WeightedCluster)
data(mvad)

## Reducing sample size to speed up computations
mvad <- mvad[1:200,]

## Creating the state sequence object
mvad.seq <- seqdef(mvad[, 17:86])

## Distance computation
diss <- seqdist(mvad.seq, method="LCS")

## Hierarchical clustering
hc <- fastcluster::hclust(as.dist(diss), method="ward.D")

## Computing cluster quality measures
clustqual <- as.clustrange(hc, diss=diss, ncluster=6)

## A two-cluster solution is chosen here
mvad$clustering <- clustqual$clustering$cluster2

## The formula should include the typology (dependent) and the covariates of interest
## As in the original analysis, hierarchical clustering with Ward's method is used
## in each bootstrap (kmedoid=FALSE, hclust.method="ward.D")
## The number of clusters is fixed to 2 here; larger values should often be used
## For illustration purposes, R = 30 bootstrap replications are used, far fewer than recommended
rarcatout <- rarcat(clustering ~ Grammar + gcse5eq, mvad, diss, R = 30,
                    kmedoid=FALSE, fixed = TRUE, ncluster = 2)

## Assess the robustness of the original analysis
rarcatout
#plot(rarcatout, covar="gcse5eqyes")
#plot(rarcatout, covar="gcse5eqyes", what="ranef")
#summary(rarcatout)
