
clusterGGM (version 0.1.1)

cggm_cv: Cross Validation for the Clusterpath Estimator of the Gaussian Graphical Model

Description

Perform cross-validation to tune the weight matrix parameters phi and k (for k-nearest-neighbors), the aggregation parameter lambda_cpath, and the sparsity parameter lambda_lasso of the clusterpath estimator of the Gaussian Graphical Model (CGGM). The aim is a sparse estimate, with variable clustering, of the precision matrix or the covariance matrix. The scoring metric is the negative log-likelihood (lower is better).
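To make the scoring metric concrete, the following base R sketch evaluates a Gaussian negative log-likelihood for a candidate precision matrix on held-out data. The function name neg_loglik, the simulated data, and the exact scaling (constants dropped) are assumptions for illustration; the formula used internally by cggm_cv() may differ in constants or normalization.

```r
# Gaussian negative log-likelihood of a precision matrix Theta, up to
# constants: -log det(Theta) + tr(S %*% Theta), where S is the sample
# covariance matrix of the held-out observations.
# (neg_loglik is a hypothetical helper, not part of clusterGGM.)
neg_loglik <- function(Theta, S) {
  log_det <- as.numeric(determinant(Theta, logarithm = TRUE)$modulus)
  -log_det + sum(diag(S %*% Theta))
}

set.seed(1)
X_test <- matrix(rnorm(200 * 3), ncol = 3)  # simulated held-out fold
S_test <- cov(X_test)

# Score two candidate estimates; lower is better.
score_inverse  <- neg_loglik(solve(S_test), S_test)  # inverse of S_test
score_identity <- neg_loglik(diag(3), S_test)        # identity matrix
score_inverse <= score_identity  # TRUE: the inverse of S_test minimizes this criterion
```

In cross-validation the candidate estimate comes from the training folds rather than from the held-out data itself, so the score penalizes estimates that overfit the training sample.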

Usage

cggm_cv(
  X,
  tune_grid,
  kfold = 5,
  folds = NULL,
  connected = TRUE,
  fit = TRUE,
  refit = TRUE,
  lasso_unit_weights = FALSE,
  estimate_Sigma = FALSE,
  verbose = 0,
  n_jobs = 1,
  ...
)

Value

An object of class "CGGM_CV" with the following components:

fit

A list with cross-validation results for CGGM without the refitting step. It consists of four components:

  • final (an object of class "CGGM" corresponding to the final model fit using the optimal values of the tuning parameters; see cggm())

  • scores (a data frame containing the values of the tuning parameters and the corresponding cross-validation scores)

  • opt_index (the index of the optimal aggregation parameter lambda_cpath in the final model fit)

  • opt_tune (a data frame containing the optimal values of the tuning parameters)

refit

A list with cross-validation results for CGGM including the refitting step. It contains the same four components as above, except that final is an object of class "CGGM_refit" (see cggm_refit()).

raw_cv_results

A list of raw cross-validation results before restructuring.

best

A character string indicating whether the optimal model fit without the refitting step ("fit") or including the refitting step ("refit") has a better cross-validation score.
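The components described above can be navigated with ordinary list indexing. The sketch below uses a mock list that only mimics the documented component names; the values and the column names inside opt_tune are placeholders chosen for illustration, not output from cggm_cv().

```r
# Mock object with the documented component names (fit, refit, best).
# All values below are placeholders, not real cross-validation results.
mock_cv <- list(
  fit = list(
    opt_tune = data.frame(k = 2, phi = 1, lambda_lasso = 0, lambda = 0.10)
  ),
  refit = list(
    opt_tune = data.frame(k = 2, phi = 1, lambda_lasso = 0, lambda = 0.05)
  ),
  best = "refit"
)

# best holds "fit" or "refit", so it can be used to index the object
# and retrieve the component with the better cross-validation score.
best_part <- mock_cv[[mock_cv$best]]
best_part$opt_tune
```

With a real "CGGM_CV" object the same pattern should apply as long as the object behaves like a plain list, which is an assumption here.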

Arguments

X

The n × p data matrix, with n observations (rows) and p variables (columns).

tune_grid

A data frame with values of the tuning parameters. Each row is a combination of parameters that is evaluated. The columns have the names of the tuning parameters and should include k and phi. The sparsity parameter lambda_lasso and the aggregation parameter lambda are optional. If there is no column named lambda_lasso, the sparsity parameter is set to 0. If there is no column named lambda, an appropriate range for the aggregation parameter is selected for each combination of k, phi, and lambda_lasso.
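As an illustration of the two ways to specify tune_grid, the sketch below builds one grid with all four columns and one with only the required columns k and phi, using base R's expand.grid(). The parameter values are arbitrary examples, not recommendations.

```r
# Full grid: every combination of the four tuning parameters.
grid_full <- expand.grid(
  phi = c(0.5, 1),
  k = c(2, 3),
  lambda_lasso = c(0, 0.02),
  lambda = seq(0, 0.2, by = 0.05)
)
nrow(grid_full)  # 2 * 2 * 2 * 5 = 40 rows, one per combination

# Minimal grid: only the required columns. Per the description above,
# lambda_lasso is then set to 0 and a range for lambda is selected
# for each combination.
grid_min <- expand.grid(phi = c(0.5, 1), k = c(2, 3))
names(grid_min)
```

Because every row is evaluated in every fold, the number of rows in a full grid grows multiplicatively with the number of values per parameter.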

kfold

The number of folds. Defaults to 5.

folds

Optional argument to manually set the folds for the cross validation procedure. If this is not NULL, it overrides the kfold argument. Defaults to NULL.
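To show what a manual fold assignment amounts to, the base R sketch below partitions n row indices into kfold groups of equal size. The structure that cggm_cv() expects for folds is not specified on this page, so the list produced here is an assumption for illustration only; the Examples section uses cv_folds() to build the actual argument.

```r
set.seed(42)
n <- 100    # number of observations
kfold <- 5  # number of folds

# Assign each observation a fold label 1..kfold in random order,
# giving n / kfold observations per fold.
fold_id <- sample(rep(seq_len(kfold), length.out = n))

# One vector of held-out row indices per fold.
held_out <- split(seq_len(n), fold_id)
lengths(held_out)
```

Fixing the folds in this way makes results reproducible and lets different tuning grids be compared on identical data splits.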

connected

Logical, indicating whether connectedness of the weight matrix should be ensured. Defaults to TRUE. See clusterpath_weights().

fit

Logical, indicating whether the cross-validation procedure should consider the result from cggm(), before refitting is applied. Defaults to TRUE. At least one of fit and refit should be TRUE.

refit

Logical, indicating whether the cross-validation procedure should also consider the refitted result from cggm(). See also cggm_refit(). Defaults to TRUE. At least one of fit and refit should be TRUE.

lasso_unit_weights

Logical, indicating whether the weights in the sparsity penalty should be all one or decreasing in the magnitude of the corresponding element of the inverse of the sample covariance matrix. Defaults to FALSE.

estimate_Sigma

Logical, indicating whether CGGM should be used to estimate the covariance matrix based on the sample precision matrix. Defaults to FALSE.

verbose

Determines the amount of information printed during the cross validation. Defaults to 0.

n_jobs

Number of parallel jobs used for cross validation. If 0 or smaller, uses the maximum available number of physical cores. Defaults to 1 (sequential).
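Before setting n_jobs to 0 or a negative value, it can help to check how many cores R detects. The sketch below uses detectCores() from the base parallel package; whether cggm_cv() relies on the same function internally is not stated here and is an assumption.

```r
library(parallel)

# Physical cores. On some platforms (for example Linux) R cannot
# distinguish physical from logical cores, and the result may be NA.
n_physical <- detectCores(logical = FALSE)

# Logical cores, including hyper-threads where available.
n_logical <- detectCores(logical = TRUE)

c(physical = n_physical, logical = n_logical)
```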

...

Additional arguments to be passed down to cggm() and cggm_refit().

Author

Daniel J.W. Touw, modifications by Andreas Alfons

References

D.J.W. Touw, A. Alfons, P.J.F. Groenen and I. Wilms (2025) Clusterpath Gaussian Graphical Modeling. arXiv:2407.00644. doi:10.48550/arXiv.2407.00644.

See Also

clusterpath_weights(), lasso_weights(), cggm(), cggm_refit()

Examples

# \donttest{
# Generate data
set.seed(3)
Theta <- matrix(
  c(2, 1, 0, 0,
    1, 2, 0, 0,
    0, 0, 4, 1,
    0, 0, 1, 4),
  nrow = 4
)
X <- mvtnorm::rmvnorm(n = 100, sigma = solve(Theta))

# Use cross-validation to select the tuning parameters
fit_cv <- cggm_cv(
  X = X,
  tune_grid = expand.grid(
    phi = 1,
    k = 2,
    lambda_lasso = c(0, 0.02),
    lambda = seq(0, 0.2, by = 0.01)
  ),
  folds = cv_folds(nrow(X), 5)
)

# The best solution has 2 clusters
get_Theta(fit_cv)
get_clusters(fit_cv)
# }
