cscca.CV: Compositional Sparse Canonical Correlation Analysis (Cross Validation Version)

Description

The cross validation version of a compositional sparse canonical correlation analysis (sCCA) framework for integrating microbiome data with other high-dimensional omics data.

Usage

cscca.CV(
  Y,
  View.ind,
  View.type = NULL,
  eps.stop = 1e-04,
  max.step = 30,
  eps = 1e-04,
  T.step = 10,
  n_fold = 5,
  seed.sam.ind = NULL,
  hp.lower = NULL,
  hp.upper = NULL,
  hp.eta.lower = NULL,
  hp.eta.upper = NULL,
  eta.warm.stat.mat = NULL,
  opt_n_design = 30,
  opt_n_iter = 20,
  Criterion = "cov",
  des.init = NULL,
  is.refit = F,
  is.refix.eta = T,
  opt_n_design.eta_warm = 30,
  opt_n_iter.eta_warm = 20,
  is.opt.hyper = F,
  hyper_n_grid = 20,
  ...
)

Value

A list containing the following elements: (1) a.hat.opt.trgt: The coefficient vector estimated with the optimal hyper-parameter vector; (2) lam.opt.trgt: The optimal hyper-parameter vector.
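As a minimal sketch of how the returned list might be inspected, the base R snippet below splits a coefficient vector by view and lists the non-zero entries in each view. It assumes, without confirmation from this page, that a.hat.opt.trgt has one entry per column of Y in the same order as View.ind. The object `res` and all numbers in it are hypothetical stand-ins, not output of cscca.CV.

```r
# Hypothetical stand-in for a fitted result; a real object would come from cscca.CV().
View.ind <- c(1L, 1L, 1L, 2L, 2L, 2L)
res <- list(
  a.hat.opt.trgt = c(0.7, 0, -0.2, 0, 0.5, -0.5),  # made-up coefficients
  lam.opt.trgt   = c(0.1, 0.2)                      # made-up hyper-parameters
)

# Assumption: coefficients are ordered like the columns of Y, so View.ind
# can be used to split them into one vector per view.
coef_by_view <- split(res$a.hat.opt.trgt, View.ind)

# Positions (within each view) of the non-zero, i.e. selected, features.
selected <- lapply(coef_by_view, function(a) which(a != 0))
selected
```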

Arguments

Y: an n*(K*p) matrix of observations, with one row per sample and the features of all K classes stacked column-wise.
View.ind: an integer vector of length K*p indicating the class of each feature. Features with the same View.ind value are in the same class.
View.type: a length-K vector encoding the structure type of each feature class. There are two choices: "O" (Omics Data) and "C" (Compositional Data).
eps.stop: a numerical value controlling the convergence.
max.step: an integer controlling the maximum number of iteration steps.
eps: a numerical value controlling the convergence.
T.step: an integer controlling the maximum number of iteration steps.
n_fold: an integer representing the number of folds for cross validation.
seed.sam.ind: a vector of the seeds for sampling.
hp.lower: a numerical value or length-K vector specifying the lower bound of the hyper-parameter.
hp.upper: a numerical value or length-K vector specifying the upper bound of the hyper-parameter.
hp.eta.lower: a numerical value or length-K vector specifying the lower bound of the hyper-parameter for eta.
hp.eta.upper: a numerical value or length-K vector specifying the upper bound of the hyper-parameter for eta.
eta.warm.stat.mat: a matrix providing statistics for warm start of eta.
opt_n_design: an integer controlling the number of design points in the hyperparameter optimization.
opt_n_iter: an integer controlling the number of iterations in the hyperparameter optimization.
Criterion: a character string indicating the criterion used for cross validation.
des.init: an initial design for hyperparameter optimization.
is.refit: a logical indicating whether to refit the model using the optimal hyper-parameters.
is.refix.eta: a logical indicating whether eta is fixed during refitting.
opt_n_design.eta_warm: an integer controlling the number of design points for eta warm-start optimization.
opt_n_iter.eta_warm: an integer controlling the number of iterations for eta warm-start optimization.
is.opt.hyper: a logical indicating whether to optimize the hyper-parameters.
hyper_n_grid: an integer controlling the grid size for hyperparameter search.
...: additional arguments passed to the internal optimization procedures.
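
The relationship between Y, View.ind and View.type described above can be sketched in base R as follows. This is only an illustration of the input layout: the data are random, the sizes are arbitrary, and representing the "C" view as row-wise proportions is an assumption (this page does not state whether counts, proportions or transformed values are expected).

```r
set.seed(1)  # illustrative seed only

n <- 50  # number of samples
p <- 5   # features per view, so Y has K * p columns
K <- 2   # number of views (feature classes)

# View 1: generic omics features (real-valued).
X_omics <- matrix(rnorm(n * p), nrow = n, ncol = p)

# View 2: compositional features, built here as proportions whose rows sum to 1.
counts <- matrix(rpois(n * p, lambda = 20) + 1, nrow = n, ncol = p)
X_comp <- counts / rowSums(counts)

# Stack the views column-wise and record which view each column belongs to.
Y <- cbind(X_omics, X_comp)
View.ind <- rep(seq_len(K), each = p)  # 1 1 1 1 1 2 2 2 2 2
View.type <- c("O", "C")               # one structure type per view

dim(Y)
table(View.ind)
```

With inputs laid out this way, View.type has one entry per distinct value of View.ind, and the columns of Y flagged as view 2 are the ones treated as compositional.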

References

1. Deng, L., Tang, Y., Zhang, X., et al. (2024). Structure-adaptive canonical correlation analysis for microbiome multi-omics data. Frontiers in Genetics, 15, 1489694.

2. Chen, J., Bushman, F. D., Lewis, J. D., et al. (2013). Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis. Biostatistics, 14(2), 244–258.

Examples


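if (FALSE) {
library(dplyr)

# Simulation settings passed to DGP_OC
n <- 200
p <- q <- 100
sigma.nu <- 5
sigma.eps <- 1
# Sparse coefficient vectors: only the first 10 entries of each are non-zero
omega_X <- 0.85 * c(rep(1/10, 9), -9/10, rep(0, p - 10))
omega_Y <- 0.85 * c(seq(0.08, 0.12, length.out = 10), rep(0, q - 10))
Data1 <- DGP_OC(seed = 10, n, p, q, sigma.nu, sigma.eps, omega_X, omega_Y)

library(mlrMBO)
# Cross validation treating both views as omics data
Res.sCCA.CV <- cscca.CV(Y = Data1$Y, View.ind = Data1$View.ind,
                        View.type = c("O", "O"),
                        show.info = TRUE)

# Cross validation treating the second view as compositional data
Res.CsCCA.CV <- cscca.CV(Y = Data1$Y, View.ind = Data1$View.ind,
                         View.type = c("O", "C"),
                         show.info = TRUE)

# Fit each model with cscca() at the selected hyper-parameters
Res.sCCA <- cscca(Y = Data1$Y, View.ind = Data1$View.ind,
                  lambda.seq = Res.sCCA.CV$lam.opt.trgt,
                  View.type = c("O", "O"))
Res.CsCCA <- cscca(Y = Data1$Y, View.ind = Data1$View.ind,
                   lambda.seq = Res.CsCCA.CV$lam.opt.trgt,
                   View.type = c("O", "C"))

# Compare the Cri.opt.trgt element of the two cross-validation results
print(Res.sCCA.CV$Cri.opt.trgt)
print(Res.CsCCA.CV$Cri.opt.trgt)
}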
