
MicrobiomeStat (version 1.4)

cscca.CV: Compositional Sparse Canonical Correlation Analysis (Cross-Validation Version)

Description

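Cross-validation version of the compositional sparse canonical correlation analysis (sCCA) framework for integrating microbiome data with other high-dimensional omics data. The hyper-parameters are selected by cross-validation, and the coefficients estimated with the selected values are returned.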

Usage

cscca.CV(
  Y,
  View.ind,
  View.type = NULL,
  eps.stop = 1e-04,
  max.step = 30,
  eps = 1e-04,
  T.step = 10,
  n_fold = 5,
  seed.sam.ind = NULL,
  hp.lower = NULL,
  hp.upper = NULL,
  hp.eta.lower = NULL,
  hp.eta.upper = NULL,
  eta.warm.stat.mat = NULL,
  opt_n_design = 30,
  opt_n_iter = 20,
  Criterion = "cov",
  des.init = NULL,
  is.refit = F,
  is.refix.eta = T,
  opt_n_design.eta_warm = 30,
  opt_n_iter.eta_warm = 20,
  is.opt.hyper = F,
  hyper_n_grid = 20,
  ...
)

Value

A list containing the following elements: (1) a.hat.opt.trgt: The coefficient vector estimated with the optimal hyper-parameter vector; (2) lam.opt.trgt: The optimal hyper-parameter vector.
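Because the estimated coefficients are sparse, a natural next step is to see which features each view retains. The following is a minimal sketch. It assumes that a.hat.opt.trgt is a numeric vector whose entries follow the column order of Y, so it can be split by View.ind. The short vector a.hat below is a hypothetical stand-in, not output from cscca.CV.

```r
# Hypothetical stand-in for res$a.hat.opt.trgt; assumed to be a numeric
# vector aligned with the columns of Y (K = 2 views, 5 features each)
View.ind <- rep(1:2, each = 5)
a.hat <- c(0.6, 0, -0.8, 0, 0, 0.5, 0.5, 0, -1, 0)

# One coefficient vector per view, named by the View.ind value
a.by.view <- split(a.hat, View.ind)

# Positions (within each view) of the features with non-zero coefficients
selected <- lapply(a.by.view, function(a) which(a != 0))
print(selected)
```

Here the sketch reports features 1 and 3 of the first view and features 1, 2, and 4 of the second.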

Arguments

Y

an n*(K*p) matrix of observations, with the n samples in rows and the features of all K views in columns.

View.ind

an integer vector of length K*p indicating the class (view) of each column of Y. Features sharing the same View.ind value belong to the same class.

View.type

a length-K character vector giving the structure type of each feature class. Two types are available: "O" (omics data) and "C" (compositional data).

eps.stop

a numeric tolerance controlling convergence.

max.step

an integer giving the maximum number of iterations.

eps

a numeric tolerance controlling convergence.

T.step

an integer giving the maximum number of iterations.

n_fold

an integer representing the number of folds for cross validation.

seed.sam.ind

a vector of random seeds used for sampling.

hp.lower

a numeric value or length-K vector specifying the lower bound of the hyper-parameter.

hp.upper

a numeric value or length-K vector specifying the upper bound of the hyper-parameter.

hp.eta.lower

a numeric value or length-K vector specifying the lower bound of the hyper-parameter for eta.

hp.eta.upper

a numeric value or length-K vector specifying the upper bound of the hyper-parameter for eta.

eta.warm.stat.mat

a matrix providing statistics for warm start of eta.

opt_n_design

an integer controlling the number of design points in the hyperparameter optimization.

opt_n_iter

an integer controlling the number of iterations in the hyperparameter optimization.

Criterion

a character string specifying the criterion used for cross-validation (default "cov").

des.init

an initial design for hyperparameter optimization.

is.refit

a logical value indicating whether to refit the model using the optimal hyper-parameters.

is.refix.eta

a logical value indicating whether eta is held fixed during refitting.

opt_n_design.eta_warm

an integer controlling the number of design points for eta warm-start optimization.

opt_n_iter.eta_warm

an integer controlling the number of iterations for eta warm-start optimization.

is.opt.hyper

a logical value indicating whether to optimize the hyper-parameters.

hyper_n_grid

an integer controlling the grid size for hyperparameter search.

...

additional arguments passed to the internal optimization procedures.
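The arguments Y, View.ind, View.type, and n_fold are used together. The sketch below builds hypothetical toy inputs for K = 2 views: an omics view, and a compositional view whose rows sum to one. It then shows a generic way to assign samples to n_fold cross-validation folds. All object names are illustrative. The fold assignment is a standard random split and is not necessarily the one cscca.CV performs internally, including how it uses seed.sam.ind.

```r
# Hypothetical toy inputs (not package data): n samples, K = 2 views, p features per view
set.seed(1)
n <- 40
p <- 5
K <- 2

X.omics <- matrix(rnorm(n * p), nrow = n, ncol = p)          # view 1: omics features
counts  <- matrix(rpois(n * p, lambda = 20) + 1, nrow = n, ncol = p)
X.comp  <- counts / rowSums(counts)                           # view 2: relative abundances

Y <- cbind(X.omics, X.comp)              # n x (K*p) observation matrix
View.ind <- rep(seq_len(K), each = p)    # column j of Y belongs to view View.ind[j]
View.type <- c("O", "C")                 # one structure type per view

# Generic random split into n_fold folds; illustrative only, cscca.CV may split differently
n_fold <- 5
fold.id <- sample(rep(seq_len(n_fold), length.out = n))

# Samples held out in fold 1 and the samples used for training in that round
test.rows  <- which(fold.id == 1)
train.rows <- which(fold.id != 1)

print(dim(Y))          # 40 rows, K * p = 10 columns
print(table(fold.id))  # 8 samples in each of the 5 folds
```

The three inputs built this way have the shapes described in the Arguments section. Realistic analyses, like the simulation in the Examples, use far more samples and features.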

References

1. Deng, L., Tang, Y., Zhang, X., et al. (2024). Structure-adaptive canonical correlation analysis for microbiome multi-omics data. Frontiers in Genetics, 15, 1489694.

2. Chen, J., Bushman, F. D., Lewis, J. D., et al. (2013). Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis. Biostatistics, 14(2), 244–258.

Examples

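if (FALSE) {
library(MicrobiomeStat)
library(dplyr)
library(mlrMBO)  # model-based optimization used for hyper-parameter tuning

# Simulate data with an omics view (p features) and a compositional view (q features)
n <- 200
p <- q <- 100
sigma.nu <- 5
sigma.eps <- 1
omega_X <- 0.85 * c(rep(1/10, 9), -9/10, rep(0, p - 10))
omega_Y <- 0.85 * c(seq(0.08, 0.12, length.out = 10), rep(0, q - 10))
Data1 <- DGP_OC(seed = 10, n, p, q, sigma.nu, sigma.eps, omega_X, omega_Y)

# Cross-validated sCCA, treating both views as omics data
Res.sCCA.CV <- cscca.CV(Y = Data1$Y, View.ind = Data1$View.ind,
                        View.type = c("O", "O"),
                        show.info = TRUE)

# Cross-validated compositional sCCA, treating the second view as compositional
Res.CsCCA.CV <- cscca.CV(Y = Data1$Y, View.ind = Data1$View.ind,
                         View.type = c("O", "C"),
                         show.info = TRUE)

# Fit cscca() with the hyper-parameters selected by cross-validation
Res.sCCA <- cscca(Y = Data1$Y, View.ind = Data1$View.ind,
                  lambda.seq = Res.sCCA.CV$lam.opt.trgt,
                  View.type = c("O", "O"))
Res.CsCCA <- cscca(Y = Data1$Y, View.ind = Data1$View.ind,
                   lambda.seq = Res.CsCCA.CV$lam.opt.trgt,
                   View.type = c("O", "C"))

# Compare the cross-validation criterion at the selected hyper-parameters
print(Res.sCCA.CV$Cri.opt.trgt)
print(Res.CsCCA.CV$Cri.opt.trgt)
}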
