cv_sparseSCA: K-fold cross-validation with Lasso and Group Lasso penalties when the common/distinctive processes are unknown.

Description

cv_sparseSCA helps to find a range of Lasso and Group Lasso tuning parameters for the common component so as to generate a sparse common component.

Usage

cv_sparseSCA(DATA, Jk, R, MaxIter, NRSTARTS, LassoSequence, GLassoSequence,
  nfolds, method)

Arguments

DATA

The concatenated data block, with rows representing subjects.

Jk

A vector. Each element of this vector is the number of columns of a data block.

R

The number of components (R>=2).

MaxIter

Maximum number of iterations for this algorithm. The default value is 400.

NRSTARTS

The number of multistarts for this algorithm. The default value is 1.

LassoSequence

The range of Lasso tuning parameters. The default value is a sequence of 20 numbers from 0.00000001 to the smallest Lasso tuning parameter value that makes all the component loadings equal to zero. Note that by default these numbers are equally spaced on the log scale.

GLassoSequence

The range of Group Lasso tuning parameters. The default value is a sequence of 20 numbers from 0.00000001 to the smallest Group Lasso tuning parameter value that makes all the component loadings equal to zero. Note that by default these numbers are equally spaced (but not on the log scale). If LassoSequence contains only one number, then by default GLassoSequence is a sequence of 50 values instead.
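The two default grids described above can be sketched in base R as follows. This is only an illustration of log-scale versus linear spacing: lasso_max and glasso_max are hypothetical stand-ins for the smallest tuning parameter values that shrink all loadings to zero, which the function determines from the data.

```r
# Hypothetical upper bounds; in practice these depend on DATA.
lasso_max  <- 2.5
glasso_max <- 4.0

# 20 Lasso values, equally spaced on the log scale.
lasso_seq <- exp(seq(log(1e-8), log(lasso_max), length.out = 20))

# 20 Group Lasso values, equally spaced on the original scale.
glasso_seq <- seq(1e-8, glasso_max, length.out = 20)

# On the log scale the Lasso grid has a constant step;
# the Group Lasso grid has a constant step on the raw scale.
log_steps <- diff(log(lasso_seq))
lin_steps <- diff(glasso_seq)
```

A log-spaced grid places most candidate values near zero, which is useful when small penalties already change the sparsity pattern noticeably.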

nfolds

Number of folds. If missing, then 10-fold cross-validation will be performed.

method

"datablock" or "component". These are the two options for grouping the loadings in the Group Lasso penalty. If method = "component", the block-grouping of the coefficients is applied per component separately. If method = "datablock", the grouping is applied on the concatenated data block, with the loadings of all components together. If method is missing, then the "component" method is used by default.
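The difference between the two groupings can be made concrete with a small sketch. The helpers block_index and group_penalty below are hypothetical and are not part of the package; they only show which loadings end up in the same group. Group weights (commonly the square root of the group size) are omitted for simplicity, so this is not the exact penalty the function uses.

```r
# Hypothetical helper: map each row of the loading matrix to its data block.
block_index <- function(Jk) rep(seq_along(Jk), times = Jk)

# Unweighted sum of group norms for a loading matrix P (variables x components).
group_penalty <- function(P, Jk, method = c("component", "datablock")) {
  method <- match.arg(method)
  blocks <- block_index(Jk)
  stopifnot(nrow(P) == length(blocks))
  total <- 0
  for (k in seq_along(Jk)) {
    Pk <- P[blocks == k, , drop = FALSE]
    if (method == "component") {
      # One group per block and per component: a norm for each column of Pk.
      total <- total + sum(sqrt(colSums(Pk^2)))
    } else {
      # One group per block: all components of the block pooled together.
      total <- total + sqrt(sum(Pk^2))
    }
  }
  total
}

Jk <- c(2, 3)                              # two blocks with 2 and 3 variables
P  <- matrix(1, nrow = sum(Jk), ncol = 2)  # toy loadings, 2 components
pen_component <- group_penalty(P, Jk, "component")
pen_datablock <- group_penalty(P, Jk, "datablock")
```

With "component", a block can be zeroed out for one component and kept for another, which is what allows distinctive components to emerge; with "datablock", a block is kept or removed for all components at once.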

Value

MSPE

A matrix of mean squared prediction error (MSPE) for the sequences of Lasso and Group Lasso tuning parameters.

SE_MSE

A matrix of standard errors for MSPE.

MSPE1SE

The lowest MSPE plus one standard error (1SE).

VarSelected

A matrix of the number of variables selected for the sequences of Lasso and Group Lasso tuning parameters.

Lasso_values

The sequence of Lasso tuning parameters used for cross-validation. Users may also consult Lambdaregion (explained below).

Glasso_values

The sequence of Group Lasso tuning parameters used for cross-validation. For example, if the plot shows that the index number for Group Lasso is 6, the corresponding Group Lasso tuning parameter is Glasso_values[6].

Lambdaregion

A region of proper tuning parameter values for Lasso, given a certain value for Group Lasso. This means that, for example, if 5 Group Lasso tuning parameter values have been considered, Lambdaregion is a 5 by 2 matrix.

RecommendedLambda

A pair (or sometimes a few pairs) of Lasso and Group Lasso tuning parameters that lead to a model with MSPE closest to the lowest MSPE + 1SE.
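The selection rule behind MSPE1SE and RecommendedLambda can be sketched on toy numbers. Everything below is illustrative: the matrices are made up, the orientation (rows for Lasso, columns for Group Lasso) is an assumption, and taking the standard error at the position of the lowest MSPE is one plausible reading of "lowest MSPE + 1SE" rather than a statement about the package internals.

```r
# Toy grids and toy cross-validation results (all values are made up).
lasso_values  <- c(0.1, 0.5, 1.0)
glasso_values <- c(0.2, 0.8)
mspe <- matrix(c(1.00, 1.10, 1.40,
                 1.05, 1.22, 1.60), nrow = 3)  # rows: Lasso, columns: Group Lasso
se   <- matrix(0.2, nrow = 3, ncol = 2)

# Position of the lowest MSPE and the 1SE threshold built from it.
best      <- which(mspe == min(mspe), arr.ind = TRUE)[1, ]
threshold <- mspe[best["row"], best["col"]] + se[best["row"], best["col"]]

# Pair(s) whose MSPE is closest to the threshold.
gap     <- abs(mspe - threshold)
closest <- which(gap == min(gap), arr.ind = TRUE)
recommended <- data.frame(
  Lasso  = lasso_values[closest[, "row"]],
  GLasso = glasso_values[closest[, "col"]]
)
```

Ties in the distance to the threshold yield several rows, which matches the remark that a few pairs may sometimes be recommended.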

P_hat

Estimated component loading matrix, given the recommended tuning parameters.

T_hat

Estimated component score matrix, given the recommended tuning parameters.

plotlog

An index number used internally by the plot function; it is not intended for users.

Details

This function searches through a range of Lasso and Group Lasso tuning parameters for identifying common and distinctive components.

References

Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515-534.

Friedman, J., Hastie, T., & Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. arXiv preprint arXiv:1001.0736.

Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49-67.

Examples

# NOT RUN {
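# Two data blocks measured on the same 5 subjects (rows)
DATA1 <- matrix(rnorm(50), nrow=5)    # block 1: 10 variables
DATA2 <- matrix(rnorm(100), nrow=5)   # block 2: 20 variables
DATA <- cbind(DATA1, DATA2)           # concatenate the blocks column-wise
Jk <- c(10, 20)                       # number of columns in each block
cv_sparseSCA(DATA, Jk, R=5, MaxIter = 100, NRSTARTS = 40, nfolds=10)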
# }
