cv_sparseSCA
Helps find a range of Lasso and Group Lasso tuning parameters for the common component, so as to generate a sparse common component.
cv_sparseSCA(DATA, Jk, R, MaxIter, NRSTARTS, LassoSequence, GLassoSequence,
nfolds, method)
DATA: The concatenated data block, with rows representing subjects.
Jk: A vector in which each element is the number of columns of a data block.
R: The number of components (R >= 2).
MaxIter: Maximum number of iterations for this algorithm. The default value is 400.
NRSTARTS: The number of multistarts for this algorithm. The default value is 1.
LassoSequence: The range of Lasso tuning parameters. The default value is a sequence of 50 numbers from 0.00000001 to the smallest Lasso tuning parameter value that makes all the component loadings equal to zero. By default, these 50 numbers are equally spaced on the log scale.
GLassoSequence: The range of Group Lasso tuning parameters. The default value is a sequence of 50 numbers from 0.00000001 to the smallest Group Lasso tuning parameter value that makes all the component loadings equal to zero. By default, these 50 numbers are equally spaced on the original scale (not on the log scale).
Note that if LassoSequence contains only one number, GLassoSequence is by default a sequence of 50 values.
nfolds: Number of folds. If missing, 10-fold cross-validation is performed.
method: Either "datablock" or "component". These two options determine how the loadings are grouped in the Group Lasso penalty. If method = "component", the block-grouping of the coefficients is applied to each component separately. If method = "datablock", the grouping is applied to the concatenated data block, with the loadings of all components together. If method is missing, the "component" method is used by default.
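The difference between the two `method` options can be illustrated with a minimal sketch of a group lasso penalty on a loading matrix `P` (rows are variables, columns are components). The helper `group_penalty` is hypothetical, and weighting each group by `sqrt(J_k)` follows the common convention of Yuan and Lin (2006); whether cv_sparseSCA uses exactly this weighting is an assumption.

```r
# Hypothetical helper (not part of RegularizedSCA): group lasso penalty of a
# loading matrix P with sum(Jk) rows and R columns, grouped by data block.
# Assumption: each group is weighted by sqrt(number of variables in its block).
group_penalty <- function(P, Jk, method = c("component", "datablock")) {
  method <- match.arg(method)
  block <- rep(seq_along(Jk), times = Jk)   # block index of every row of P
  total <- 0
  for (k in seq_along(Jk)) {
    Pk <- P[block == k, , drop = FALSE]     # loadings belonging to block k
    if (method == "component") {
      # one group per (block, component): L2 norm of each column separately
      total <- total + sqrt(Jk[k]) * sum(sqrt(colSums(Pk^2)))
    } else {
      # one group per block: all components together (Frobenius norm)
      total <- total + sqrt(Jk[k]) * sqrt(sum(Pk^2))
    }
  }
  total
}

# Block 1 = rows 1-2, block 2 = rows 3-5; two components
P <- matrix(c(3, 0,
              0, 4,
              0, 1,
              0, 2,
              0, 2), ncol = 2, byrow = TRUE)
group_penalty(P, Jk = c(2, 3), method = "component")  # 7*sqrt(2) + 3*sqrt(3)
group_penalty(P, Jk = c(2, 3), method = "datablock")  # 5*sqrt(2) + 3*sqrt(3)
```

With "datablock" grouping, all loadings of a block across all components form a single group, so the penalty tends to remove a block from every component at once; with "component" grouping, a block can be zeroed out in some components while being kept in others.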
A matrix of mean squared prediction errors (MSPE) for the sequences of Lasso and Group Lasso tuning parameters.
A matrix of standard errors for MSPE.
The lowest MSPE plus one standard error (MSPE + 1SE).
A matrix of the number of variables selected for the sequences of Lasso and Group Lasso tuning parameters.
The sequence of Lasso tuning parameters used for cross-validation. Users may also consult Lambdaregion (explained below).
The sequence of Group Lasso tuning parameters used for cross-validation. For example, if the plot shows that the index number for Group Lasso is 6, the corresponding Group Lasso tuning parameter is Glasso_values[6].
A region of proper tuning parameter values for Lasso, given a certain value for Group Lasso. For example, if 5 Group Lasso tuning parameter values have been considered, Lambdaregion is a 5 by 2 matrix.
A pair (or sometimes a few pairs) of Lasso and Group Lasso tuning parameters that lead to a model with MSPE closest to the lowest MSPE + 1SE.
Estimated component loading matrix, given the recommended tuning parameters.
Estimated component score matrix, given the recommended tuning parameters.
An index number used by the plot function; it is not intended for users.
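The "lowest MSPE + 1SE" rule behind the recommended tuning parameters can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the helper `recommend_1se` is hypothetical, rows of the MSPE and SE matrices are assumed to index Group Lasso values and columns Lasso values, and the standard error added to the lowest MSPE is assumed to be the one at that same grid point.

```r
# Hypothetical helper (not part of RegularizedSCA).
# Assumption: rows index Group Lasso values, columns index Lasso values.
recommend_1se <- function(MSPE, SE) {
  best <- which.min(MSPE)              # position of the lowest MSPE
  threshold <- MSPE[best] + SE[best]   # lowest MSPE + 1SE
  gap <- abs(MSPE - threshold)
  # every (row, column) pair whose MSPE is closest to the threshold;
  # ties yield several pairs
  list(threshold = threshold,
       pairs = which(gap == min(gap), arr.ind = TRUE))
}

MSPE <- matrix(c(1.00, 0.90, 0.80,
                 1.10, 0.95, 0.70), nrow = 2, byrow = TRUE)
SE <- matrix(0.12, nrow = 2, ncol = 3)
res <- recommend_1se(MSPE, SE)
res$threshold  # 0.82
res$pairs      # row 1, column 3
```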
This function searches through a range of Lasso and Group Lasso tuning parameters to identify common and distinctive components.
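The default search grids described under LassoSequence and GLassoSequence can be sketched in base R. Here `lasso_max` and `glasso_max` are hypothetical placeholder numbers standing in for the smallest tuning parameter values that make all component loadings zero, which the function determines from the data.

```r
# Placeholders (assumption): in cv_sparseSCA these upper bounds are the
# smallest values that make all component loadings equal to zero.
lasso_max <- 2.5
glasso_max <- 4
n_values <- 50

# Lasso grid: equally spaced on the log scale
lasso_seq <- exp(seq(log(1e-8), log(lasso_max), length.out = n_values))
# Group Lasso grid: equally spaced on the original scale
glasso_seq <- seq(1e-8, glasso_max, length.out = n_values)

# Every (Group Lasso, Lasso) pair is a candidate in the search
grid <- expand.grid(glasso = glasso_seq, lasso = lasso_seq)
nrow(grid)  # 2500
```

Because each pair must be cross-validated, the cost grows with the product of the two sequence lengths; supplying shorter LassoSequence or GLassoSequence vectors reduces the run time.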
Witten, D. M., Tibshirani, R., & Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3), 515-534.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). A note on the group lasso and a sparse group lasso. arXiv preprint arXiv:1001.0736.
Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49-67.
# NOT RUN {
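library(RegularizedSCA)
set.seed(1)                              # make the simulated data reproducible
DATA1 <- matrix(rnorm(50), nrow = 5)     # block 1: 5 subjects x 10 variables
DATA2 <- matrix(rnorm(100), nrow = 5)    # block 2: 5 subjects x 20 variables
DATA <- cbind(DATA1, DATA2)              # concatenated 5 x 30 data
Jk <- c(10, 20)                          # number of columns in each block
results <- cv_sparseSCA(DATA, Jk, R = 5, MaxIter = 100, NRSTARTS = 40, nfolds = 10)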
# }