bdsvd.ht: Hyperparameter Tuning for BD-SVD

Description

Finds the number of non-zero elements of the sparse loading according to the high-dimensional Bayesian information criterion (HBIC).

Usage

bdsvd.ht(X, dof.lim, standardize = TRUE, anp = "2", max.iter)

Value

dof: The optimal number of nonzero components (degrees of freedom) according to the HBIC.
BIC: The HBIC for the different numbers of nonzero components.
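
As a rough illustration of how the two return values relate, the sketch below picks the degrees of freedom from a vector of criterion values. It assumes that the selected dof is the minimizer of the HBIC over the candidate grid, as is usual for information criteria. The names dof_grid, hbic, and dof_hat are hypothetical, and the numbers are made up for illustration only.

```r
# Toy HBIC values for dof = 0, 1, ..., 5 (made-up numbers, illustration only).
dof_grid <- 0:5
hbic <- c(3.2, 2.7, 2.1, 2.4, 2.9, 3.5)

# Assumption: the selected dof is the grid value with the smallest HBIC.
dof_hat <- dof_grid[which.min(hbic)]
dof_hat
```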

Arguments

X: Data matrix of dimension \(n \times p\) with possibly \(p \gg n\).
dof.lim: Interval limits for the number of non-zero components in the sparse loading (degrees of freedom). If \(S\) denotes the support of \(v\), then the cardinality of the support, \(|S|\), corresponds to the degrees of freedom. Default is dof.lim <- c(0, p-1), which is highly recommended because it checks all levels of sparsity.
standardize: Standardize the data to have unit variance. Default is TRUE.
anp: Which regularization function should be used for the HBIC. anp = "1" implements \(a_{np} = 1\), which corresponds to the BIC; anp = "2" implements \(a_{np} = \frac{1}{2}\log(np)\), which corresponds to the regularization used by Bauer (2024); and anp = "3" implements \(a_{np} = \log(\log(np))\), which corresponds to the regularization used by Wang et al. (2009) and Wang et al. (2013).
max.iter: How many iterations should be performed for computing the sparse loading. Default is 200.
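
To get a feel for how the three choices of anp differ in magnitude, the following base-R sketch evaluates \(a_{np}\) for given \(n\) and \(p\). The helper anp_value is hypothetical and not part of the package; it only mirrors the formulas stated above.

```r
# Hypothetical helper (not part of the bdsvd package): evaluate the
# penalty factor a_np for each of the three documented choices of `anp`.
anp_value <- function(n, p, anp = c("2", "1", "3")) {
  anp <- match.arg(anp)
  switch(anp,
    "1" = 1,                 # a_np = 1, the classical BIC
    "2" = 0.5 * log(n * p),  # a_np = 1/2 log(np), Bauer (2024)
    "3" = log(log(n * p))    # a_np = log(log(np)), Wang et al. (2009, 2013)
  )
}

n <- 500
p <- 300
# Named vector with one penalty factor per choice of `anp`.
a_np <- sapply(c("1", "2", "3"), function(a) anp_value(n, p, a))
a_np
```

For these values of \(n\) and \(p\), anp = "2" gives the largest factor of the three and anp = "1" the smallest.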

Details

The sparse loadings are computed using the method by Shen & Huang (2008), implemented in the irlba package. The computation of the HBIC is outlined in Bauer (2024).
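
The idea behind the Shen & Huang (2008) approach can be sketched in base R as an alternating rank-one update with soft thresholding. This is a simplified illustration, not the package's or irlba's implementation: the functions soft_threshold and sparse_loading are hypothetical, and the sketch uses a fixed threshold lambda rather than targeting a prescribed number of non-zero components.

```r
# Soft-thresholding operator: shrink towards zero and set small entries to 0.
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Hypothetical sketch of a rank-one sparse loading via alternating updates.
sparse_loading <- function(X, lambda, max.iter = 200, tol = 1e-8) {
  # Initialize with the leading singular triplet of X.
  s <- svd(X, nu = 1, nv = 1)
  u <- s$u[, 1]
  v <- s$d[1] * s$v[, 1]
  for (i in seq_len(max.iter)) {
    # Update the loading by soft-thresholding X'u.
    v_new <- soft_threshold(drop(crossprod(X, u)), lambda)
    if (all(v_new == 0)) {   # threshold removed every component
      v <- v_new
      break
    }
    # Update the left vector and rescale it to unit length.
    Xv <- drop(X %*% v_new)
    u <- Xv / sqrt(sum(Xv^2))
    converged <- sqrt(sum((v_new - v)^2)) < tol
    v <- v_new
    if (converged) break
  }
  # Return the loading scaled to unit length (or the zero vector).
  nrm <- sqrt(sum(v^2))
  if (nrm > 0) v / nrm else v
}

set.seed(1)
X_toy <- matrix(rnorm(20 * 6), nrow = 20)
v_sparse <- sparse_loading(X_toy, lambda = 1)
sum(v_sparse != 0)  # number of non-zero components, i.e. |S|
```

With lambda = 0 the sketch reduces to the leading right singular vector of X_toy; larger values of lambda yield sparser loadings.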

References

Bauer, J.O. (2024). High-dimensional block diagonal covariance structure detection using singular vectors, J. Comput. Graph. Stat.

Shen, H. and Huang, J.Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation, J. Multivar. Anal. 99, 1015–1034.

Wang, H., B. Li, and C. Leng (2009). Shrinkage tuning parameter selection with a diverging number of parameters, J. R. Stat. Soc. B 71 (3), 671–683.

Wang, L., Y. Kim, and R. Li (2013). Calibrating nonconvex penalized regression in ultra-high dimension, Ann. Stat. 41 (5), 2505–2536.

Examples

# Replicate the illustrative example from Bauer (2024).
library(bdsvd)

p <- 300 #Number of variables. In Bauer (2024), p = 3000
n <- 500 #Number of observations
b <- 3   #Number of blocks
design <- "c"

#Simulate data matrix X
set.seed(1)
Sigma <- bdsvd.cov.sim(p = p, b = b, design = design)
X <- mvtnorm::rmvnorm(n, mean = rep(0, p), sigma = Sigma)
colnames(X) <- seq_len(p)

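# Select the number of non-zero components via the HBIC
ht <- bdsvd.ht(X)
# Plot the HBIC against the candidate support sizes |S| = 0, ..., p-1
plot(0:(p-1), ht$BIC[,1], xlab = "|S|", ylab = "HBIC", main = "", type = "l")
# Apply single.bdsvd with the selected degrees of freedom
single.bdsvd(X, dof = ht$dof, standardize = FALSE)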
