Learn R Programming

bspcov (version 1.0.3)

sbmspcov: Bayesian Sparse Covariance Estimation using Sure Screening

Description

Provides a Bayesian sparse and positive definite estimate of a covariance matrix via screened beta-mixture prior.

Usage

sbmspcov(
  X,
  Sigma,
  cutoff = list(),
  prior = list(),
  nsample = list(),
  nchain = 1,
  seed = NULL,
  do.parallel = FALSE,
  show_progress = TRUE
)

Value

Sigma

a nmc \(\times\) p(p+1)/2 matrix including lower triangular elements of covariance matrix. For multiple chains, this becomes a list of matrices.

p

dimension of covariance matrix.

Phi

a nmc \(\times\) p(p+1)/2 matrix including shrinkage parameters corresponding to lower triangular elements of covariance matrix. For multiple chains, this becomes a list of matrices.

INDzero

a list including indices of off-diagonal elements screened by sure screening. For multiple chains, this becomes a list of lists.

cutoff

the cutoff value specified by FNR-approach. For multiple chains, this becomes a list.

mcmctime

elapsed time for MCMC sampling. For multiple chains, this becomes a list.

nchain

number of chains used.

burnin

number of burn-in samples discarded.

nmc

number of MCMC samples retained for analysis.

Arguments

X

a n \(\times\) p data matrix with column mean zero.

Sigma

an initial guess for Sigma.

cutoff

a list giving the information for the threshold. The list includes the following parameters (with default values in parentheses): method ('FNR') giving the method for determining the threshold value (method='FNR' uses the false negative rate (FNR)-based approach, method='corr' chooses the threshold value by sample correlations), rho a lower bound of meaningfully large correlations whose the defaults values are 0.25 and 0.2 for method = 'FNR' and method = 'corr', respectively. Note. If method='corr', rho is used as the threshold value. FNR (0.05) giving the prespecified FNR level when method = 'FNR'. nsimdata (1000) giving the number of simulated datasets for calculating Jeffreys’ default Bayes factors when method = 'FNR'.

prior

a list giving the prior information. The list includes the following parameters (with default values in parentheses): a (1/2) and b (1/2) giving the shape parameters for beta distribution, lambda (1) giving the hyperparameter for the diagonal elements, tau1sq (log(p)/(p^2*n)) giving the hyperparameter for the shrinkage prior of covariance.

nsample

a list giving the MCMC parameters. The list includes the following integers (with default values in parentheses): burnin (1000) giving the number of MCMC samples in transition period, nmc (1000) giving the number of MCMC samples for analysis.

nchain

number of MCMC chains to run. Default is 1.

seed

random seed for reproducibility. If NULL, no seed is set. For multiple chains, each chain gets seed + i - 1.

do.parallel

logical indicating whether to run multiple chains in parallel using future.apply. Default is FALSE. When TRUE, automatically sets up a multisession plan with nchain workers if no parallel plan is already configured.

show_progress

logical indicating whether to show progress bars during MCMC sampling. Default is TRUE. Automatically set to FALSE for individual chains when running in parallel.

Author

Kyoungjae Lee, Seongil Jo, and Kyeongwon Lee

Details

Lee, Jo, Lee, and Lee (2023+) proposed the screened beta-mixture shrinkage prior for estimating a sparse and positive definite covariance matrix. The screened beta-mixture shrinkage prior for \(\Sigma = (\sigma_{jk})\) is defined as $$ \pi(\Sigma) = \frac{\pi^{u}(\Sigma)I(\Sigma \in C_p)}{\pi^{u}(\Sigma \in C_p)}, ~ C_p = \{\mbox{ all } p \times p \mbox{ positive definite matrices }\}, $$ where \(\pi^{u}(\cdot)\) is the unconstrained prior given by $$ \pi^{u}(\sigma_{jk} \mid \psi_{jk}) = N\left(\sigma_{jk} \mid 0, \frac{\psi_{jk}}{1 - \psi_{jk}}\tau_1^2\right), ~ \psi_{jk} = 1 - 1/(1 + \phi_{jk})$$ $$ \pi^{u}(\psi_{jk}) = Beta(\psi_{jk} \mid a, b) ~ \mbox{for } (j, k) \in S_r(\hat{R})$$ $$ \pi^{u}(\sigma_{jj}) = Exp(\sigma_{jj} \mid \lambda), $$ where \(S_r(\hat{R}) = \{(j,k) : 1 < j < k \le p, |\hat{\rho}_{jk}| > r\}\), \(\hat{\rho}_{jk}\) are sample correlations, and \(r\) is a threshold given by user.

For more details, see Lee, Jo, Lee and Lee (2022+).

References

Lee, K., Jo, S., Lee, K., and Lee, J. (2023+), "Scalable and optimal Bayesian inference for sparse covariance matrices via screened beta-mixture prior", arXiv:2206.12773.

See Also

bmspcov

Examples

Run this code

set.seed(1)
n <- 20
p <- 5

# generate a sparse covariance matrix:
True.Sigma <- matrix(0, nrow = p, ncol = p)
diag(True.Sigma) <- 1
Values <- -runif(n = p*(p-1)/2, min = 0.2, max = 0.8)
nonzeroIND <- which(rbinom(n=p*(p-1)/2,1,prob=1/p)==1)
zeroIND = (1:(p*(p-1)/2))[-nonzeroIND]
Values[zeroIND] <- 0
True.Sigma[lower.tri(True.Sigma)] <- Values
True.Sigma[upper.tri(True.Sigma)] <- t(True.Sigma)[upper.tri(True.Sigma)]
if(min(eigen(True.Sigma)$values) <= 0){
  delta <- -min(eigen(True.Sigma)$values) + 1.0e-5
  True.Sigma <- True.Sigma + delta*diag(p)
}

# generate a data
X <- MASS::mvrnorm(n = n, mu = rep(0, p), Sigma = True.Sigma)

# compute sparse, positive covariance estimator:
fout <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X))))
post.est.m <- bspcov::estimate(fout)
sqrt(mean((post.est.m - True.Sigma)^2))
sqrt(mean((cov(X) - True.Sigma)^2))

# Multiple chains example with parallel processing:
# fout_multi <- bspcov::sbmspcov(X = X, Sigma = diag(diag(cov(X))), 
#                                nchain = 4, seed = 123, do.parallel = TRUE)
# post.est.multi <- bspcov::estimate(fout_multi)
# summary(fout_multi, cols = 1, rows = 1:3)  # Shows MCMC diagnostics
# plot(fout_multi, cols = 1, rows = 1:3)     # Shows colored trace plots
# When do.parallel = TRUE, the function automatically sets up 
# a multisession plan with nchain workers for parallel execution.

Run the code above in your browser using DataLab