geomc.vs uses a geometric approach to MCMC for performing Bayesian variable selection. It produces MCMC samples from the posterior density of a Bayesian hierarchical feature selection model.
geomc.vs(
X,
y,
initial = NULL,
n.iter = 50,
burnin = 1,
eps = 0.5,
symm = TRUE,
move.prob = c(0.4, 0.4, 0.2),
lam0 = 0,
a0 = 0,
b0 = 0,
lam = nrow(X)/ncol(X)^2,
w = sqrt(nrow(X))/ncol(X),
model.summary = FALSE,
model.threshold = 0.5
)
A list with components
samples
MCMC samples from \(P(\gamma|y)\) returned as a \(p \times\) n.iter sparse lgCMatrix.
acceptance.rate
The acceptance rate based on all samples.
mip
The \(p\) vector of marginal inclusion probabilities of all variables based on post-burnin samples (a sketch recomputing these from the returned samples appears after this list).
median.model
The median probability model based on post-burnin samples.
beta.mean
The Monte Carlo estimate of the posterior mean of \(\beta\) (the \(p+1\) vector c(intercept, regression coefficients)) based on post-burnin samples.
wmip
The \(p\) vector of weighted marginal inclusion probabilities of all variables based on post-burnin samples.
wam
The weighted average model based on post-burnin samples.
beta.wam
The model probability weighted estimate of the posterior mean of \(\beta\) (the \(p+1\) vector c(intercept, regression coefficients)) based on post-burnin samples.
log.post
The n.iter vector of log of the unnormalized marginal posterior pmf \(P(\gamma|y)\) evaluated at the samples.
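As a minimal sketch (not part of the package), assuming result is the output of geomc.vs with model.summary = TRUE as in the Examples below, the mip component can be recovered directly from the returned samples matrix; the exact burn-in indexing used internally may differ:

library(Matrix)
n.iter <- ncol(result$samples)
burnin <- 1
post <- result$samples[, (burnin + 1):n.iter, drop = FALSE]  # post-burnin sampled models gamma
mip.check <- rowMeans(post * 1)  # Monte Carlo average of I(gamma_j = 1); "* 1" coerces logical to numeric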
X
The \(n\times p\) covariate matrix without intercept. The following classes are supported: matrix and dgCMatrix. No need to center or scale this matrix manually; scaling is performed implicitly and regression coefficients are returned on the original scale.
y
The response vector of length \(n\). No need to center or scale.
initial
is the initial model (the set of active variables). Default: Null model.
n.iter
is the number of MCMC samples needed. Default: 50.
burnin
is the number of burn-in samples discarded when computing the model summaries, such as the median probability model. Default: 1.
eps
is the value of the epsilon perturbation. Default: 0.5.
symm
indicates whether the base density is that of a symmetric RW-MH. Default: TRUE.
move.prob
is the vector of ('addition', 'deletion', 'swap') move probabilities. Default: (0.4, 0.4, 0.2). move.prob is used only when symm is set to FALSE.
lam0
The precision parameter for \(\beta_0\). Default: 0 (corresponding to an improper uniform prior).
a0
The shape parameter of the prior on \(\sigma^2\). Default: 0.
b0
The scale parameter of the prior on \(\sigma^2\). Default: 0.
lam
The slab precision parameter. Default: \(n/p^2\), as suggested by the theoretical results of Li, Dutta, Roy (2023).
w
The prior inclusion probability of each variable. Default: \(\sqrt{n}/p\).
model.summary
If TRUE, additional model summaries are returned. Default: FALSE.
model.threshold
The threshold probability used to select the covariates for the median probability model (median.model) and the weighted average model (wam). A covariate is included in median.model (wam) if its marginal inclusion probability (weighted marginal inclusion probability) exceeds the threshold. Default: 0.5.
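For illustration only (assuming result is the output of geomc.vs with model.summary = TRUE), the thresholding rule amounts to:

threshold <- 0.5                  # the default model.threshold
which(result$mip  > threshold)    # variables in the median probability model (median.model)
which(result$wmip > threshold)    # variables in the weighted average model (wam)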
Vivekananda Roy
geomc.vs provides MCMC samples using the geometric MH algorithm of Roy (2024) for variable selection based on a hierarchical Gaussian linear model with priors placed on the regression coefficients as well as on the model space as follows: $$y | X, \beta_0,\beta,\gamma,\sigma^2,w,\lambda \sim N(\beta_0 1 + X_\gamma\beta_\gamma,\sigma^2I_n)$$ $$\beta_i|\beta_0,\gamma,\sigma^2,w,\lambda \stackrel{indep.}{\sim} N(0, \gamma_i\sigma^2/\lambda),~i=1,\ldots,p,$$ $$\beta_0|\gamma,\sigma^2,w,\lambda \sim N(0, \sigma^2/\lambda_0)$$ $$\sigma^2|\gamma,w,\lambda \sim Inv-Gamma (a_0, b_0)$$ $$\gamma_i|w,\lambda \stackrel{iid}{\sim} Bernoulli(w)$$ where \(X_\gamma\) is the \(n \times |\gamma|\) submatrix of \(X\) consisting of those columns of \(X\) for which \(\gamma_i=1\) and, similarly, \(\beta_\gamma\) is the \(|\gamma|\) subvector of \(\beta\) corresponding to \(\gamma\). The density \(\pi(\sigma^2)\) of \(\sigma^2 \sim Inv-Gamma (a_0, b_0)\) has the form \(\pi(\sigma^2) \propto (\sigma^2)^{-a_0-1} \exp(-b_0/\sigma^2)\). The functions in the package also allow the non-informative prior \((\beta_{0}, \sigma^2)|\gamma, w \sim 1 / \sigma^{2}\), which is obtained by setting \(\lambda_0=a_0=b_0=0\).

geomc.vs provides the empirical MH acceptance rate and MCMC samples from the posterior pmf of the models \(P(\gamma|y)\), which is available up to a normalizing constant. If model.summary is set to TRUE, geomc.vs also returns other model summaries. In particular, it returns the marginal inclusion probabilities (mip) computed by the Monte Carlo average as well as the weighted marginal inclusion probabilities (wmip) computed with weights $$w_i = P(\gamma^{(i)}|y)/\sum_{k=1}^K P(\gamma^{(k)}|y), i=1,2,...,K,$$ where \(\gamma^{(k)}, k=1,2,...,K\) are the distinct models sampled. Thus, if \(N_k\) is the number of times the \(k\)th distinct model \(\gamma^{(k)}\) is repeated in the MCMC samples, the mip for the \(j\)th variable is $$mip_j = \sum_{k=1}^{K} N_k I(\gamma^{(k)}_j = 1)/n.iter$$ and the wmip for the \(j\)th variable is $$wmip_j = \sum_{k=1}^K w_k I(\gamma^{(k)}_j = 1).$$

The median.model is the model containing the variables \(j\) with \(mip_j >\) model.threshold, and the wam is the model containing the variables \(j\) with \(wmip_j >\) model.threshold. Note that \(E(\beta|\gamma, y)\), the conditional posterior mean of \(\beta\) given a model \(\gamma\), is available in closed form (see Li, Dutta, Roy (2023) for details). geomc.vs returns two estimates (beta.mean, beta.wam) of the posterior mean of \(\beta\), computed as $$beta.mean = \sum_{k=1}^{K} N_k E(\beta|\gamma^{(k)},y)/n.iter$$ and $$beta.wam = \sum_{k=1}^K w_k E(\beta|\gamma^{(k)},y),$$ respectively.
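As an illustrative sketch only (not part of the package), the wmip formula above can be evaluated directly from the returned samples and log.post, assuming result is the output of geomc.vs with model.summary = TRUE and that all n.iter samples are used (the package itself uses the post-burnin samples):

library(Matrix)
gamma <- result$samples                          # p x n.iter sparse logical matrix of sampled models
keys  <- apply(gamma, 2, paste, collapse = "")   # a string key identifying each sampled model
first <- !duplicated(keys)                       # first occurrence of each distinct model gamma^(k)
lp    <- result$log.post[first]                  # log unnormalized P(gamma^(k)|y)
w     <- exp(lp - max(lp)); w <- w / sum(w)      # normalized model weights w_k
wmip.check <- as.numeric((gamma[, first, drop = FALSE] * 1) %*% w)  # sum_k w_k I(gamma^(k)_j = 1)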
Roy, V. (2024). A geometric approach to informative MCMC sampling. https://arxiv.org/abs/2406.09010
Li, D., Dutta, S., and Roy, V. (2023). Model Based Screening Embedded Bayesian Variable Selection for Ultra-high Dimensional Settings. Journal of Computational and Graphical Statistics, 32, 61-73.
library(geommc)  # geomc.vs is provided by the geommc package
n <- 50; p <- 100; nonzero <- 3
trueidx <- 1:nonzero            # indices of the truly active variables
nonzero.value <- 4
TrueBeta <- numeric(p)
TrueBeta[trueidx] <- nonzero.value
rho <- 0.5                      # pairwise correlation among covariates
xone <- matrix(rnorm(n*p), n, p)
X <- sqrt(1-rho)*xone + sqrt(rho)*rnorm(n)   # equicorrelated design
y <- 0.5 + X %*% TrueBeta + rnorm(n)         # response with intercept 0.5
result <- geomc.vs(X = X, y = y, model.summary = TRUE)
result$samples           # the MCMC samples
result$acceptance.rate   # the MH acceptance rate
result$mip               # marginal inclusion probabilities
result$wmip              # weighted marginal inclusion probabilities
result$median.model      # the median probability model
result$wam               # the weighted average model
result$beta.mean         # the posterior mean of the regression coefficients
result$beta.wam          # another estimate of the posterior mean of the regression coefficients
result$log.post          # the log (unnormalized) posterior probabilities of the MCMC samples
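# Illustrative follow-up (not part of the original example): compare the returned
# summaries with the simulation truth used above.
result$median.model      # the selected variables; with this design they should correspond to trueidx
cbind(truth = c(0.5, TrueBeta), estimate = as.numeric(result$beta.mean))[1:5, ]  # intercept and first coefficients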