BMI_LASSO: Bayesian MI-LASSO for Multiply-Imputed Regression

Description

Fits a Bayesian multiple-imputation LASSO (BMI-LASSO) model across multiply-imputed datasets, using one of four priors: Multi-Laplace, Horseshoe, ARD, or Spike-Laplace. By default the function standardizes the data within each imputation. It then runs the MCMC chains (in parallel when more than one core is requested), performs three-step projection predictive variable selection, and selects a final submodel by BIC.

Usage

BMI_LASSO(
  X,
  Y,
  model,
  standardize = TRUE,
  SNC = TRUE,
  grid = seq(0, 1, 0.01),
  orthogonal = FALSE,
  nburn = 4000,
  npost = 4000,
  seed = NULL,
  nchain = 1,
  ncores = 1,
  verbose = TRUE,
  printevery = 1000,
  ...
)

Value

A named list with elements:

posterior: List of length nchain of MCMC outputs (posterior draws).
select: List of length nchain of logical matrices showing which variables are selected at each grid value.
best_select: List of length nchain of the single best selection (by BIC) for each chain.
posterior_best_models: List of length nchain of projected posterior draws for the best submodel.
bic_models: List of length nchain of BIC values and degrees-of-freedom for each candidate submodel.
summary_table_full: A data frame summarizing rank-normalized split-Rhat and other diagnostics for the full model.
summary_table_selected: A data frame summarizing diagnostics for the selected submodel after projection.

Arguments

X

A numeric matrix or array of predictors. If a matrix n × p, it is taken as one imputation; if an array D × n × p, each slice along the first dimension is one imputed dataset.

Y

A numeric vector or matrix of outcomes. If a vector of length n, it is recycled for each imputation; if a D × n matrix, each row is the response for one imputation.
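
As an illustration of the shapes described for X and Y, the sketch below stacks a list of completed datasets into a D × n × p array and a D × n matrix using base R only. The names imputed_X and imputed_Y and the simulated values are hypothetical placeholders, not objects provided by the package.

```r
# Hypothetical inputs: D completed datasets, each with n rows and p predictors
set.seed(1)
D <- 3; n <- 5; p <- 2
imputed_X <- lapply(seq_len(D), function(d) matrix(rnorm(n * p), n, p))
imputed_Y <- lapply(seq_len(D), function(d) rnorm(n))

# Stack as n x p x D first, then move the imputation index to the front
X <- aperm(array(unlist(imputed_X), dim = c(n, p, D)), c(3, 1, 2))

# One row of responses per imputation
Y <- do.call(rbind, imputed_Y)

dim(X)  # 3 5 2
dim(Y)  # 3 5
```

With this layout, X[d, , ] recovers the d-th imputed predictor matrix and Y[d, ] the matching response vector.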

model

Character; which prior to use. One of "Multi_Laplace", "Horseshoe", "ARD", or "Spike_Laplace".

standardize

Logical; whether to standardize the columns of X and center Y within each imputation before fitting. Default TRUE.
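
A minimal sketch of what per-imputation standardization could look like, assuming column-wise z-scoring of the predictors and mean-centering of the outcome. The helper standardize_one is hypothetical, and the package's internal routine may differ in detail.

```r
# Hypothetical helper: standardize one imputed dataset
standardize_one <- function(X, y) {
  list(
    X = scale(X, center = TRUE, scale = TRUE),  # each column: mean 0, sd 1
    y = y - mean(y)                             # outcome: mean 0
  )
}

set.seed(2)
X1 <- matrix(rnorm(40, mean = 10, sd = 3), nrow = 10, ncol = 4)
y1 <- rnorm(10, mean = 5)
std <- standardize_one(X1, y1)

round(colMeans(std$X), 10)      # all zero up to rounding
round(apply(std$X, 2, sd), 10)  # all one up to rounding
```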

SNC

Logical; if TRUE, use the scaled neighborhood criterion for selection; otherwise apply thresholding or median-based selection. Default TRUE.

grid

Numeric vector; grid of scaled neighborhood criterion (or threshold) values to explore. Default seq(0, 1, 0.01).

orthogonal

Logical; if TRUE, use an orthogonal approximation when estimating degrees of freedom. Default FALSE.

nburn

Integer; number of burn-in MCMC iterations per chain. Default 4000.

npost

Integer; number of post-burn-in samples to retain per chain. Default 4000.

seed

Optional integer; base random seed. Each chain adds its own index to this value to obtain its seed. Default NULL.
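
One plausible reading of the seeding rule, sketched under the assumption that chain indices start at 1 and that each chain calls set.seed() with the base seed plus its index. The variable names here are illustrative and do not come from the package.

```r
base_seed <- 123
nchain <- 3

# Assumed rule: chain k uses base_seed + k
chain_seeds <- base_seed + seq_len(nchain)
chain_seeds  # 124 125 126

# Each chain then draws from its own reproducible stream
draws <- lapply(chain_seeds, function(s) {
  set.seed(s)
  rnorm(2)
})
```

Under this rule, rerunning with the same base seed reproduces every chain, while different chains still start from different random states.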

nchain

Integer; number of MCMC chains to run in parallel. Default 1.

ncores

Integer; number of parallel cores to use. Default 1.

verbose

Logical; print progress messages. Default TRUE.

printevery

Integer; print a status message every printevery iterations. Default 1000.

...

Additional model-specific hyperparameters:

For "Multi_Laplace": h (shape) and v (scale) of the Gamma hyperprior.
For "Spike_Laplace": a (shape) and b (scale) of the Gamma hyperprior.

Examples

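# Simulate a dataset with MAR missingness and 5 imputations
sim <- sim_A(n = 100, p = 20, type = "MAR", SNP = 1.5,
             low_missing = TRUE, n_imp = 5, seed = 123)
X <- sim$data_MI$X  # multiply-imputed predictors
Y <- sim$data_MI$Y  # multiply-imputed outcomes
# Short chains keep the example fast; use longer runs in practice
fit <- BMI_LASSO(X, Y, model = "Horseshoe",
                 nburn = 100, npost = 100,
                 nchain = 1, ncores = 1)
# Inspect the BIC-selected variables for each chain
str(fit$best_select)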
