select_by_bma: Select Best Model via Bayesian Model Averaging

Description

Fits multiple bivariate hurdle models across a grid of lag orders and horseshoe prior hyperparameters, then ranks them by leave-one-out cross-validation (LOO-CV) and computes stacking weights for model averaging.
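The LOO-CV and stacking step can be illustrated on its own. The sketch below assumes the loo package (whether select_by_bma() uses it internally is not stated on this page) and replaces the hurdle models with two toy Poisson "models" whose posterior draws are simulated directly, so no Stan fit is needed.

```r
library(loo)

set.seed(1)
y <- rpois(21, lambda = 3)  # toy count outcome, N = 21 as in the example below
S <- 1000                   # posterior draws per model

# Simulated posterior draws of the Poisson rate under two candidate models:
# model_a is the conjugate posterior for y, model_b is deliberately off-center
lambda_draws <- list(
  model_a = rgamma(S, shape = sum(y) + 1, rate = length(y) + 0.01),
  model_b = rgamma(S, shape = 50, rate = 10)
)

# S x N pointwise log-likelihood matrix for each model
log_lik <- lapply(lambda_draws, function(lam) {
  outer(lam, y, function(l, yy) dpois(yy, l, log = TRUE))
})

# PSIS-LOO per model; the simulated draws are independent, so r_eff = 1
loos <- lapply(log_lik, function(ll) loo(ll, r_eff = rep(1, ncol(ll))))

# Models ranked by expected log predictive density (best first)
print(loo_compare(loos))

# Stacking weights across the candidate models
w <- loo_model_weights(loos, method = "stacking")
print(w)
```

Stacking picks weights that maximize the leave-one-out predictive density of the weighted mixture, rather than weighting each model by its own marginal likelihood as classical BMA does.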

Usage

select_by_bma(
  DT,
  spec = "C",
  controls = character(0),
  k_grid = 0:3,
  hs_grid = data.frame(hs_tau0 = c(0.1, 0.5, 1), hs_slab_scale = c(1, 5, 1, 5, 1, 5),
    hs_slab_df = 4, stringsAsFactors = FALSE),
  model = NULL,
  output_base_dir = NULL,
  iter_warmup = 900,
  iter_sampling = 1200,
  chains = 4,
  seed = 123,
  use_parallel = TRUE,
  verbose = TRUE
)

Value

A list with components:

fits: List of fitted model objects.
loos: List of LOO objects.
weights: Numeric vector of stacking weights.
table: data.frame of results, sorted by ELPD.
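Stacking weights define a mixture of the candidate models' predictive distributions. A minimal sketch of drawing from that mixture, assuming you already have an S x N matrix of posterior predictive draws for each model; the draws and the weights below are simulated placeholders, not output of select_by_bma().

```r
set.seed(2)
S <- 4000  # predictive draws per model
N <- 21    # observations

# Placeholder posterior predictive draws (S x N) for two candidate models
pred <- list(
  model_a = matrix(rpois(S * N, 3), S, N),
  model_b = matrix(rpois(S * N, 5), S, N)
)
w <- c(model_a = 0.8, model_b = 0.2)  # illustrative weights only

# For each output draw: choose a model with probability w, then one of its draws
which_model <- sample(names(w), S, replace = TRUE, prob = w)
which_draw  <- sample.int(S, S, replace = TRUE)

mixed <- matrix(NA_real_, S, N)
for (m in names(w)) {
  idx <- which(which_model == m)
  mixed[idx, ] <- pred[[m]][which_draw[idx], , drop = FALSE]
}
```

Each row of `mixed` is one draw from the stacked predictive distribution and can be summarized like any single model's predictions.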

Arguments

DT: A data.table with the data.
spec: Character; model specification ("A", "B", "C", "D").
controls: Character vector of control variable names.
k_grid: Integer vector of lag orders to evaluate.
hs_grid: data.frame with columns hs_tau0, hs_slab_scale, hs_slab_df defining the horseshoe hyperparameter grid; each row is one setting.
model: A compiled CmdStan model. If NULL, the package's default model is loaded.
output_base_dir: Base directory for output files. If NULL, uses tempdir().
iter_warmup: Integer; warmup iterations.
iter_sampling: Integer; sampling iterations.
chains: Integer; number of chains.
seed: Integer; random seed.
use_parallel: Logical; if TRUE and furrr is available, fits models in parallel.
verbose: Logical; print progress messages.
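The number of fits is length(k_grid) times nrow(hs_grid). The default hs_grid in Usage relies on data.frame() recycling hs_tau0 (length 3) against hs_slab_scale (length 6), which yields all six tau0/slab-scale combinations; expand.grid() builds the same set explicitly. The sketch below enumerates the candidate models and, mirroring the use_parallel description, dispatches them with furrr when it is installed, otherwise with lapply(). fit_stub() is a hypothetical placeholder for fitting one model, not a function of this package.

```r
k_grid <- 0:3

# Default grid as written in Usage (relies on recycling hs_tau0)
default_grid <- data.frame(hs_tau0 = c(0.1, 0.5, 1),
                           hs_slab_scale = c(1, 5, 1, 5, 1, 5),
                           hs_slab_df = 4)

# The same six combinations, built explicitly
hs_grid <- expand.grid(hs_tau0 = c(0.1, 0.5, 1),
                       hs_slab_scale = c(1, 5),
                       hs_slab_df = 4)

# Cartesian product: every (lag order, horseshoe setting) pair is one model
design <- merge(data.frame(k = k_grid), hs_grid, by = NULL)
nrow(design)  # 24 = 4 lag orders x 6 horseshoe settings

# Hypothetical stand-in for fitting one model; returns its settings
fit_stub <- function(row) list(k = row$k, hs_tau0 = row$hs_tau0)

rows <- split(design, seq_len(nrow(design)))
use_parallel <- TRUE
if (use_parallel && requireNamespace("furrr", quietly = TRUE)) {
  # Parallelism follows whatever future::plan() the caller has set
  fits <- furrr::future_map(rows, fit_stub,
                            .options = furrr::furrr_options(seed = TRUE))
} else {
  fits <- lapply(rows, fit_stub)
}
```

Leaving future::plan() to the caller (for example future::plan(future::multisession)) keeps the sketch sequential by default.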

Examples


# \donttest{
library(data.table)
set.seed(123)  # reproducible dummy data

# 1. Create a complete dummy dataset (simulated, for illustration only)
# select_by_bma() -> fit_one() -> build_design() requires all of these columns:
DT <- data.table(
  year = 2000:2020,
  I = rpois(21, lambda = 4),
  C = rpois(21, lambda = 3),
  zI = rnorm(21),
  zC = rnorm(21),
  t_norm = seq(-1, 1, length.out = 21),
  t_poly2 = seq(-1, 1, length.out = 21)^2,
  Regime = factor(sample(c("A", "B"), 21, replace = TRUE)),
  trans_PS = sample(0:1, 21, replace = TRUE),
  trans_SF = sample(0:1, 21, replace = TRUE),
  trans_FC = sample(0:1, 21, replace = TRUE),
  log_exposure50 = rep(0, 21)
)

# 2. Run the function
# use_parallel = FALSE keeps the example sequential and avoids parallel-backend issues in CRAN checks
# A single lag order (k_grid = 0) and a single horseshoe setting keep the grid, and the runtime, small
try({
  result <- select_by_bma(
    DT, 
    spec = "C", 
    k_grid = 0, 
    hs_grid = data.frame(hs_tau0 = 0.5, hs_slab_scale = 1, hs_slab_df = 4),
    use_parallel = FALSE,
    iter_warmup = 100, iter_sampling = 100, chains = 1 # Minimal MCMC for speed
  )
  
  if (!is.null(result$table)) {
    print(result$table)
  }
})
# }
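Since the example notes that all of these columns are required, a quick pre-flight check can give a clearer error before any model is compiled or sampled. This is a hypothetical helper, not part of the package; the column list is copied from the example's comment and may differ for other spec values.

```r
# Column names taken from the example above (spec = "C"); an assumption for other specs
required_cols <- c("year", "I", "C", "zI", "zC", "t_norm", "t_poly2", "Regime",
                   "trans_PS", "trans_SF", "trans_FC", "log_exposure50")

# Hypothetical helper: stop with a readable message if any column is missing
check_columns <- function(DT, required = required_cols) {
  missing <- setdiff(required, names(DT))
  if (length(missing) > 0) {
    stop("DT is missing required column(s): ",
         paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Works on data.frames and data.tables alike, since it only inspects names()
DT_ok  <- as.data.frame(matrix(0, nrow = 2, ncol = length(required_cols),
                               dimnames = list(NULL, required_cols)))
DT_bad <- data.frame(year = 2000:2001)
check_columns(DT_ok)
```

Calling check_columns(DT_bad) stops with a message listing every absent column.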
