fit_horseshoe: Fit Regularized Horseshoe Regression Model

Description

Fits a Bayesian linear regression with a regularized horseshoe prior using Stan via cmdstanr. This version adds improved numerical stability and automatic calibration of the global shrinkage prior.

Usage

fit_horseshoe(
  y,
  X,
  var_names = NULL,
  p0 = NULL,
  slab_scale = 3,
  slab_df = 4,
  tau_scale = NULL,
  use_qr = FALSE,
  standardize = TRUE,
  X_new = NULL,
  iter_warmup = 1000,
  iter_sampling = 1000,
  chains = 4,
  adapt_delta = 0.95,
  max_treedepth = 12,
  seed = 123,
  verbose = TRUE
)
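
A minimal call might look like the sketch below. The simulated data and the object names (n, P, beta_true, fit) are illustrative assumptions, not part of the package. The call is guarded so the snippet still runs when fit_horseshoe() is not available in the session; actual fitting requires cmdstanr and a working CmdStan installation.

```r
# Simulate a sparse regression problem: 3 of 12 predictors are relevant.
set.seed(1)
n <- 100
P <- 12
X <- matrix(rnorm(n * P), nrow = n, ncol = P)
colnames(X) <- paste0("x", seq_len(P))
beta_true <- c(2, -1.5, 1, rep(0, P - 3))
y <- as.numeric(X %*% beta_true + rnorm(n))

# Fit only if the function is available (assumption: its package is attached).
if (exists("fit_horseshoe", mode = "function")) {
  fit <- fit_horseshoe(
    y = y,
    X = X,
    p0 = 3,          # prior guess: three non-zero coefficients
    chains = 4,
    seed = 123,
    verbose = FALSE
  )
  class(fit)
}
```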

Value

A list of class "signaly_horseshoe" containing posterior summaries, diagnostics, and the fitted model object.

Arguments

y: Numeric vector of the response variable.
X: Matrix or data frame of predictor variables.
var_names: Optional character vector of variable names.
p0: Expected number of non-zero coefficients. Default: NULL, which uses P/3, where P is the number of predictors (columns of X).
slab_scale: Scale for the regularizing slab. Default: 3.
slab_df: Degrees of freedom for the slab. Default: 4.
tau_scale: Scale multiplier for the global shrinkage prior. Default: NULL, in which case the scale is calibrated automatically from the data. Increase it (e.g., to 10-20) if the model over-shrinks.
use_qr: Use QR decomposition? Default: FALSE.
standardize: Standardize predictors internally? Default: TRUE.
X_new: Optional matrix for out-of-sample prediction.
iter_warmup: Warmup iterations per chain. Default: 1000.
iter_sampling: Sampling iterations per chain. Default: 1000.
chains: Number of MCMC chains. Default: 4.
adapt_delta: Target acceptance probability. Default: 0.95.
max_treedepth: Maximum tree depth. Default: 12.
seed: Random seed. Default: 123.
verbose: Print progress messages? Default: TRUE.
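
For orientation, Piironen & Vehtari (2017) suggest deriving the scale of the global shrinkage prior from the prior guess p0 as tau0 = p0 / (P - p0) * sigma / sqrt(n). The sketch below computes this quantity in base R. Whether the automatic calibration in fit_horseshoe() uses exactly this formula is an assumption, and the helper name tau0_from_p0 is hypothetical.

```r
# Hypothetical helper: global scale implied by a prior guess of p0
# non-zero coefficients among P predictors, with n observations.
tau0_from_p0 <- function(p0, P, n, sigma = 1) {
  stopifnot(p0 > 0, p0 < P, n > 0, sigma > 0)
  (p0 / (P - p0)) * sigma / sqrt(n)
}

P <- 30
n <- 100

# Documented default guess p0 = P/3 versus a sparser guess p0 = 3.
tau0_default <- tau0_from_p0(p0 = P / 3, P = P, n = n)
tau0_sparse  <- tau0_from_p0(p0 = 3, P = P, n = n)
```

A smaller p0 yields a smaller global scale and therefore stronger shrinkage toward zero, which is the lever that tau_scale is described as counteracting.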

Details

The regularized Horseshoe prior (Piironen & Vehtari, 2017) provides adaptive shrinkage that can distinguish between relevant and irrelevant predictors.
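
As a reminder of the model being referenced, the regularized horseshoe of Piironen & Vehtari (2017) can be written as below. The mapping of slab_df to \nu and slab_scale to s is an assumption based on the argument descriptions; the Stan program used by this function may parameterize the prior differently.

```latex
\begin{aligned}
\beta_j \mid \lambda_j, \tau, c &\sim \mathcal{N}\!\left(0,\ \tau^2 \tilde{\lambda}_j^2\right),
\qquad
\tilde{\lambda}_j^2 = \frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2}, \\
\lambda_j &\sim \mathrm{C}^{+}(0, 1),
\qquad
\tau \sim \mathrm{C}^{+}(0, \tau_0), \\
c^2 &\sim \text{Inv-Gamma}\!\left(\tfrac{\nu}{2},\ \tfrac{\nu s^2}{2}\right).
\end{aligned}
```

When \tau^2 \lambda_j^2 is much smaller than c^2, the prior behaves like the original horseshoe; when it is much larger, the coefficient is regularized by a Student-t slab with \nu degrees of freedom and scale s.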

Variable Selection Methods:

After fitting, variables can be selected using different criteria:

select_by_credible_interval: Selects variables whose credible interval excludes zero. Recommended as the most robust method.
select_by_shrinkage: Selects based on kappa (shrinkage factor). May underselect when tau is very small.
select_by_magnitude: Selects based on coefficient magnitude.
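
The rule behind credible-interval selection can be sketched on a plain matrix of posterior draws. This is not the package's implementation: the helper ci_excludes_zero, the 95% level, and the simulated draws are all assumptions made for illustration.

```r
# Hypothetical helper: TRUE for each column of `draws` whose central
# credible interval lies entirely above or entirely below zero.
ci_excludes_zero <- function(draws, level = 0.95) {
  alpha <- (1 - level) / 2
  lower <- apply(draws, 2, quantile, probs = alpha)
  upper <- apply(draws, 2, quantile, probs = 1 - alpha)
  (lower > 0) | (upper < 0)
}

# Simulated "posterior draws" for three coefficients (rows = draws).
set.seed(42)
draws <- cbind(
  strong_pos = rnorm(4000, mean = 2,    sd = 0.3),
  strong_neg = rnorm(4000, mean = -1.5, sd = 0.3),
  noise      = rnorm(4000, mean = 0,    sd = 0.3)
)

selected <- ci_excludes_zero(draws)
names(selected)[selected]
```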

Note on kappa-based selection:

The shrinkage factor kappa depends on the global shrinkage parameter tau. In some datasets, the posterior of tau may concentrate near zero, causing all kappa values to be close to 1 even for truly relevant variables. When this happens, the coefficient estimates (beta) remain valid, but kappa-based selection will fail. The function automatically warns when this occurs and recommends using select_by_credible_interval() instead.
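
The dependence of kappa on tau can be illustrated with the shrinkage factor of the original horseshoe for standardized predictors, kappa_j = 1 / (1 + n * tau^2 * lambda_j^2 / sigma^2), as given by Piironen & Vehtari (2017). The sketch below is an assumption about the form of kappa (it ignores the slab), and the helper name kappa_factor is hypothetical.

```r
# Hypothetical helper: shrinkage factor for standardized predictors.
# kappa near 1 means "fully shrunk"; kappa near 0 means "barely shrunk".
kappa_factor <- function(tau, lambda, n, sigma = 1) {
  1 / (1 + n * tau^2 * lambda^2 / sigma^2)
}

lambda <- c(0.1, 1, 10)  # small, medium, and large local scales
n <- 100

# With a moderate tau, large local scales escape shrinkage.
kappa_moderate_tau <- kappa_factor(tau = 0.1, lambda = lambda, n = n)

# With tau near zero, every kappa is pushed toward 1, even for lambda = 10.
kappa_tiny_tau <- kappa_factor(tau = 1e-4, lambda = lambda, n = n)
```

In the second case a threshold on kappa cannot separate the three coefficients, which is the failure mode described above.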