This function trains linear logistic regression models with HMC in restricted Gibbs sampling. It also makes predictions for test cases if X_ts is provided.
htlr_fit(
X_tr,
y_tr,
fsel = 1:ncol(X_tr),
stdzx = TRUE,
ptype = c("t", "ghs", "neg"),
sigmab0 = 2000,
alpha = 1,
s = -10,
eta = 0,
iters_h = 1000,
iters_rmc = 1000,
thin = 1,
leap_L = 50,
leap_L_h = 5,
leap_step = 0.3,
hmc_sgmcut = 0.05,
initial_state = "lasso",
keep.warmup.hist = FALSE,
silence = TRUE,
rep.legacy = TRUE,
alpha.rda = 0.2,
lasso.lambda = seq(0.05, 0.01, by = -0.01),
X_ts = NULL,
predburn = NULL,
predthin = 1
)
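A minimal end-to-end sketch of a call (the simulated data, seed, and shortened chain lengths below are illustrative, and assume the HTLR package providing htlr_fit is attached):

library(HTLR)

## Simulate a small two-class problem (illustrative data only)
set.seed(1)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- 1L + as.integer(X[, 1] - X[, 2] + rnorm(n) > 0)  # labels coded 1 and 2

## Fit with the default Student-t prior and shortened chains
fit <- htlr_fit(X_tr = X, y_tr = y, ptype = "t",
                iters_h = 500, iters_rmc = 500, thin = 1)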
A list of fitting results. If X_ts is not provided, the list is an object with S3 class htlr.fit.
X_tr: Input matrix, of dimension nobs by nvars; each row is an observation vector.
y_tr: Vector of response variables. Must be coded as non-negative integers, e.g., 1, 2, ..., C for C classes; label 0 is also allowed.
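For instance, a factor response can be recoded into the required integer labels (the variable names here are illustrative):

y_raw <- factor(c("control", "case", "case", "control"))  # illustrative factor response
y_tr  <- as.integer(y_raw)                                # maps the C levels to 1, ..., C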
fsel: Subset of features selected before fitting, such as by univariate screening.
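One possible univariate screening step, keeping the ten features most correlated with the response (a sketch; X and y are the training data from the usage example above, and the cutoff of 10 is arbitrary):

## Rank features by absolute correlation with the (numeric) response
scores <- abs(cor(X, as.numeric(y)))              # one score per column of X
fsel   <- order(scores, decreasing = TRUE)[1:10]  # keep the 10 highest-ranked features
fit    <- htlr_fit(X_tr = X, y_tr = y, fsel = fsel)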
stdzx: Logical; if TRUE, the original feature values are standardized to have mean = 0 and sd = 1.

ptype: The prior to be applied to the model. Either "t" (Student-t, default), "ghs" (horseshoe), or "neg" (normal-exponential-gamma).

sigmab0: The sd of the normal prior for the intercept.

alpha: The degrees of freedom of the t/ghs/neg prior for coefficients.

s: The log scale of the prior (logw) for coefficients.

eta: The sd of the normal prior for logw. When set to 0, logw is fixed; otherwise, logw is given a normal prior and updated during sampling.
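For example, the default eta = 0 keeps logw fixed, while a positive eta lets logw be sampled (a sketch; the specific values, and X and y, are illustrative):

## logw fixed at s = -10 (the default behaviour)
fit_fixed   <- htlr_fit(X_tr = X, y_tr = y, s = -10, eta = 0)

## logw given a normal prior with sd 5 and updated during sampling
fit_sampled <- htlr_fit(X_tr = X, y_tr = y, s = -10, eta = 5)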
iters_h: A positive integer specifying the number of warmup (burn-in) iterations.

iters_rmc: A positive integer specifying the number of iterations after warmup.

thin: A positive integer specifying the period for saving samples.

leap_L: The length of the leapfrog trajectory in the sampling phase.

leap_L_h: The length of the leapfrog trajectory in the burn-in phase.

leap_step: The stepsize adjustment multiplied by the second-order partial derivatives of the log posterior.

hmc_sgmcut: Coefficients smaller than this threshold are fixed in each HMC updating step.
initial_state: The initial state of the Markov chain; can be a previously fitted fithtlr object, a user-supplied initial state vector, or a character string matching one of the following (see the sketch after this list):

"lasso" - (Default) Use a Lasso initial state with lambda chosen by cross-validation. Users may specify their own candidate lambda values via the optional argument lasso.lambda. Further customized Lasso initial states can be generated by lasso_deltas.

"bcbcsfrda" - Use an initial state generated by package BCBCSF (Bias-corrected Bayesian classification). Further customized BCBCSF initial states can be generated by bcbcsf_deltas. WARNING: This type of initial state can be used for continuous features such as gene expression profiles, but it should not be used for categorical features such as SNP profiles.

"random" - Use random initial values sampled from N(0, 1).
keep.warmup.hist: Warmup iterations are not recorded by default; set to TRUE to keep them.

silence: Logical; set to FALSE to track MCMC sampling iterations.

rep.legacy: Logical; if TRUE, the output produced in HTLR versions up to legacy-3.1-1 is reproduced. This is typically slower than the non-legacy mode on multi-core machines.
alpha.rda: A user-supplied alpha value for bcbcsf_deltas when setting up the BCBCSF initial state. Default: 0.2.

lasso.lambda: A user-supplied lambda sequence for lasso_deltas when setting up the Lasso initial state. Default: {.01, .02, ..., .05}. Ignored if rep.legacy is set to TRUE.
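A sketch of supplying a custom lambda grid for the Lasso initial state (values are illustrative, X and y as above; rep.legacy must be FALSE for lasso.lambda to take effect):

fit <- htlr_fit(X_tr = X, y_tr = y,
                initial_state = "lasso",
                lasso.lambda  = seq(0.10, 0.01, by = -0.01),
                rep.legacy    = FALSE)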
X_ts: Test data on which predictions are to be made.
predburn, predthin: For prediction based on X_ts (when supplied), the first predburn Markov chain (super)iterations will be discarded, and only every predthin-th of the remaining iterations is used for inference.
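A sketch of fitting and predicting in one call (the held-out split and the predburn and predthin values are illustrative; X and y as in the usage example above):

## Hold out 20 cases and ask for predictions on them
test_idx <- 1:20
fit_pred <- htlr_fit(X_tr = X[-test_idx, ], y_tr = y[-test_idx],
                     X_ts  = X[test_idx, ],
                     predburn = 100, predthin = 2)
## The returned results then also include predictions for the rows of X_ts.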
Longhai Li and Weixin Yao (2018). Fully Bayesian Logistic Regression with Hyper-Lasso Priors for High-dimensional Feature Selection. Journal of Statistical Computation and Simulation, 88(14), 2827-2851.