Learn R Programming

countSTAR (version 1.2.0)

blm_star_gibbs_bnp: Gibbs sampler for STAR linear regression with BNP transformation

Description

Compute MCMC samples from the posterior and predictive distributions of a STAR linear regression model with a g-prior and Bayesian nonparametric (BNP) transformation.

Usage

blm_star_gibbs_bnp(
  y,
  X,
  X_test = X,
  y_max = Inf,
  psi = length(y),
  alpha = 1,
  P0 = NULL,
  pilot_run = FALSE,
  nsave = 1000,
  nburn = 1000,
  nskip = 0,
  verbose = TRUE
)

Value

a list with the following elements:

  • coefficients: the posterior mean of the regression coefficients

  • post.beta: nsave x p samples from the posterior distribution of the regression coefficients

  • post.pred: nsave x n_test samples from the posterior predictive distribution at test points X_test

  • post.g: nsave posterior samples of the transformation evaluated at 1:max(y)

  • post.log.like.point: draws of the log-likelihood for each of the n observations

  • WAIC: Widely-Applicable/Watanabe-Akaike Information Criterion

  • p_waic: Effective number of parameters based on WAIC

Arguments

y

n x 1 vector of observed counts

X

n x p matrix of predictors (including an intercept)

X_test

n_test x p matrix of predictors for test data (including an intercept); default is the observed covariates X

y_max

a fixed and known upper bound for all observations; default is Inf

psi

prior variance (g-prior); default is n

alpha

prior precision for the Dirichlet Process prior; default is one

P0

function to evaluate the base measure PMF supported on {0,...,y_max}; see below for default values when unspecified (NULL)

pilot_run

logical; if TRUE, use a short pilot run to approximate the marginal CDF of the latent z; otherwise, use a Laplace approximation

nsave

number of MCMC iterations to save

nburn

number of MCMC iterations to discard

nskip

number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw

Details

This function provides fully Bayesian inference for a transformed linear model with discrete data. The sampling algorithm draws the transformation g directly from its *marginal* posterior distribution and then uses a Gibbs sampler for the latent data z and the regression coefficients beta. Compared to the direct Monte Carlo version in blm_star_exact, this MCMC algorithm uses a blocking structure that balances MCMC efficiency (the marginal sampler for g) with computational scalability (the Gibbs sampler for z and beta).

The BNP model for the transformation g is derived from a Dirichlet Process (DP) prior on the marginal CDF of y. The user can specify the prior precision alpha and the base measure, e.g., P0 = function(t) dpois(t, lambda = 10). Otherwise, the default approach views the base measure as a means to ensure correct support and avoid boundary issues, both of which are concerns with the Bayesian bootstrap (alpha = 0). Thus, the default is a small prior precision alpha = 1 and base measures that guarantee the right support. For y_max < Inf, we simply use Uniform{0,...,y_max}. Otherwise, we use Geom(pi_geom), where pi_geom is elicited by fixing the probability of exceeding the maximum observed value, P(y > max(y)), which we set at 0.10. Recall that the DP posterior for the marginal CDF of y combines the base measure with the empirical measure: in expectation, the base measure has weight alpha/(n + alpha) = 1/(n+1) while the empirical part has weight n/(n + alpha) = n/(n+1). Thus, the base measure's contribution is small, and matters most for data values that may exceed max(y).