Compute direct Monte Carlo samples from the posterior and predictive distributions of a STAR linear regression model with a g-prior and Bayesian nonparametric (BNP) transformation.
blm_star_exact_bnp(
y,
X,
X_test = X,
y_max = Inf,
psi = length(y),
alpha = 1,
P0 = NULL,
pilot_run = FALSE,
nsave = 1000,
verbose = TRUE
)a list with the following elements:
coefficients: the posterior mean of the regression coefficients
post.beta: nsave x p samples from the posterior distribution
of the regression coefficients
post.pred: nsave x n_test samples
from the posterior predictive distribution at test points X_test
post.g: nsave posterior samples of the transformation
evaluated at 1:max(y)
post.log.like.point: draws of the log-likelihood for each of the n observations
WAIC: Widely-Applicable/Watanabe-Akaike Information Criterion
p_waic: Effective number of parameters based on WAIC
n x 1 vector of observed counts
n x p matrix of predictors (including an intercept)
n_test x p matrix of predictors for test data
(including an intercept); default is the observed covariates X
a fixed and known upper bound for all observations; default is Inf
prior variance (g-prior); default is n
prior precision for the Dirichlet Process prior; default is one
function to evaluate the base measure PMF supported on {0,...,y_max};
see below for default values when unspecified (NULL)
logical; if TRUE, use a short pilot run to approximate
the marginal CDF of the latent z; otherwise, use a Laplace approximation
number of Monte Carlo iterations to save
This function provides fully Bayesian inference for a
transformed linear model with discrete data. The sampling algorithm draws the
transformation g together with the regression coefficients beta
from their *joint* posterior distribution using Monte Carlo (not MCMC) sampling.
When n is moderate to larger, blm_star_gibbs_bnp is recommended.
The BNP model for the transformation g is derived from a Dirichlet Process (DP)
prior on the marginal CDF of y. The user can specify the prior precision
alpha and the base measure, e.g., P0 = function(t) dpois(t, lambda = 10).
Otherwise, the default approach views the base measure as a means to ensure
correct support and avoid boundary issues, both of which are concerns with the
Bayesian bootstrap (alpha = 0). Thus, the default is a small prior
precision alpha = 1 and base measures that guarantee the right support.
For y_max < Inf, we simply use Uniform{0,...,y_max}. Otherwise,
we use Geom(pi_geom), where pi_geom is elicited by fixing
the probability of exceeding the maximum observed value, P(y > max(y)),
which we set at 0.10. Recall that the DP posterior for the marginal CDF
of y combines the base measure with the empirical measure: in expectation,
the base measure has weight alpha/(n + alpha) = 1/(n+1) while the empirical
part has weight n/(n + alpha) = n/(n+1). Thus, the base measure's contribution is small,
and matters most for data values that may exceed max(y).