Compute MCMC samples from the posterior and predictive distributions of a STAR linear regression model with a g-prior and a Bayesian nonparametric (BNP) transformation.
blm_star_bnpgibbs(
  y,
  X,
  X_test = X,
  y_max = Inf,
  psi = NULL,
  approx_Fz = FALSE,
  approx_Fy = FALSE,
  nsave = 1000,
  nburn = 1000,
  nskip = 0,
  verbose = TRUE
)
A list with the following elements:
coefficients: the posterior mean of the regression coefficients
post_beta: nsave x p samples from the posterior distribution of the regression coefficients
post_ytilde: nsave x n0 samples from the posterior predictive distribution at test points X_test
post_g: nsave posterior samples of the transformation evaluated at the unique y values (only applies for 'bnp' transformations)
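The saved draws are ordinary R matrices, so posterior summaries reduce to column-wise operations. The sketch below does not call the package. It uses a hypothetical stand-in list `fit` filled with simulated numbers, shaped like the documented `post_beta` (nsave x p) and `post_ytilde` (nsave x n0). With a real fitted object, the same summary lines should apply to the returned list.

```r
# Hypothetical stand-in for a fitted object: element names and dimensions
# follow the documented return value, but the numbers are simulated
# placeholders, not output of blm_star_bnpgibbs().
set.seed(1)
nsave <- 500; p <- 3; n0 <- 4
fit <- list(
  post_beta   = matrix(rnorm(nsave * p, mean = c(1, -0.5, 0.2)),
                       nrow = nsave, ncol = p, byrow = TRUE),
  post_ytilde = matrix(rpois(nsave * n0, lambda = 3), nrow = nsave, ncol = n0)
)

# Posterior mean of each coefficient (the column-wise average of the draws)
beta_mean <- colMeans(fit$post_beta)

# 95% equal-tailed credible intervals: a 2 x p matrix, one column per coefficient
beta_ci <- apply(fit$post_beta, 2, quantile, probs = c(0.025, 0.975))

# Posterior predictive summaries at each test point (columns of post_ytilde)
ytilde_mean <- colMeans(fit$post_ytilde)
prob_zero   <- colMeans(fit$post_ytilde == 0)   # estimated P(ytilde = 0)
```

Because the predictive draws are counts, event probabilities such as `prob_zero` come directly from the draws. No extra modeling step is needed.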
y: n x 1 vector of observed counts
X: n x p matrix of predictors
X_test: n0 x p matrix of predictors for test data; default is the observed covariates X
y_max: a fixed and known upper bound for all observations; default is Inf
psi: prior variance (g-prior)
approx_Fz: logical; in the BNP transformation, apply a (fast and stable) normal approximation for the marginal CDF of the latent data
approx_Fy: logical; in the BNP transformation, approximate the marginal CDF of y using the empirical CDF
nsave: number of MCMC iterations to save
nburn: number of MCMC iterations to discard (burn-in)
nskip: number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw
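The three MCMC settings together determine how many iterations run and which ones are stored. The sketch below shows one way to compute this. `mcmc_schedule` is a hypothetical helper that encodes the documented meaning of nsave, nburn and nskip. The package's internal loop may differ in detail, for example in exactly which post-burn-in iteration is stored first.

```r
# Hypothetical helper illustrating the documented meaning of nsave, nburn
# and nskip: discard nburn draws, then keep every (nskip + 1)th draw until
# nsave draws are stored. The package's internal loop may differ in detail.
mcmc_schedule <- function(nsave, nburn, nskip) {
  total <- nburn + (nskip + 1) * nsave           # iterations actually run
  kept  <- nburn + (nskip + 1) * seq_len(nsave)  # indices of saved iterations
  list(total = total, kept = kept)
}

s <- mcmc_schedule(nsave = 1000, nburn = 1000, nskip = 0)  # the defaults
s$total          # 2000
range(s$kept)    # 1001 2000

s2 <- mcmc_schedule(nsave = 5, nburn = 10, nskip = 2)
s2$kept          # 13 16 19 22 25
```

Each saved iteration contributes one row to post_beta and post_ytilde. This is why those outputs have nsave rows regardless of nburn and nskip.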
STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation. Here, the continuous latent data model is a linear regression.
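The two-step construction and the g-prior on the latent regression can be sketched in base R. This is not the package's sampler, and it rests on several assumptions:

* the transformation is fixed at g = log;
* rounding is floor-type, so y = j when j <= g^{-1}(z) < j + 1, y = 0 below 1, and y is capped at y_max;
* the latent error variance sigma2 is known.

`draw_beta_gprior` is a hypothetical helper. It makes one conditional draw of the coefficients given latent data z, under the g-prior beta ~ N(0, psi * sigma2 * (X'X)^{-1}). In a Gibbs sampler, z would itself be drawn rather than known.

```r
set.seed(123)
n <- 200
X <- cbind(1, rnorm(n))          # intercept plus one predictor
beta_true <- c(1, 0.5)
sigma <- 0.5

# (1) Gaussian latent data from a linear regression
z <- drop(X %*% beta_true) + rnorm(n, sd = sigma)

# (2) Transformation and rounding, ASSUMING g = log (so g^{-1} = exp) and
#     floor-type rounding: y = j when j <= exp(z) < j + 1, y = 0 when
#     exp(z) < 1, and y capped at y_max
y_max <- Inf
y <- pmin(pmax(floor(exp(z)), 0), y_max)

# Hypothetical helper: one draw of beta given z under the g-prior
# beta ~ N(0, psi * sigma2 * solve(X'X)), which gives
# beta | z ~ N(psi / (1 + psi) * beta_ols, psi / (1 + psi) * sigma2 * solve(X'X))
draw_beta_gprior <- function(z, X, psi, sigma2) {
  XtX_inv  <- solve(crossprod(X))
  beta_ols <- drop(XtX_inv %*% crossprod(X, z))
  shrink   <- psi / (1 + psi)
  L <- t(chol(shrink * sigma2 * XtX_inv))  # L %*% t(L) is the posterior covariance
  shrink * beta_ols + drop(L %*% rnorm(ncol(X)))
}

beta_draw <- draw_beta_gprior(z, X, psi = n, sigma2 = sigma^2)  # psi = n is illustrative only
```

The factor psi / (1 + psi) shows the role of psi. A larger prior variance means less shrinkage of the least-squares estimate toward zero.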
There are several options for the transformation. First, the transformation
can belong to the *Box-Cox* family, which includes the known transformations
'identity', 'log', and 'sqrt'. Second, the transformation
can be estimated (before model fitting) using the empirical distribution of the
data y. Options in this case include the empirical cumulative
distribution function (CDF), which is fully nonparametric ('np'), or the parametric
alternatives based on Poisson ('pois') or negative binomial ('neg-bin')
distributions. For the parametric distributions, the parameters of the distribution
are estimated using moments (means and variances) of y. The distribution-based
transformations approximately preserve the mean and variance of the count data y
on the latent data scale, which lends interpretability to the model parameters.
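The fixed and estimated transformations described above can be sketched in base R. `g_boxcox` and `g_from_cdf` are hypothetical helpers, and the specific forms below are assumptions rather than the package's code:

* the Box-Cox parameterization is (t^lambda - 1) / lambda, with log(t) at lambda = 0;
* the distribution-based transformation has the form g(t) = mean(y) + sd(y) * qnorm(F(t)), which is one way to carry the mean and variance of y onto the latent scale;
* the empirical CDF is rescaled by n / (n + 1);
* the negative binomial parameters come from moment matching.

```r
# Hypothetical helper names; base R only.

# (a) Box-Cox family: (u^lambda - 1) / lambda, or log(u) when lambda = 0.
#     lambda = 1 gives u - 1 (identity up to a shift) and lambda = 0.5 gives
#     2 * (sqrt(u) - 1) (sqrt up to an affine map).
g_boxcox <- function(u, lambda) {
  if (lambda == 0) log(u) else (u^lambda - 1) / lambda
}

# (b) Distribution-based transformation, ASSUMING the form
#     g(u) = mean(y) + sd(y) * qnorm(F(u)) for a CDF F estimated from y.
g_from_cdf <- function(y, type = c("np", "pois", "neg-bin")) {
  type <- match.arg(type)
  n  <- length(y)
  mu <- mean(y)
  s2 <- var(y)
  Fy <- switch(type,
    "np" = {
      Fn <- ecdf(y)
      # rescale by n / (n + 1) so that qnorm() stays finite at max(y)
      function(u) n / (n + 1) * Fn(u)
    },
    "pois" = function(u) ppois(u, lambda = mu),
    "neg-bin" = {
      if (s2 <= mu) stop("moment matching needs var(y) > mean(y)")
      size <- mu^2 / (s2 - mu)   # solves var = mu + mu^2 / size
      function(u) pnbinom(u, size = size, mu = mu)
    }
  )
  function(u) mu + sqrt(s2) * qnorm(Fy(u))
}

set.seed(42)
y  <- rnbinom(300, size = 2, mu = 4)
yu <- sort(unique(y))
g_np <- g_from_cdf(y, "np")
g_nb <- g_from_cdf(y, "neg-bin")

bc_identity <- g_boxcox(c(1, 4, 9), lambda = 1)    # 0 3 8
bc_sqrt     <- g_boxcox(c(1, 4, 9), lambda = 0.5)  # 0 2 4
```

Both estimated transformations are increasing in u. This monotonicity is what lets the rounding step map latent intervals back to unique counts.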
Lastly, the transformation can be modeled using the Bayesian bootstrap ('bnp'),
which is a Bayesian nonparametric model and incorporates the uncertainty
about the transformation into posterior and predictive inference.
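A minimal sketch of how a Bayesian bootstrap turns into posterior draws of a transformation is shown below. It assumes the transformation has the form g(t) = F_Z^{-1}(F_Y(t)), with F_Y drawn from a Bayesian bootstrap. As a placeholder, F_Z (the marginal CDF of the latent data) is taken to be standard normal. The package's actual F_Z, and its normal approximation under approx_Fz = TRUE, may differ. `bb_cdf_draw` and `post_g_sketch` are hypothetical names.

```r
# Hypothetical helper: one Bayesian bootstrap draw of the CDF of y,
# evaluated at the sorted unique values of y.
bb_cdf_draw <- function(y) {
  n <- length(y)
  w <- rexp(n)
  w <- w / sum(w)                          # Dirichlet(1, ..., 1) weights
  yu <- sort(unique(y))
  Fy <- vapply(yu, function(u) sum(w[y <= u]), numeric(1))
  n / (n + 1) * Fy                         # keep values below 1 so qnorm() is finite
}

set.seed(7)
y <- rpois(100, lambda = 3)
# Each row is one draw of g at the unique y values, using a standard normal
# placeholder for F_Z (an assumption, not the package's F_Z)
post_g_sketch <- t(replicate(200, qnorm(bb_cdf_draw(y))))
```

The row-to-row spread of `post_g_sketch` is the transformation uncertainty that the 'bnp' option carries into inference. Replacing the random weights w with 1 / n removes that spread and gives a single empirical-CDF transformation, which is roughly the idea behind approx_Fy = TRUE.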