Posterior inference for STAR linear model
blm_star(
  y,
  X,
  X_test = NULL,
  transformation = "np",
  y_max = Inf,
  prior = "gprior",
  use_MCMC = TRUE,
  nsave = 5000,
  nburn = 5000,
  nskip = 0,
  method_sigma = "mle",
  approx_Fz = FALSE,
  approx_Fy = FALSE,
  psi = NULL,
  compute_marg = FALSE
)
blm_star returns a list with at least the following elements:
coefficients
: the posterior mean of the regression coefficients
post.beta
: posterior draws of the regression coefficients
post.pred
: draws from the posterior predictive distribution of y
post.log.like.point
: draws of the log-likelihood for each of the n observations
WAIC
: Widely-Applicable/Watanabe-Akaike Information Criterion
p_waic
: effective number of parameters based on WAIC
If X_test is supplied, the list also contains post.predtest, which holds draws from the posterior predictive distribution at the test points.
Other elements may be present depending on the choice of prior, transformation, and sampling approach.
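The WAIC and p_waic entries summarize the pointwise log-likelihood draws in post.log.like.point. The sketch below shows one standard way to compute them. It assumes the draws are stored as an S x n matrix (rows = posterior draws, columns = observations) and uses the common scaling WAIC = -2 (lppd - p_waic); the helper name waic_from_loglik is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): WAIC from pointwise log-likelihoods.
# loglik: S x n matrix, rows = posterior draws, columns = observations (assumed layout).
waic_from_loglik <- function(loglik) {
  # Numerically stable log(mean(exp(v))) for one column of draws
  log_mean_exp <- function(v) {
    m <- max(v)
    m + log(mean(exp(v - m)))
  }
  # Log pointwise predictive density, summed over observations
  lppd <- sum(apply(loglik, 2, log_mean_exp))
  # Effective number of parameters: posterior variance of each pointwise log-likelihood
  p_waic <- sum(apply(loglik, 2, var))
  list(WAIC = -2 * (lppd - p_waic), p_waic = p_waic, lppd = lppd)
}

# Toy usage: 4 draws for 3 observations, each with constant likelihood 0.5
ll <- matrix(log(0.5), nrow = 4, ncol = 3)
res <- waic_from_loglik(ll)
res$WAIC
```

With constant draws the variance term vanishes, so p_waic is zero and WAIC reduces to -2 times the summed log predictive density.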
y
: n x 1 vector of observed counts
X
: n x p matrix of predictors
X_test
: n0 x p matrix of predictors for test data
transformation
: transformation to use for the latent process; must be one of
"identity" (identity transformation)
"log" (log transformation)
"sqrt" (square root transformation)
"np" (nonparametric transformation estimated from the empirical CDF)
"pois" (transformation for a moment-matched marginal Poisson CDF)
"neg-bin" (transformation for a moment-matched marginal negative binomial CDF)
"box-cox" (Box-Cox transformation with learned parameter)
"ispline" (transformation modeled as an unknown, monotone function using I-splines)
"bnp" (Bayesian nonparametric transformation using the Bayesian bootstrap)
y_max
: a fixed and known upper bound for all observations; default is Inf
prior
: prior to use for the latent linear regression; currently implemented options are "gprior" (default), "horseshoe", and "ridge". Not all modeling options and transformations are available with the latter two priors.
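use_MCMC
: logical; if TRUE (default), run the MCMC sampler; if FALSE, use the exact Monte Carlo sampler, which draws directly from the posterior under the g-prior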
nsave
: number of MCMC iterations to save (or Monte Carlo samples to draw if use_MCMC=FALSE)
nburn
: number of initial MCMC iterations to discard as burn-in
nskip
: number of MCMC iterations to skip between saved iterations, i.e., save every (nskip + 1)th draw
method_sigma
: method to estimate the latent data standard deviation in the exact sampler; must be one of
"mle" (use the MLE from the STAR EM algorithm)
"mmle" (use the marginal MLE; note: slower)
approx_Fz
: logical; in the BNP transformation, apply a (fast and stable) normal approximation for the marginal CDF of the latent data
approx_Fy
: logical; in the BNP transformation, approximate the marginal CDF of y using the empirical CDF
psi
: prior variance (g-prior)
compute_marg
: logical; if TRUE, compute and return the marginal likelihood (only available with the exact sampler, i.e., use_MCMC=FALSE)
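The nburn, nskip, and nsave arguments together determine how long the MCMC sampler runs. Under the usual burn-in-then-thin convention, which is assumed here rather than taken from the package source, the sampler runs nburn + nsave * (nskip + 1) iterations in total and keeps every (nskip + 1)th draw after burn-in. The hypothetical helper below lists the kept iteration indices.

```r
# Hypothetical helper (not part of the package): indices of saved MCMC iterations,
# assuming nburn burn-in iterations followed by thinning that keeps every (nskip + 1)th draw.
saved_iterations <- function(nsave, nburn, nskip) {
  nburn + seq_len(nsave) * (nskip + 1)
}

idx <- saved_iterations(nsave = 5, nburn = 10, nskip = 2)
print(idx)        # 13 16 19 22 25
print(max(idx))   # total iterations: nburn + nsave * (nskip + 1) = 25
```

With the defaults in the usage (nsave = 5000, nburn = 5000, nskip = 0), this convention gives 10000 iterations in total, of which the last 5000 are saved.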
STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation. Here, the continuous latent data model is a linear regression.
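The two-step construction above can be made concrete by simulating data from it. The sketch below is a minimal illustration, not the package's code. It assumes a log transformation, hypothetical values for the coefficients and the latent standard deviation, and a rounding operator that maps y* < 1 to 0 and y* in [j, j + 1) to j (equivalently max(0, floor(y*)), capped at y_max). It also checks that each latent value falls in the interval [g(y), g(y + 1)) implied by its observed count.

```r
set.seed(123)
n <- 200
X <- cbind(1, matrix(rnorm(n * 2), nrow = n, ncol = 2))  # intercept + 2 covariates
beta <- c(1, 0.5, -0.25)   # hypothetical regression coefficients
sigma <- 0.5               # hypothetical latent standard deviation

# (1) Gaussian linear regression for the continuous latent data z
z <- as.vector(X %*% beta) + rnorm(n, mean = 0, sd = sigma)

# (2) Transformation and rounding: z = g(y*) with g = log, so y* = exp(z);
#     y* is then rounded down to a count (assumed rounding operator)
g <- function(t) log(t)
g_inv <- function(z) exp(z)
star_round <- function(t, y_max = Inf) pmin(y_max, pmax(0, floor(t)))
y <- star_round(g_inv(z))

# Each count y corresponds to the latent interval [g(y), g(y + 1));
# log(0) = -Inf, so y = 0 covers all z < g(1) = 0
lower <- g(y)
upper <- g(y + 1)
all(z >= lower & z < upper)
```

Inference reverses this picture: given the counts, the latent data are only known to lie in these intervals, which is what couples the Gaussian regression to the observed y.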
There are several options for the transformation. First, the transformation
can belong to the *Box-Cox* family, which includes the known transformations
'identity', 'log', and 'sqrt', as well as a version in which the Box-Cox parameter
is inferred within the MCMC sampler ('box-cox'). Second, the transformation
can be estimated (before model fitting) from the empirical distribution of the
data y. Options in this case include the empirical cumulative distribution
function (CDF), which is fully nonparametric ('np'), and the parametric
alternatives based on Poisson ('pois') or negative binomial ('neg-bin')
distributions. For the parametric distributions, the distribution parameters
are estimated by matching moments (means and variances) of y. The
distribution-based transformations approximately preserve the mean and variance
of the count data y on the latent data scale, which lends interpretability to
the model parameters. Lastly, the transformation can be modeled using the
Bayesian bootstrap ('bnp'), a Bayesian nonparametric model that incorporates
uncertainty about the transformation into posterior and predictive inference.
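The transformation families described above can be sketched in a few lines of base R. This is illustrative only: the helper names are hypothetical, and the CDF-based form g(t) = mean(y) + sd(y) * qnorm(F(t)), the n/(n + 1) rescaling of the empirical CDF, and the single Bayesian bootstrap CDF draw are assumptions about how such transformations can be built, not the package's exact implementation.

```r
# Box-Cox family: lambda = 0 gives log; lambda = 1 and 0.5 give the identity
# and square root up to an affine rescaling
box_cox <- function(t, lambda) {
  if (lambda == 0) log(t) else (t^lambda - 1) / lambda
}

# CDF-based transformation: map F(t) to the normal scale, then rescale so that
# the latent scale roughly matches the mean and SD of y (assumed form)
cdf_transform <- function(y, F) {
  function(t) mean(y) + sd(y) * qnorm(F(t))
}

# 'np'-style: empirical CDF, shrunk by n/(n + 1) so that qnorm() stays finite
make_g_np <- function(y) {
  n <- length(y)
  Fy <- ecdf(y)
  cdf_transform(y, function(t) n / (n + 1) * Fy(t))
}

# 'pois'-style: Poisson CDF with lambda matched to the sample mean
make_g_pois <- function(y) {
  cdf_transform(y, function(t) ppois(t, lambda = mean(y)))
}

# 'neg-bin'-style: match mean and variance (requires var(y) > mean(y))
make_g_nb <- function(y) {
  mu <- mean(y)
  v <- var(y)
  size <- mu^2 / (v - mu)  # variance = mu + mu^2 / size
  cdf_transform(y, function(t) pnbinom(t, size = size, mu = mu))
}

# 'bnp'-style: one Bayesian bootstrap draw of the CDF, with Dirichlet(1, ..., 1)
# weights on the observed counts (generated as normalized exponentials)
draw_bb_cdf <- function(y) {
  w <- rexp(length(y))
  w <- w / sum(w)
  function(t) vapply(t, function(s) sum(w[y <= s]), numeric(1))
}

y <- c(0, 1, 1, 2, 3, 5, 8, 2, 0, 4)
t_grid <- 0:8
g_np <- make_g_np(y)
g_pois <- make_g_pois(y)
g_nb <- make_g_nb(y)
rbind(np = g_np(t_grid), pois = g_pois(t_grid), nb = g_nb(t_grid))
```

Each estimated transformation is monotone in t, which is what allows the rounding intervals [g(y), g(y + 1)) to be well defined; repeating draw_bb_cdf gives a different CDF each time, which is how a Bayesian bootstrap propagates uncertainty about the transformation.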
The Monte Carlo sampler (use_MCMC=FALSE) produces direct, discrete, and joint draws
from the posterior distribution and the posterior predictive distribution
of the linear regression model with a g-prior.
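One ingredient of inference under the g-prior has a closed form. Conditional on the latent data z and the latent standard deviation sigma, a zero-mean g-prior beta ~ N(0, psi * sigma^2 * (X'X)^{-1}) is conjugate, so beta | z, sigma ~ N(psi / (1 + psi) * beta_hat, psi / (1 + psi) * sigma^2 * (X'X)^{-1}), where beta_hat is the least-squares estimate. The sketch below draws from this Gaussian conditional, treating psi as the g-prior scale (an assumption). It does not handle the truncation of z to the rounding intervals that links the latent data to the observed counts, and it is not the package's sampler; draw_beta_gprior is a hypothetical name.

```r
# Hypothetical helper: draws of beta given latent data z and sigma under
#   beta ~ N(0, psi * sigma^2 * (X'X)^{-1})  (psi treated as the g-prior scale; assumption)
draw_beta_gprior <- function(z, X, sigma, psi, nsave = 1000) {
  XtX_inv <- solve(crossprod(X))
  beta_hat <- as.vector(XtX_inv %*% crossprod(X, z))  # least-squares estimate
  shrink <- psi / (1 + psi)
  post_mean <- shrink * beta_hat
  post_cov <- shrink * sigma^2 * XtX_inv
  # Draw from N(post_mean, post_cov) using the upper Cholesky factor
  R <- chol(post_cov)                                  # post_cov = t(R) %*% R
  p <- ncol(X)
  Zstd <- matrix(rnorm(nsave * p), nrow = nsave, ncol = p)
  sweep(Zstd %*% R, 2, post_mean, "+")                 # nsave x p matrix of draws
}

# Usage on simulated latent data (hypothetical true coefficients c(2, -1))
set.seed(1)
n <- 500
X <- cbind(1, rnorm(n))
z <- as.vector(X %*% c(2, -1)) + rnorm(n, sd = 0.5)
draws <- draw_beta_gprior(z, X, sigma = 0.5, psi = n, nsave = 2000)
colMeans(draws)  # close to (n / (n + 1)) times the least-squares estimate
```

Because the draws are independent rather than a Markov chain, no burn-in or thinning is needed for this step.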