STAR defines a count-valued probability model by
(1) specifying a Gaussian model for continuous *latent* data and
(2) connecting the latent data to the observed data via a
*transformation and rounding* operation. Here, the continuous
latent data model is a linear regression.
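
The two-step construction above can be illustrated with a small simulation. This is a minimal sketch in Python, not the package's R code. It assumes the 'log' transformation and assumes that rounding means taking the floor of the back-transformed latent value, so that latent values below log(1) = 0 map to a count of zero. The function name simulate_star_log and all parameter values are hypothetical.

```python
import numpy as np

def simulate_star_log(X, beta, sigma, rng):
    """Simulate counts from a STAR-style model with a log transformation.

    Assumed form (illustrative only):
      z_star = X @ beta + sigma * eps,  eps ~ N(0, 1)   (latent Gaussian regression)
      y      = floor(exp(z_star))                        (back-transform, then round down)
    """
    # Step 1: continuous latent data from a Gaussian linear regression
    z_star = X @ beta + sigma * rng.standard_normal(X.shape[0])
    # Step 2: invert the log transformation and round down to a count
    y = np.floor(np.exp(z_star)).astype(int)
    return z_star, y

rng = np.random.default_rng(0)
n = 200
# Design matrix with an intercept and one covariate (hypothetical values)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([1.0, 0.5])
z_star, y = simulate_star_log(X, beta, sigma=0.5, rng=rng)
```

Every simulated y is a nonnegative integer, and any observation whose latent value is negative is recorded as zero, which is how a model of this kind can produce zero-inflated counts from a Gaussian latent variable.
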
There are several options for the transformation. First, the transformation
can belong to the *Box-Cox* family, which includes the known transformations
'identity', 'log', and 'sqrt', as well as a version in which the Box-Cox parameter
is inferred within the MCMC sampler ('box-cox').
Second, the transformation can be estimated (before model fitting) using
the data y. Options in this case include the empirical cumulative
distribution function (ECDF), which is fully nonparametric ('np'), or the parametric
alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin')
distributions. For the parametric distributions, the parameters of the distribution
are estimated using moments (means and variances) of y.
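
As an illustration of the moment-based step, the sketch below computes method-of-moments estimates for a Poisson and a Negative-Binomial distribution from the sample mean and variance of y. It is written in Python and is not the package's code: the function names are hypothetical, the Negative-Binomial is parameterized by (size, prob) with mean size * (1 - prob) / prob, and the use of the unbiased sample variance is an assumption. How the fitted distribution is then converted into a transformation is not shown here.

```python
import numpy as np

def poisson_moments(y):
    """Method-of-moments estimate of the Poisson rate: the sample mean."""
    return float(np.mean(y))

def negbin_moments(y):
    """Method-of-moments estimates (size, prob) for a Negative-Binomial.

    Matches mean m = size * (1 - prob) / prob and variance v = m / prob,
    which gives prob = m / v and size = m**2 / (v - m).
    Requires overdispersion (v > m); otherwise the estimates are undefined.
    """
    m = float(np.mean(y))
    v = float(np.var(y, ddof=1))  # unbiased sample variance (an assumption)
    if v <= m:
        raise ValueError("sample variance must exceed the sample mean")
    size = m ** 2 / (v - m)
    prob = m / v
    return size, prob

# Small made-up count vector, used only to exercise the functions
y_counts = np.array([0, 1, 1, 2, 3, 5, 8, 13])
lam = poisson_moments(y_counts)
size, prob = negbin_moments(y_counts)
```

By construction, the fitted Negative-Binomial reproduces the sample mean and variance of y exactly, whereas the Poisson fit matches only the mean.
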
Lastly, the transformation can be modeled nonparametrically using (monotone)
splines ('ispline') or Bayesian nonparametrics via Dirichlet processes ('bnp').
The 'bnp' option is the default because it is highly flexible, accounts for
uncertainty when the transformation is unknown, and is computationally efficient.
The Monte Carlo sampler (use_MCMC=FALSE) produces direct, joint draws
from the posterior predictive distribution under a g-prior. When n is
moderate to large, or when a prior other than the g-prior is needed,
MCMC sampling (use_MCMC=TRUE) is much faster and more convenient.