STAR defines a count-valued probability model by
(1) specifying a Gaussian model for continuous *latent* data and
(2) connecting the latent data to the observed data via a
*transformation and rounding* operation.
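The transformation-and-rounding step can be illustrated with a short Python sketch. It assumes, beyond what is stated here, that a count y equals j exactly when the latent Gaussian variable z falls in [g(a_j), g(a_j+1)), with a_0 = -Inf and a_j = j for j >= 1, where g is an increasing transformation. The function names are hypothetical and are not part of the package.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF, with explicit handling of infinite arguments."""
    if x == -math.inf:
        return 0.0
    if x == math.inf:
        return 1.0
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def star_pmf(j, mu, sigma, g):
    """P(y = j) when z ~ N(mu, sigma^2) and y = j iff g(a_j) <= z < g(a_{j+1}).

    Assumed rounding intervals: a_0 = -inf and a_j = j for j >= 1.
    """
    lower = -math.inf if j == 0 else g(j)
    upper = g(j + 1)
    return norm_cdf((upper - mu) / sigma) - norm_cdf((lower - mu) / sigma)

def star_sample(mu, sigma, g, rng):
    """Draw one count: simulate the latent z, then find its rounding interval."""
    z = rng.gauss(mu, sigma)
    j = 0
    while z >= g(j + 1):  # move up until z lies below the next threshold
        j += 1
    return j
```

With g = math.log, for example, the probabilities star_pmf(j, mu, sigma, math.log) telescope to one over j = 0, 1, 2, ..., so the latent Gaussian model induces a valid distribution on the counts.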
The expectation-maximization (EM) algorithm is used to produce
maximum likelihood estimators (MLEs) for the parameters defined in the
estimator function, such as linear regression coefficients,
which define the Gaussian model for the continuous latent data.
Fitted values (point predictions), residuals, and log-likelihood values
are also available. Inference for the estimators proceeds via classical maximum likelihood.
Initialization of the EM algorithm can be randomized to monitor convergence.
However, the log-likelihood is concave for all transformations except 'box-cox',
so global convergence is guaranteed in those cases.
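As a rough illustration of the EM iteration, the Python sketch below fits the simplest possible case: an intercept-only latent model z ~ N(mu, sigma^2) with a fixed, known transformation g. It is not the package's implementation. The rounding intervals (a_0 = -Inf, a_j = j for j >= 1), the default initialization, and all function names are assumptions made for this example.

```python
import math

def phi(x):
    """Standard normal density; zero at +/- infinity."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF (math.erf maps +/- infinity to +/- 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bounds(y, g):
    """Latent interval [g(a_y), g(a_{y+1})) implied by an observed count y."""
    return (-math.inf if y == 0 else g(y)), g(y + 1)

def loglik(ys, mu, sigma, g):
    """Observed-data log-likelihood: sum of log interval probabilities."""
    total = 0.0
    for y in ys:
        lo, hi = bounds(y, g)
        total += math.log(Phi((hi - mu) / sigma) - Phi((lo - mu) / sigma))
    return total

def em_star_intercept(ys, g, n_iter=50, init=None):
    """EM for an intercept-only latent Gaussian model; returns (mu, sigma, trace)."""
    n = len(ys)
    if init is None:
        # Assumed default start: moments of the transformed, shifted counts.
        z0 = [g(y + 0.5) for y in ys]
        mu = sum(z0) / n
        sigma = math.sqrt(sum((z - mu) ** 2 for z in z0) / n) or 1.0
    else:
        mu, sigma = init
    trace = [loglik(ys, mu, sigma, g)]
    for _ in range(n_iter):
        m1, m2 = 0.0, 0.0
        for y in ys:
            # E-step: first two moments of z truncated to its interval.
            lo, hi = bounds(y, g)
            a, b = (lo - mu) / sigma, (hi - mu) / sigma
            mass = Phi(b) - Phi(a)
            d = (phi(a) - phi(b)) / mass
            a_term = 0.0 if math.isinf(a) else a * phi(a)
            ez = mu + sigma * d
            vz = sigma ** 2 * (1.0 + (a_term - b * phi(b)) / mass - d * d)
            m1 += ez
            m2 += vz + ez * ez
        # M-step: Gaussian MLE computed from the expected sufficient statistics.
        mu = m1 / n
        sigma = math.sqrt(m2 / n - mu * mu)
        trace.append(loglik(ys, mu, sigma, g))
    return mu, sigma, trace
```

The returned trace should be non-decreasing, which is the usual EM diagnostic. Passing different values of init and comparing the resulting estimates is one way to monitor convergence in the sense described above.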
There are several options for the transformation. First, the transformation
can belong to the *Box-Cox* family, which includes the known transformations
'identity', 'log', and 'sqrt', as well as a version in which the Box-Cox parameter
is estimated within the EM algorithm ('box-cox'). Second, the transformation
can be estimated (before model fitting) using the empirical distribution of the
data y. Options in this case include the empirical cumulative
distribution function (CDF), which is fully nonparametric ('np'), or the parametric
alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin')
distributions. For the parametric distributions, the parameters of the distribution
are estimated using moments (means and variances) of y.
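The two kinds of transformation can be sketched in Python as follows. The Box-Cox function uses the standard textbook form, under which 'identity', 'sqrt', and 'log' correspond to lambda = 1, 1/2, and 0 up to a shift and scale. The CDF-based construction g(t) = mean(y) + sd(y) * Phi^{-1}(F(t - 1)) is an assumption of this sketch: it follows from matching P(y <= j) to the latent Gaussian probability of z < g(j + 1), but it is not confirmed here as the package's exact formula. The n / (n + 1) rescaling, the clipping constant, and all names are likewise illustrative.

```python
import math
from statistics import NormalDist, mean, pstdev

def box_cox(t, lam):
    """Standard Box-Cox transform for t > 0; lam = 0 gives the log."""
    return math.log(t) if lam == 0 else (t ** lam - 1.0) / lam

def ecdf(ys):
    """Empirical CDF, rescaled by n / (n + 1) so it never reaches 1 (assumed)."""
    n = len(ys)
    return lambda t: sum(y <= t for y in ys) / (n + 1)

def poisson_cdf(ys):
    """Poisson CDF with the rate set to the sample mean (method of moments)."""
    rate = mean(ys)
    def F(t):
        if t < 0:
            return 0.0
        term = math.exp(-rate)  # P(Y = 0)
        total = term
        for k in range(1, math.floor(t) + 1):
            term *= rate / k    # P(Y = k) from P(Y = k - 1)
            total += term
        return min(total, 1.0 - 1e-12)  # keep strictly below 1 for inv_cdf
    return F

def make_g(ys, F):
    """Build g(t) = mean(y) + sd(y) * Phi^{-1}(F(t - 1)) from a CDF F (assumed form)."""
    m, s = mean(ys), pstdev(ys)
    std_normal = NormalDist()
    def g(t):
        p = F(t - 1)
        return -math.inf if p <= 0 else m + s * std_normal.inv_cdf(p)
    return g
```

For a sample ys, make_g(ys, ecdf(ys)) plays the role of the 'np' option and make_g(ys, poisson_cdf(ys)) that of 'pois'. In both cases g is computed once from the data and then held fixed during model fitting.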