The functions described on this page are used to specify the prior-related arguments of the various modeling functions in the rstap package (to view the priors used for an existing model, see prior_summary).
The default priors used in the various rstap modeling functions are intended to be weakly informative in that they provide moderate regularization and help stabilize computation. For many applications the defaults will perform well, but prudent use of more informative priors is encouraged. All of the priors here are informed by the priors in rstanarm, though it should be noted that the hierarchical shape priors are not included.
normal(location = 0, scale = NULL, autoscale = TRUE)
student_t(df = 1, location = 0, scale = NULL, autoscale = TRUE)
cauchy(location = 0, scale = NULL, autoscale = TRUE)
laplace(location = 0, scale = NULL, autoscale = TRUE)
lasso(df = 1, location = 0, scale = NULL, autoscale = TRUE)
product_normal(df = 2, location = 0, scale = 1)
exponential(rate = 1, autoscale = TRUE)
log_normal(location = 0, scale = 1)
decov(regularization = 1, concentration = 1, shape = 1, scale = 1)
lkj(regularization = 1, scale = 10, df = 1, autoscale = TRUE)
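Each of these functions returns a named list that is then passed to one of the prior-related arguments of the model fitting functions. A brief sketch (the fitting call is schematic; argument names such as prior and prior_intercept are assumed to follow the rstanarm convention, so consult ?stap_glm for the actual interface):
# A prior specification is just a named list of settings:
normal(location = 0, scale = 2.5, autoscale = TRUE)
student_t(df = 4, location = 0, scale = 2.5)
# Schematic use in a fitting call (argument names assumed, see ?stap_glm):
# fit <- stap_glm(..., prior = normal(0, 2.5), prior_intercept = student_t(7, 0, 10))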
location: Prior location. In most cases, this is the prior mean, but for cauchy (which is equivalent to student_t with df = 1) the mean does not exist, so location is the prior median. The default value is \(0\).
scale: Prior scale. The default depends on the family (see Details).
autoscale: A logical scalar, defaulting to TRUE. If TRUE, then the scales of the priors on the intercept and regression coefficients may be additionally modified internally by rstap in the following cases. First, for Gaussian models only, the prior scales for the intercept, coefficients, and the auxiliary parameter sigma (error standard deviation) are multiplied by sd(y). Additionally --- not only for Gaussian models --- if the QR argument to the model fitting function (e.g. stap_glm) is FALSE, then: for a predictor with only one value, nothing is changed; for a predictor x with exactly two unique values, we take the user-specified (or default) scale(s) for the selected priors and divide by the range of x; for a predictor x with more than two unique values, we divide the prior scale(s) by sd(x).
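As a standalone sketch of that rescaling rule (autoscaled_scale is a hypothetical helper written for illustration; it is not part of rstap):
autoscaled_scale <- function(x, scale = 2.5) {
  n_unique <- length(unique(x))
  if (n_unique == 1) {
    scale                    # one value only: scale left unchanged
  } else if (n_unique == 2) {
    scale / diff(range(x))   # two unique values: divide by the range of x
  } else {
    scale / sd(x)            # more than two unique values: divide by sd(x)
  }
}
autoscaled_scale(c(0, 1, 1, 0))        # 2.5 / 1 = 2.5
autoscaled_scale(rnorm(100, sd = 4))   # roughly 2.5 / 4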
df: Prior degrees of freedom. The default is \(1\) for student_t, in which case it is equivalent to cauchy. For the product_normal prior, the degrees of freedom parameter must be an integer (vector) that is at least \(2\) (the default).
rate: Prior rate for the exponential distribution. Defaults to 1. For the exponential distribution, the rate parameter is the reciprocal of the mean.
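For example, to give the exponential prior a mean of 5 rather than 1, supply the reciprocal as the rate:
exponential(rate = 1 / 5)   # prior mean of 5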
regularization: Exponent for an LKJ prior on the correlation matrix in the decov or lkj prior. The default is \(1\), implying a joint uniform prior.
concentration: Concentration parameter for a symmetric Dirichlet distribution. The default is \(1\), implying a joint uniform prior.
shape: Shape parameter for a gamma prior on the scale parameter in the decov prior. If shape and scale are both \(1\) (the default), then the gamma prior simplifies to the unit-exponential distribution.
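That simplification is easy to verify with the base R density functions (a standalone check, independent of rstap):
x <- seq(0, 5, by = 0.5)
all.equal(dgamma(x, shape = 1, scale = 1), dexp(x, rate = 1))   # TRUE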
Value: A named list to be used internally by the rstap model fitting functions.
The details depend on the family of the prior being used:
Family members:
normal(location, scale)
student_t(df, location, scale)
cauchy(location, scale)
Each of these functions also takes an autoscale argument, which is relevant only if the prior is used for any of the non-stap related parameters; it is not used otherwise. For the prior distribution for the intercept, location, scale, and df should be scalars. As the degrees of freedom approach infinity, the Student t distribution approaches the normal distribution, and if the degrees of freedom are one, then the Student t distribution is the Cauchy distribution.
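Both limiting relationships can be checked numerically with the base R density functions:
x <- seq(-3, 3, by = 0.5)
all.equal(dt(x, df = 1), dcauchy(x))   # TRUE: one degree of freedom gives the Cauchy
max(abs(dt(x, df = 1e6) - dnorm(x)))   # near zero: large df approaches the normal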
If scale
is not specified it will default to \(10\) for the
intercept and \(2.5\) for the other coefficients, unless the probit link
function is used, in which case these defaults are scaled by a factor of
dnorm(0)/dlogis(0)
, which is roughly \(1.6\).
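The probit adjustment factor is simply a ratio of two densities evaluated at zero and can be computed directly:
dnorm(0) / dlogis(0)   # 0.3989423 / 0.25, approximately 1.6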
If the autoscale
argument is TRUE
(the default), then the
scales will be further adjusted as described above in the documentation of
the autoscale
argument in the Arguments section.
Family members:
laplace(location, scale)
lasso(df, location, scale)
Each of these functions also takes an autoscale argument. The Laplace distribution is also known as the double-exponential distribution. It is a symmetric distribution with a sharp peak at its mean / median / mode and fairly long tails. This distribution can be motivated as a scale mixture of normal distributions, and the remarks above about the normal distribution apply here as well.
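To illustrate the scale-mixture motivation, here is a small standalone simulation; the particular construction (an exponential mixing distribution with rate 1 / (2 * b^2) on the normal variance yielding a Laplace distribution with scale b) is one standard representation and is stated here as an assumption for illustration:
set.seed(1)
b <- 1                                    # Laplace scale parameter
v <- rexp(1e5, rate = 1 / (2 * b^2))      # exponential mixing distribution on the variance
x <- rnorm(1e5, mean = 0, sd = sqrt(v))   # normal draws given the mixed variance
quantile(x, 0.9)                          # roughly -b * log(0.2), i.e. about 1.61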
The lasso approach to supervised learning can be expressed as finding the
posterior mode when the likelihood is Gaussian and the priors on the
coefficients have independent Laplace distributions. It is commonplace in
supervised learning to choose the tuning parameter by cross-validation,
whereas a more Bayesian approach would be to place a prior on “it”,
or rather its reciprocal in our case (i.e. smaller values correspond
to more shrinkage toward the prior location vector). We use a chi-square
prior with degrees of freedom equal to that specified in the call to
lasso
or, by default, 1. The expectation of a chi-square random
variable is equal to this degrees of freedom and the mode is equal to the
degrees of freedom minus 2, if this difference is positive.
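Both facts about the chi-square distribution are easy to confirm in base R:
mean(rchisq(1e5, df = 4))   # approximately 4, the degrees of freedom
optimize(function(x) dchisq(x, df = 4), interval = c(0, 10), maximum = TRUE)$maximum   # approximately 2, i.e. df - 2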
It is also common in supervised learning to standardize the predictors
before training the model. We do not recommend doing so. Instead, it is
better to specify autoscale = TRUE
(the default value), which
will adjust the scales of the priors according to the dispersion in the
variables. See the documentation of the autoscale
argument above
and also the prior_summary
page for more information.
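For example, rather than standardizing x before fitting, one would typically leave the default autoscaling on:
lasso(df = 1, location = 0, autoscale = TRUE)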
Family members:
product_normal(df, location, scale)
The product-normal distribution is the product of at least two independent normal variates, each with mean zero, shifted by the location parameter. It can be shown that the density of a product-normal variate is symmetric and infinite at location, so this prior resembles a “spike-and-slab” prior for sufficiently large values of the scale parameter. For better or for worse, this prior may be appropriate when it is strongly believed (by someone) that a regression coefficient “is” equal to the location parameter, even though no true Bayesian would specify such a prior.
Each element of df
must be an integer of at least \(2\) because
these “degrees of freedom” are interpreted as the number of normal
variates being multiplied and then shifted by location
to yield the
regression coefficient. Higher degrees of freedom produce a sharper
spike at location
.
Each element of scale
must be a non-negative real number that is
interpreted as the standard deviation of the normal variates being
multiplied and then shifted by location
to yield the regression
coefficient. In other words, the elements of scale
may differ, but
the k-th standard deviation is presumed to hold for all the normal deviates
that are multiplied together and shifted by the k-th element of
location
to yield the k-th regression coefficient. The elements of
scale
are not the prior standard deviations of the regression
coefficients. The prior variance of the regression coefficients is equal to
the scale raised to the power of \(2\) times the corresponding element of
df
. Thus, larger values of scale
put more prior volume on
values of the regression coefficient that are far from zero.
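A quick Monte Carlo check of that variance claim (a standalone simulation, not rstap code):
set.seed(123)
df_pn <- 3              # number of normal variates multiplied together
scale_pn <- 2           # standard deviation of each variate
draws <- replicate(1e5, prod(rnorm(df_pn, mean = 0, sd = scale_pn)))
var(draws)              # close to the theoretical value below
scale_pn^(2 * df_pn)    # 64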
Gelman, A., Jakulin, A., Pittau, M. G., and Su, Y. (2008). A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics. 2(4), 1360--1383.
# Can assign priors to names
N05 <- normal(0, 5)
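# A named prior can then be reused across fitting calls, for example
# (schematic call; the 'prior' argument name is assumed to follow the
# rstanarm convention --- see ?stap_glm for the actual arguments):
# fit <- stap_glm(..., prior = N05)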
The various vignettes for the rstanarm and rstap packages also discuss and demonstrate the use of some of the supported prior distributions.