The functions described on this page are used to specify the prior-related arguments of the various modeling functions in the rstap package (to view the priors used for an existing model, see prior_summary).
The default priors used in the various rstap modeling functions are intended to be weakly informative in that they provide moderate regularization and help stabilize computation. For many applications the defaults will perform well, but prudent use of more informative priors is encouraged. All of the priors here are informed by the priors in rstanarm, though it should be noted that the hierarchical shape priors are not included.
normal(location = 0, scale = NULL, autoscale = TRUE)
student_t(df = 1, location = 0, scale = NULL, autoscale = TRUE)
cauchy(location = 0, scale = NULL, autoscale = TRUE)
laplace(location = 0, scale = NULL, autoscale = TRUE)
lasso(df = 1, location = 0, scale = NULL, autoscale = TRUE)
product_normal(df = 2, location = 0, scale = 1)
exponential(rate = 1, autoscale = TRUE)
log_normal(location = 0, scale = 1)
decov(regularization = 1, concentration = 1, shape = 1, scale = 1)
lkj(regularization = 1, scale = 10, df = 1, autoscale = TRUE)
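Each of these functions returns a named list that is then passed to one of the prior-related arguments of the model fitting functions. A brief sketch (the fitting call is schematic; argument names such as prior and prior_intercept are assumed to follow the rstanarm convention, so consult ?stap_glm for the actual interface):
# A prior specification is just a named list of settings:
normal(location = 0, scale = 2.5, autoscale = TRUE)
student_t(df = 4, location = 0, scale = 2.5)
# Schematic use in a fitting call (argument names assumed, see ?stap_glm):
# fit <- stap_glm(..., prior = normal(0, 2.5), prior_intercept = student_t(7, 0, 10))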
location: Prior location. In most cases, this is the prior mean, but for cauchy (which is equivalent to student_t with df = 1) the mean does not exist, so location is the prior median. The default value is \(0\).
scale: Prior scale. The default depends on the family (see Details).
autoscale: A logical scalar, defaulting to TRUE. If TRUE, then the scales of the priors on the intercept and regression coefficients may be additionally modified internally by rstap in the following cases. First, for Gaussian models only, the prior scales for the intercept, coefficients, and the auxiliary parameter sigma (error standard deviation) are multiplied by sd(y). Additionally --- not only for Gaussian models --- if the QR argument to the model fitting function (e.g. stap_glm) is FALSE, then: for a predictor with only one value, nothing is changed; for a predictor x with exactly two unique values, we take the user-specified (or default) scale(s) for the selected priors and divide by the range of x; for a predictor x with more than two unique values, we divide the prior scale(s) by sd(x).
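As a standalone sketch of that rescaling rule (autoscaled_scale is a hypothetical helper written for illustration; it is not part of rstap):
autoscaled_scale <- function(x, scale = 2.5) {
  n_unique <- length(unique(x))
  if (n_unique == 1) {
    scale                    # one value only: scale left unchanged
  } else if (n_unique == 2) {
    scale / diff(range(x))   # two unique values: divide by the range of x
  } else {
    scale / sd(x)            # more than two unique values: divide by sd(x)
  }
}
autoscaled_scale(c(0, 1, 1, 0))        # 2.5 / 1 = 2.5
autoscaled_scale(rnorm(100, sd = 4))   # roughly 2.5 / 4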
df: Prior degrees of freedom. The default is \(1\) for student_t, in which case it is equivalent to cauchy. For the product_normal prior, the degrees of freedom parameter must be an integer (vector) that is at least \(2\) (the default).
rate: Prior rate for the exponential distribution. Defaults to 1. For the exponential distribution, the rate parameter is the reciprocal of the mean.
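For example, to give the exponential prior a mean of 5 rather than 1, supply the reciprocal as the rate:
exponential(rate = 1 / 5)   # prior mean of 5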
regularization: Exponent for an LKJ prior on the correlation matrix in the decov or lkj prior. The default is \(1\), implying a joint uniform prior.
concentration: Concentration parameter for a symmetric Dirichlet distribution. The default is \(1\), implying a joint uniform prior.
shape: Shape parameter for a gamma prior on the scale parameter in the decov prior. If shape and scale are both \(1\) (the default), then the gamma prior simplifies to the unit-exponential distribution.
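That simplification is easy to verify with the base R density functions (a standalone check, independent of rstap):
x <- seq(0, 5, by = 0.5)
all.equal(dgamma(x, shape = 1, scale = 1), dexp(x, rate = 1))   # TRUE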
Value: A named list to be used internally by the rstap model fitting functions.
The details depend on the family of the prior being used:
Family members:
normal(location, scale)
student_t(df, location, scale)
cauchy(location, scale)
Each of these functions also takes an autoscale argument, which is relevant only if the prior is used for any of the non-stap related parameters; it is not used otherwise. For the prior distribution for the intercept, location, scale, and df should be scalars. As the degrees of freedom approach infinity, the Student t distribution approaches the normal distribution, and if the degrees of freedom are one, then the Student t distribution is the Cauchy distribution.
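Both limiting relationships can be checked numerically with the base R density functions:
x <- seq(-3, 3, by = 0.5)
all.equal(dt(x, df = 1), dcauchy(x))   # TRUE: one degree of freedom gives the Cauchy
max(abs(dt(x, df = 1e6) - dnorm(x)))   # near zero: large df approaches the normal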
If scale
is not specified it will default to \(10\) for the
intercept and \(2.5\) for the other coefficients, unless the probit link
function is used, in which case these defaults are scaled by a factor of
dnorm(0)/dlogis(0)
, which is roughly \(1.6\).
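The probit adjustment factor is simply a ratio of two densities evaluated at zero and can be computed directly:
dnorm(0) / dlogis(0)   # 0.3989423 / 0.25, approximately 1.6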
If the autoscale
argument is TRUE
(the default), then the
scales will be further adjusted as described above in the documentation of
the autoscale
argument in the Arguments section.
Family members:
laplace(location, scale)
lasso(df, location, scale)
Each of these functions also takes an autoscale argument. The Laplace distribution is also known as the double-exponential distribution. It is a symmetric distribution with a sharp peak at its mean / median / mode and fairly long tails. This distribution can be motivated as a scale mixture of normal distributions, and the remarks above about the normal distribution apply here as well.
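To illustrate the scale-mixture motivation, here is a small standalone simulation; the particular construction (an exponential mixing distribution with rate 1 / (2 * b^2) on the normal variance yielding a Laplace distribution with scale b) is one standard representation and is stated here as an assumption for illustration:
set.seed(1)
b <- 1                                    # Laplace scale parameter
v <- rexp(1e5, rate = 1 / (2 * b^2))      # exponential mixing distribution on the variance
x <- rnorm(1e5, mean = 0, sd = sqrt(v))   # normal draws given the mixed variance
quantile(x, 0.9)                          # roughly -b * log(0.2), i.e. about 1.61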
The lasso approach to supervised learning can be expressed as finding the
posterior mode when the likelihood is Gaussian and the priors on the
coefficients have independent Laplace distributions. It is commonplace in
supervised learning to choose the tuning parameter by cross-validation,
whereas a more Bayesian approach would be to place a prior on “it”,
or rather its reciprocal in our case (i.e. smaller values correspond
to more shrinkage toward the prior location vector). We use a chi-square
prior with degrees of freedom equal to that specified in the call to
lasso
or, by default, 1. The expectation of a chi-square random
variable is equal to this degrees of freedom and the mode is equal to the
degrees of freedom minus 2, if this difference is positive.
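Both facts about the chi-square distribution are easy to confirm in base R:
mean(rchisq(1e5, df = 4))   # approximately 4, the degrees of freedom
optimize(function(x) dchisq(x, df = 4), interval = c(0, 10), maximum = TRUE)$maximum   # approximately 2, i.e. df - 2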
It is also common in supervised learning to standardize the predictors
before training the model. We do not recommend doing so. Instead, it is
better to specify autoscale = TRUE
(the default value), which
will adjust the scales of the priors according to the dispersion in the
variables. See the documentation of the autoscale
argument above
and also the prior_summary
page for more information.
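For example, rather than standardizing x before fitting, one would typically leave the default autoscaling on:
lasso(df = 1, location = 0, autoscale = TRUE)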
Family members:
product_normal(df, location, scale)
The product-normal distribution is the product of at least two independent normal variates, each with mean zero, shifted by the location parameter. It can be shown that the density of a product-normal variate is symmetric and infinite at location, so this prior resembles a “spike-and-slab” prior for sufficiently large values of the scale parameter. For better or for worse, this prior may be appropriate when it is strongly believed (by someone) that a regression coefficient “is” equal to the location parameter, even though no true Bayesian would specify such a prior.
Each element of df
must be an integer of at least \(2\) because
these “degrees of freedom” are interpreted as the number of normal
variates being multiplied and then shifted by location
to yield the
regression coefficient. Higher degrees of freedom produce a sharper
spike at location
.
Each element of scale
must be a non-negative real number that is
interpreted as the standard deviation of the normal variates being
multiplied and then shifted by location
to yield the regression
coefficient. In other words, the elements of scale
may differ, but
the k-th standard deviation is presumed to hold for all the normal deviates
that are multiplied together and shifted by the k-th element of
location
to yield the k-th regression coefficient. The elements of
scale
are not the prior standard deviations of the regression
coefficients. The prior variance of the regression coefficients is equal to
the scale raised to the power of \(2\) times the corresponding element of
df
. Thus, larger values of scale
put more prior volume on
values of the regression coefficient that are far from zero.
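A quick Monte Carlo check of that variance claim (a standalone simulation, not rstap code):
set.seed(123)
df_pn <- 3              # number of normal variates multiplied together
scale_pn <- 2           # standard deviation of each variate
draws <- replicate(1e5, prod(rnorm(df_pn, mean = 0, sd = scale_pn)))
var(draws)              # close to the theoretical value below
scale_pn^(2 * df_pn)    # 64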
Gelman, A., Jakulin, A., Pittau, M. G., and Su, Y. (2008). A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics. 2(4), 1360--1383.
# Can assign priors to names
N05 <- normal(0, 5)
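# A named prior can then be reused across fitting calls, for example
# (schematic call; the 'prior' argument name is assumed to follow the
# rstanarm convention --- see ?stap_glm for the actual arguments):
# fit <- stap_glm(..., prior = N05)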
The various vignettes for the rstanarm and rstap packages also discuss and demonstrate the use of some of the supported prior distributions.