set_prior: Prior Definitions for brms Models

Description

Define priors for specific parameters or classes of parameters.

Usage

set_prior(prior, class = "b", coef = "", group = "", nlpar = "", resp = NULL, lb = NULL, ub = NULL)

Arguments

prior

A character string defining a distribution in Stan language

class

The parameter class. Defaults to "b" (fixed effects). See 'Details' for other valid parameter classes.

coef

Name of the (population- or group-level) parameter

group

Grouping factor of group-level parameters.

nlpar

Name of a non-linear / auxiliary parameter. Only used in non-linear / distributional models.

resp

Name of the response variable / category. Only used in multivariate and categorical models. It is handled internally as an alias of nlpar.

lb

Lower bound for parameter restriction. Currently only allowed for classes "b", "ar", "ma", and "arr". Defaults to NULL, that is, no restriction.

ub

Upper bound for parameter restriction. Currently only allowed for classes "b", "ar", "ma", and "arr". Defaults to NULL, that is, no restriction.

Value

An object of class brmsprior to be used in the prior argument of brm.

Details

set_prior is used to define prior distributions for parameters in brms models. Below, we explain its usage and list some common prior distributions for parameters. A complete overview of possible prior distributions is given in the Stan Reference Manual available at http://mc-stan.org/.

To combine multiple priors, use c(...), e.g., c(set_prior(...), set_prior(...)). brms does not check whether the priors are written in correct Stan language. Instead, Stan checks their syntactic correctness when the model is parsed to C++ and returns an error if they are not valid. This, however, does not imply that priors accepted by Stan are always meaningful. Although brms tries to find common problems (e.g., setting bounded priors on unbounded parameters), there is no guarantee that the defined priors are reasonable for the model.

Currently, there are seven types of parameters in brms models for which the user can specify prior distributions.

1. Population-level ('fixed') effects

Every population-level effect has its own regression parameter. These parameters are internally named b_<fixed>, where <fixed> represents the name of the corresponding population-level effect. Suppose, for instance, that y is predicted by x1 and x2 (i.e. y ~ x1+x2 in formula syntax). Then, x1 and x2 have regression parameters b_x1 and b_x2 respectively. The default prior for population-level effects (including monotonic and category specific effects) is an improper flat prior over the reals. Other common options are normal priors or student-t priors. If we want to have a normal prior with mean 0 and standard deviation 5 for x1, and a unit student-t prior with 10 degrees of freedom for x2, we can specify this via set_prior("normal(0,5)", class = "b", coef = "x1") and set_prior("student_t(10,0,1)", class = "b", coef = "x2"). To put the same prior on all fixed effects at once, we may write as a shortcut set_prior("<prior>", class = "b"). This also leads to faster sampling, because priors can be vectorized in this case.
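
The two ways of specifying population-level priors described above can be sketched as follows. This is a minimal illustration that assumes the brms package is installed and that a model of the form y ~ x1 + x2 will be fitted later; the object names and the normal(0,5) choice for the class-wide prior are illustrative assumptions, not recommendations.

```r
library(brms)  # assumption: brms is installed

# Coefficient-specific priors for the predictors x1 and x2
prior_x1 <- set_prior("normal(0,5)", class = "b", coef = "x1")
prior_x2 <- set_prior("student_t(10,0,1)", class = "b", coef = "x2")

# Shortcut: one prior shared by all population-level effects
# (the distribution chosen here is only an example)
prior_all <- set_prior("normal(0,5)", class = "b")

# Combine several priors into a single object that can be
# passed to the prior argument of brm
priors_by_coef <- c(prior_x1, prior_x2)
```

Nothing is compiled or sampled at this point; the objects only record which prior string belongs to which parameter.
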
Both ways of defining priors can be combined using for instance set_prior("normal(0,2)", class = "b") and set_prior("normal(0,10)", class = "b", coef = "x1") at the same time. This will set a normal(0,10) prior on the fixed effect of x1 and a normal(0,2) prior on all other fixed effects. However, this will break vectorization and may slow down the sampling procedure a bit.

In case of the default intercept parameterization (discussed in the 'Details' section of brm), the fixed effects intercept has its own parameter class named "Intercept" and priors can thus be specified via set_prior("<prior>", class = "Intercept"). Setting a prior on the intercept will not break vectorization of the other population-level effects.

A special shrinkage prior to be applied on population-level effects is the horseshoe prior. It is symmetric around zero with fat tails and an infinitely large spike at zero. This makes it ideal for sparse models that have many regression coefficients, of which only a minority are non-zero. For more details see Carvalho et al. (2009). The horseshoe prior can be applied on all population-level effects at once (excluding the intercept) by using set_prior("horseshoe(1)"). The 1 implies that the student-t prior of the local shrinkage parameters has 1 degree of freedom. This may, however, lead to an increased number of divergent transitions in Stan. Accordingly, increasing the degrees of freedom to slightly higher values (e.g., 3) may often be a better option, although the prior no longer resembles a horseshoe in this case. Generally, models with horseshoe priors are more likely than other models to have divergent transitions, so that increasing adapt_delta from 0.8 to values closer to 1 will often be necessary. See the documentation of brm for instructions on how to increase adapt_delta.

In non-linear models, population-level effects are defined separately for each non-linear parameter.
Accordingly, it is necessary to specify the non-linear parameter in set_prior so that priors can be assigned correctly. If, for instance, alpha is the parameter and x the predictor for which we want to define the prior, we can write set_prior("<prior>", coef = "x", nlpar = "alpha"). As a shortcut we can use set_prior("<prior>", nlpar = "alpha") to set the same prior on all population-level effects of alpha at once.

If desired, population-level effects can be restricted to fall only within a certain interval using the lb and ub arguments of set_prior. This is often required when defining priors that are not defined everywhere on the real line, such as uniform or gamma priors. When defining a uniform(2,4) prior, you should write set_prior("uniform(2,4)", lb = 2, ub = 4). When using a prior that is defined on the positive reals only (such as a gamma prior), set lb = 0. In most situations, it is not useful to restrict population-level parameters through bounded priors (non-linear models are an important exception), but if you really want to, this is the way to go.

2. Standard deviations of group-level ('random') effects

Each group-level effect of each grouping factor has a standard deviation named sd_<group>_<coef>. Consider, for instance, the formula y ~ x1+x2+(1+x1|g). We see that the intercept as well as x1 are group-level effects nested in the grouping factor g. The corresponding standard deviation parameters are named sd_g_Intercept and sd_g_x1 respectively. These parameters are restricted to be non-negative and, by default, have a half student-t prior with 3 degrees of freedom and a scale parameter that depends on the standard deviation of the response after applying the link function. Minimally, the scale parameter is 10. To define a prior distribution only for standard deviations of a specific grouping factor, use set_prior("<prior>", class = "sd", group = "<group>").
To define a prior distribution only for a specific standard deviation of a specific grouping factor, you may write set_prior("<prior>", class = "sd", group = "<group>", coef = "<coef>"). Recommendations on useful prior distributions for standard deviations are given in Gelman (2006). When defining priors on group-level effects parameters in non-linear models, please make sure to specify the corresponding non-linear parameter through the nlpar argument in the same way as for population-level effects.

3. Correlations of group-level ('random') effects

If there is more than one group-level effect per grouping factor, the correlations between those effects have to be estimated. The prior "lkj_corr_cholesky(eta)", or in short "lkj(eta)", with eta > 0 is essentially the only prior for (Cholesky factors of) correlation matrices. If eta = 1 (the default), all correlation matrices are equally likely a priori. If eta > 1, extreme correlations become less likely, whereas 0 < eta < 1 results in higher probabilities for extreme correlations. Correlation matrix parameters in brms models are named cor_<group> (e.g., cor_g if g is the grouping factor). To set the same prior on every correlation matrix, use for instance set_prior("lkj(2)", class = "cor").

4. Standard deviations of smoothing terms

GAMMs are implemented in brms using the 'random effects' formulation of smoothing terms (for details see gamm). Thus, each smoothing term has its corresponding standard deviation modeling the variability within this term. In brms, this parameter class is called sds and priors can be specified via set_prior("<prior>", class = "sds", coef = "<term label>"). The default prior is the same as for standard deviations of group-level effects.

5. Autocorrelation parameters

The autocorrelation parameters currently implemented are named ar (autoregression), ma (moving average), and arr (autoregression of the response). Priors can be defined by set_prior("<prior>", class = "ar") for ar, and similarly for ma and arr effects. By default, ar and ma are bounded between -1 and 1 and arr is unbounded (you may change this by using the arguments lb and ub). The default prior is flat over the range of definition.

6. Distance parameters of monotonic effects

As explained in the 'Details' section of brm, monotonic effects make use of a special parameter vector to estimate the 'normalized distances' between consecutive predictor categories. This is realized in Stan using the simplex parameter type and thus this class is also named "simplex" in brms. The only valid prior for simplex parameters is the dirichlet prior, which accepts a vector of length K - 1 (K = number of predictor categories) as input defining the 'concentration' of the distribution. Explaining the dirichlet prior is beyond the scope of this documentation, but we want to describe how to define this prior in a syntactically correct way. If a predictor x with K categories is modeled as monotonic, we can define a prior on its corresponding simplex via set_prior("dirichlet(<vector>)", class = "simplex", coef = "x"). For <vector>, we can put in any R expression defining a vector of length K - 1. The default is a uniform prior (i.e. <vector> = rep(1, K-1)) over all simplexes of the respective dimension.

7. Parameters for specific families

Some families need additional parameters to be estimated. Families gaussian, student, and cauchy need the parameter sigma to account for the residual standard deviation. By default, sigma has a half student-t prior that scales in the same way as the random effects standard deviations.
Furthermore, family student needs the parameter nu representing the degrees of freedom of Student's t distribution. By default, nu has prior "gamma(2,0.1)" and a fixed lower bound of 1. Families gamma, weibull, inverse.gaussian, and negbinomial need a shape parameter that has a "gamma(0.01,0.01)" prior by default. For families cumulative, cratio, sratio, and acat, and only if threshold = "equidistant", the parameter delta is used to model the distance between two adjacent thresholds. By default, delta has an improper flat prior over the reals. The von_mises family needs the parameter kappa, representing the concentration parameter. By default, kappa has prior "gamma(2, 0.01)". Every family specific parameter has its own prior class, so that set_prior("<prior>", class = "<parameter>") is the right way to go.
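
The bounded, group-level, and family-specific parameter classes described above can be sketched together as follows. This is a minimal illustration that assumes the brms package is installed and a model such as y ~ x1+x2+(1+x1|g) with family gaussian; the object names and the particular distributions (other than those quoted in the text) are illustrative assumptions.

```r
library(brms)  # assumption: brms is installed

# Bounded prior on all population-level effects; the bounds
# should match the support of the chosen distribution
bounded_b <- set_prior("uniform(2,4)", class = "b", lb = 2, ub = 4)

# Prior on the standard deviation of the group-level intercept
# of the (hypothetical) grouping factor g
sd_intercept <- set_prior("cauchy(0,2)", class = "sd",
                          group = "g", coef = "Intercept")

# LKJ prior applied to every group-level correlation matrix
cor_all <- set_prior("lkj(2)", class = "cor")

# Prior on a family-specific parameter, here the residual
# standard deviation sigma (distribution chosen as an example)
sigma_prior <- set_prior("student_t(3,0,10)", class = "sigma")

# Combine everything for use in the prior argument of brm
priors <- c(bounded_b, sd_intercept, cor_all, sigma_prior)
```

Whether each class actually exists in a given model depends on the formula and family, which is what get_prior reports.
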

Often, it may not be immediately clear which parameters are present in the model. To get a full list of parameters and parameter classes for which priors can be specified (depending on the model), use the function get_prior.

References

Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515-534.

Carvalho, C. M., Polson, N. G., & Scott, J. G. (2009). Handling sparsity via the horseshoe. In International Conference on Artificial Intelligence and Statistics (pp. 73-80).

Examples

## check which parameters can have priors
get_prior(rating ~ treat + period + carry + (1|subject),
          data = inhaler, family = sratio(), 
          threshold = "equidistant")
         
## define some priors          
prior <- c(set_prior("normal(0,10)", class = "b"),
           set_prior("normal(1,2)", class = "b", coef = "treat"),
           set_prior("cauchy(0,2)", class = "sd", 
                     group = "subject", coef = "Intercept"),
           set_prior("uniform(-5,5)", class = "delta"))
              
## verify that the priors indeed found their way into Stan's model code
make_stancode(rating ~ period + carry + cse(treat) + (1|subject),
              data = inhaler, family = sratio(), 
              threshold = "equidistant",
              prior = prior)
              
## use horseshoe priors to model sparsity in population-level effects parameters
make_stancode(count ~ log_Age_c + log_Base4_c * Trt_c,
              data = epilepsy, family = poisson(),
              prior = set_prior("horseshoe(3)"))