stan_polr: Bayesian ordinal regression models via Stan

Description

Bayesian inference for ordinal (or binary) regression models under a proportional odds assumption.

Usage

stan_polr(formula, data, weights, ..., subset,
  na.action = getOption("na.action", "na.omit"), contrasts = NULL,
  model = TRUE, method = c("logistic", "probit", "loglog", "cloglog",
  "cauchit"), prior = R2(stop("'location' must be specified")),
  prior_counts = dirichlet(1), shape = NULL, rate = NULL,
  prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"),
  adapt_delta = NULL)
stan_polr.fit(x, y, wt = NULL, offset = NULL, method = c("logistic",
  "probit", "loglog", "cloglog", "cauchit"), ...,
  prior = R2(stop("'location' must be specified")),
  prior_counts = dirichlet(1), shape = NULL, rate = NULL,
  prior_PD = FALSE, algorithm = c("sampling", "meanfield", "fullrank"),
  adapt_delta = NULL)

Arguments

formula, data, subset

Same as polr.

weights, na.action, contrasts, model

Same as polr, but rarely specified.

...

Further arguments passed to the function in the rstan package (sampling, vb, or optimizing).

method

One of 'logistic', 'probit', 'loglog', 'cloglog' or 'cauchit', but can be abbreviated. See polr for more details.

prior

Prior for coefficients. Should be a call to R2 to specify the prior location of the $R^2$ but can be NULL to indicate a standard uniform prior. See priors.

prior_counts

A call to dirichlet to specify the 
prior counts of the outcome when the predictors are at their sample
means.

shape

Either NULL or a positive scalar that is interpreted
as the shape parameter for a GammaDistribution on
the exponent applied to the probability of success when there are only
two outcome categories.

rate

Either NULL or a positive scalar that is interpreted
as the rate parameter for a GammaDistribution on
the exponent applied to the probability of success when there are only
two outcome categories.

prior_PD

A logical scalar (defaulting to FALSE) indicating
whether to draw from the prior predictive distribution instead of
conditioning on the outcome.

algorithm

Character string (possibly abbreviated) indicating the
estimation approach to use. Can be "sampling" for MCMC (the
default), "optimizing" for optimization, "meanfield" for
variational inference with independent normal distributions, or
"fullrank" for variational inference with a multivariate normal
distribution.

adapt_delta

Only relevant if algorithm="sampling". See 
adapt_delta for details.

x

A design matrix.

y

A response variable, which must be a (preferably ordered) factor.

wt

A numeric vector (possibly NULL) of observation weights.

offset

A numeric vector (possibly NULL) of offsets.

Value

A stanreg object is returned 
for stan_polr.
A stanfit object (or a slightly modified 
  stanfit object) is returned if stan_polr.fit is called directly.

Details

The stan_polr function is similar in syntax to 
  polr but rather than performing maximum likelihood 
  estimation of a proportional odds model, Bayesian estimation is performed
  (if algorithm = "sampling") via MCMC. The stan_polr 
  function calls the workhorse stan_polr.fit function, but it is 
  possible to call the latter directly.
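
  To make the proportional odds structure concrete, the following base R
  sketch computes category probabilities from a linear predictor and a set of
  cutpoints. It assumes the usual polr parameterization, in which the
  probability of being at or below category k is the link's CDF evaluated at
  cutpoint k minus the linear predictor. The helper name category_probs, the
  cutpoint values, and the linear predictor values are hypothetical and are
  not part of rstanarm.

```r
# Hypothetical helper: category probabilities under a cumulative link model,
# assuming P(y <= k) = cdf(cutpoint_k - eta) as in the polr parameterization.
category_probs <- function(eta, cutpoints, cdf = plogis) {
  cum <- c(cdf(cutpoints - eta), 1)  # P(y <= k) for k = 1, ..., J
  diff(c(0, cum))                    # P(y = k) by differencing
}

cutpoints <- c(-1, 0.5, 2)           # illustrative values, J = 4 categories
p_low  <- category_probs(eta = -1, cutpoints)
p_high <- category_probs(eta =  1, cutpoints)
print(rbind(p_low, p_high))
```

  Passing cdf = pnorm instead of the default plogis would give the probit
  version of the same calculation. A larger linear predictor shifts
  probability mass toward the higher categories while the cutpoints stay fixed.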
  
  As for stan_lm, it is necessary to specify the prior 
  location of $R^2$. In this case, the $R^2$ pertains to the
  proportion of variance in the latent variable (which is discretized
  by the cutpoints) attributable to the predictors in the model. 
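
  One way to picture this latent-variable $R^2$ is with a small simulation in
  base R. The sketch below assumes a logistic error term and uses made-up
  values for the coefficient and cutpoints; it illustrates the quantity the
  prior refers to and is not how stan_polr computes anything internally.

```r
set.seed(1)
n <- 5000
x <- rnorm(n)
beta <- 1.2                          # hypothetical coefficient
eta <- beta * x                      # linear predictor
latent <- eta + rlogis(n)            # latent variable with logistic noise

# Discretize the latent variable at (hypothetical) cutpoints
cutpoints <- c(-1, 0.5, 2)
y <- cut(latent, breaks = c(-Inf, cutpoints, Inf),
         labels = c("low", "mid", "high", "top"), ordered_result = TRUE)

# Share of latent variance attributable to the predictors
latent_r2 <- var(eta) / var(latent)
# Value implied by the assumed setup: var(eta) = beta^2, var(noise) = pi^2 / 3
implied_r2 <- beta^2 / (beta^2 + pi^2 / 3)
print(c(simulated = latent_r2, implied = implied_r2))
```

  A call such as R2(0.3, "mean") would then express a prior belief that this
  proportion is around 0.3 on average.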
  
  Prior beliefs about the cutpoints are governed by prior beliefs about the
  outcome when the predictors are at their sample means. Both of these
  are explained in the help page on priors and in the 
  rstanarm vignettes.
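
  The link between outcome probabilities and cutpoints can be sketched in base
  R as follows. The sketch assumes a logistic link, draws one set of outcome
  probabilities from a Dirichlet distribution with all concentration
  parameters equal to 1 (mirroring the default dirichlet(1)), and maps the
  cumulative probabilities to cutpoints with the logistic quantile function.
  Any rescaling that the package may apply internally is ignored, and all
  variable names are hypothetical.

```r
set.seed(123)
J <- 4                               # number of outcome categories
concentration <- rep(1, J)           # mirrors dirichlet(1)

# One Dirichlet draw via normalized independent gamma variates
g <- rgamma(J, shape = concentration, rate = 1)
probs <- g / sum(g)                  # outcome probabilities at the sample means

# Cutpoints implied by the cumulative probabilities (logistic link assumed)
cutpoints <- qlogis(cumsum(probs)[-J])
print(round(probs, 3))
print(round(cutpoints, 3))
```

  Because the probabilities are positive, the implied cutpoints are
  automatically in increasing order, and larger prior counts for a category
  would widen the gap between its neighboring cutpoints on average.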
 
  Unlike polr, stan_polr also allows the "ordinal"
  outcome to contain only two levels, in which case the likelihood is the
  same by default as for stan_glm with family = binomial
  but the prior on the coefficients is different. However, stan_polr
  allows the user to specify the shape and rate hyperparameters,
  in which case the probability of success is defined as the logistic CDF of
  the linear predictor, raised to the power of alpha where alpha
  has a gamma prior with the specified shape and rate. This
  likelihood is called scobit by Nagler (1994) because if alpha
  is not equal to $1$, then the relationship between the linear predictor
  and the probability of success is skewed. If shape or rate is
  NULL, then alpha is assumed to be fixed to $1$. 
  
  Otherwise, it is usually advisable to set shape and rate to
  the same number so that the expected value of alpha is $1$ while
  leaving open the possibility that alpha may depart from $1$ a
  little bit. It is often necessary to have a lot of data in order to estimate
  alpha with much precision and always necessary to inspect the
  Pareto shape parameters calculated by loo to see if the 
  results are particularly sensitive to individual observations.
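
  The effect of setting shape and rate to the same number can be checked
  directly in base R. The value 4 below is purely illustrative; the sketch
  only summarizes the gamma prior itself and does not fit a model.

```r
shape <- 4                           # illustrative value
rate  <- 4                           # same number as shape

prior_mean <- shape / rate           # expected value of alpha
prior_sd   <- sqrt(shape) / rate     # prior standard deviation of alpha

# Central 90% prior interval for alpha
interval <- qgamma(c(0.05, 0.95), shape = shape, rate = rate)
print(c(mean = prior_mean, sd = prior_sd))
print(interval)
```

  Larger common values of shape and rate keep the prior mean at 1 but
  concentrate the prior more tightly around it, which corresponds to a
  stronger belief that alpha departs from 1 only a little.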
  
  Users should think carefully about how the outcome is coded when using 
  a scobit-type model. When alpha is not $1$, the asymmetry
  implies that the probability of success is most sensitive to the predictors
  when the probability of success is less than $0.63$. Reversing the
  coding of the successes and failures allows the predictors to have the
  greatest impact when the probability of failure is less than $0.63$.
  Also, the gamma prior on alpha is positively skewed, but you
  can reverse the coding of the successes and failures to circumvent this
  property.
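
  The 0.63 threshold can be checked numerically with a short base R sketch.
  It assumes the scobit form given by Nagler (1994), in which the probability
  of failure is the logistic CDF of the negated linear predictor raised to the
  power alpha, and it is not taken from the stan_polr source. The function
  names and the grid of alpha values are hypothetical.

```r
# Success probability under the assumed Nagler (1994) scobit form
scobit_prob <- function(eta, alpha) 1 - plogis(-eta)^alpha

# Derivative of the success probability with respect to the linear predictor
scobit_slope <- function(eta, alpha) {
  g <- plogis(-eta)
  alpha * g^alpha * (1 - g)
}

# Success probability at the point where the slope is largest (grid search)
eta_grid <- seq(-20, 20, by = 0.001)
steepest_prob <- function(alpha) {
  slopes <- scobit_slope(eta_grid, alpha)
  scobit_prob(eta_grid[which.max(slopes)], alpha)
}

alphas <- c(0.25, 0.5, 1, 2, 10, 100)
peak <- sapply(alphas, steepest_prob)
print(round(peak, 3))
print(1 - exp(-1))                   # limiting value as alpha grows
```

  Under this assumed form the steepest point sits at a success probability of
  0.5 when alpha is 1, moves lower as alpha shrinks, and approaches but never
  reaches 1 - exp(-1), roughly 0.63, as alpha grows. Swapping the success and
  failure labels mirrors this, so that the steepest point falls where the
  probability of failure is below 0.63.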

References

Nagler, J. (1994). Scobit: An Alternative Estimator to Logit and Probit.
American Journal of Political Science, 230--255.

See Also

stanreg-methods and 
polr.
The vignette for stan_polr.

Examples

library(rstanarm)
if (!grepl("^sparc",  R.version$platform))
stan_polr(tobgp ~ agegp, data = esoph, method = "probit",
          prior = R2(0.2, "mean"), init_r = 0.1, seed = 12345,
          algorithm = "fullrank") # for speed only