qreg.spike: Quantile Regression

Description

MCMC algorithm for quasi-Bayesian quantile models with a 'spike-and-slab' prior that places some amount of posterior probability at zero for a subset of the regression coefficients.

Usage

qreg.spike(formula,
               quantile,
               niter,
               ping = niter / 10,
               nthreads = 0,
               data,
               subset,
               prior = NULL,
               na.action = options("na.action"),
               contrasts = NULL,
               drop.unused.levels = TRUE,
               initial.value = NULL,
               seed = NULL,
               ...)

Value

Returns an object of class qreg.spike, which inherits from

lm.spike. The returned object is a list with the following elements

beta: A niter by ncol(x) matrix of regression coefficients, many of which may be zero. Each row corresponds to an MCMC iteration.
prior: The prior used to fit the model. If a prior was supplied as an argument it will be returned. Otherwise this will be the automatically generated prior based on the other function arguments.

Arguments

formula: Formula for the maximal model (with all variables included).
quantile: A scalar value between 0 and 1 indicating the quantile of the conditional distribution being modeled.
niter: The number of MCMC iterations to run. Be sure to include enough so you can throw away a burn-in set.
ping: If positive, then print a status update to the console every ping MCMC iterations.
nthreads: The number of CPU-threads to use for data augmentation. There is some small overhead to stopping and starting threads. For small data sets, thread overhead will make it faster to run single threaded. For larger data sets multi-threading can speed things up substantially. This is all machine dependent, so please experiment.
data: An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which qreg.spike is called.
subset: An optional vector specifying a subset of observations to be used in the fitting process.
prior: An optional list such as that returned from SpikeSlabPrior. If missing, SpikeSlabPrior will be called using the extra arguments passed via ....
na.action: A function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The factory-fresh default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.
contrasts: An optional list. See the contrasts.arg of model.matrix.default.
drop.unused.levels: A logical value indicating whether factor levels that are unobserved should be dropped from the model.
initial.value: Initial value for the MCMC algorithm. Can either be a numeric vector, a glm object (from which the coefficients will be used), or a qreg.spike object. If a qreg.spike object is supplied, it is assumed to be from a previous MCMC run for which niter additional draws are desired. If a glm object is supplied then its coefficients will be used as the initial values for the simulation.
seed: Seed to use for the C++ random number generator. It should be NULL or an int. If NULL the seed value will be taken from the global .Random.seed object.
...: Extra arguments to be passed to SpikeSlabPrior.

Author

Steven L. Scott

Details

Just like ordinary regression models the mean of a distribution as a linear function of X, quantile regression models a specific quantile (e.g. the 90th percentile) as a function of X.

Median regression is a special case of quantile regression. Median regression is sometimes cast in terms of minimizing |y - X * beta|, because the median is the optimal action under L1 loss. Similarly, selecting quantile tau is optimal under the asymmetric loss function

$$\rho_\tau(u) = \tau u I(u > 0) + (1-\tau) u I(u < 0) $$

Thus quantile regression (for a specific quantile tau) minimizes $$ Q(\beta) = \sum_i \rho_\tau( y_i - \beta^Tx_i) $$

Bayesian quantile regression treats $$\exp(-2Q(\beta))$$ as a likelihood function to which a prior distribution $p(\beta)$ is applied. For posterior sampling, a data augmentation scheme is used where each observation is associated with a latent variable $\lambda_i$, which has a marginal distribution of $$Exp(2 \tau(1-\tau)).$$

The conditional distribution given the residual $r = y - x \beta$ is $$ \frac{1}{\lambda} | r \sim InvGaus( 1 / |r|, 1.0)$$

The conditional distribution of beta given complete data (lambda and y) is a weighted least squares regression, where observation i has precision $\lambda_i$ and where observation i is offset by $2(\tau - 1)\lambda_i$.

References

Parzen and Polson (2011, unpublished)

Examples

Run this code

  n <- 50
  x <- rnorm(n)
  y <- rnorm(n, 4 * x)
  model <- qreg.spike(y ~ x,
                      quantile = .8,
                      niter = 1000,
                      expected.model.size = 100)

  ## Should get a slope near 4 and an intercept near qnorm(.8).
  PlotManyTs(model$beta[-(1:100),],
             same.scale = TRUE,
             truth = c(qnorm(.8), 4))

Run the code above in your browser using DataLab