MCMC algorithm for quasi-Bayesian quantile models with a 'spike-and-slab' prior that places some amount of posterior probability at zero for a subset of the regression coefficients.
qreg.spike(formula,
quantile,
niter,
ping = niter / 10,
nthreads = 0,
data,
subset,
prior = NULL,
na.action = options("na.action"),
contrasts = NULL,
drop.unused.levels = TRUE,
initial.value = NULL,
seed = NULL,
...)
Returns an object of class qreg.spike
, which inherits from
lm.spike
. The returned object is a list with the following
elements
A niter
by ncol(x)
matrix of regression
coefficients, many of which may be zero. Each row corresponds to an
MCMC iteration.
The prior used to fit the model. If a prior
was
supplied as an argument it will be returned. Otherwise this will be
the automatically generated prior based on the other function
arguments.
Formula for the maximal model (with all variables included).
A scalar value between 0 and 1 indicating the quantile of the conditional distribution being modeled.
The number of MCMC iterations to run. Be sure to include enough so you can throw away a burn-in set.
If positive, then print a status update to the console
every ping
MCMC iterations.
The number of CPU-threads to use for data augmentation. There is some small overhead to stopping and starting threads. For small data sets, thread overhead will make it faster to run single threaded. For larger data sets multi-threading can speed things up substantially. This is all machine dependent, so please experiment.
An optional data frame, list or environment (or object
coercible by as.data.frame
to a data frame) containing the
variables in the model. If not found in data
, the variables
are taken from environment(formula)
, typically the
environment from which qreg.spike
is called.
An optional vector specifying a subset of observations to be used in the fitting process.
An optional list such as that returned from
SpikeSlabPrior
. If missing, SpikeSlabPrior
will be called using the extra arguments passed via ....
A function which indicates what should happen when
the data contain NA
s. The default is set by the
na.action
setting of options
, and is na.fail
if
that is unset. The factory-fresh
default is na.omit
.
Another possible value is NULL
, no action. Value
na.exclude
can be useful.
An optional list. See the contrasts.arg
of
model.matrix.default
.
A logical value indicating whether factor levels that are unobserved should be dropped from the model.
Initial value for the MCMC algorithm. Can either
be a numeric vector, a glm
object (from which the
coefficients will be used), or a qreg.spike
object.
If a qreg.spike
object is supplied, it is assumed to
be from a previous MCMC run for which niter
additional draws
are desired. If a glm
object is supplied then its
coefficients will be used as the initial values for the simulation.
Seed to use for the C++ random number generator. It
should be NULL
or an int. If NULL
the seed value will
be taken from the global .Random.seed
object.
Extra arguments to be passed to SpikeSlabPrior
.
Steven L. Scott
Just like ordinary regression models the mean of a distribution as a linear function of X, quantile regression models a specific quantile (e.g. the 90th percentile) as a function of X.
Median regression is a special case of quantile regression. Median regression is sometimes cast in terms of minimizing |y - X * beta|, because the median is the optimal action under L1 loss. Similarly, selecting quantile tau is optimal under the asymmetric loss function
$$\rho_\tau(u) = \tau u I(u > 0) + (1-\tau) u I(u < 0) $$
Thus quantile regression (for a specific quantile tau) minimizes $$ Q(\beta) = \sum_i \rho_\tau( y_i - \beta^Tx_i) $$
Bayesian quantile regression treats $$\exp(-2Q(\beta))$$ as a likelihood function to which a prior distribution \(p(\beta)\) is applied. For posterior sampling, a data augmentation scheme is used where each observation is associated with a latent variable \(\lambda_i\), which has a marginal distribution of $$Exp(2 \tau(1-\tau)).$$
The conditional distribution given the residual \(r = y - x \beta\) is $$ \frac{1}{\lambda} | r \sim InvGaus( 1 / |r|, 1.0)$$
The conditional distribution of beta given complete data (lambda and y) is a weighted least squares regression, where observation i has precision \(\lambda_i\) and where observation i is offset by \(2(\tau - 1)\lambda_i\).
Parzen and Polson (2011, unpublished)
lm.spike
SpikeSlabPrior
,
plot.qreg.spike
,
predict.qreg.spike
.
n <- 50
x <- rnorm(n)
y <- rnorm(n, 4 * x)
model <- qreg.spike(y ~ x,
quantile = .8,
niter = 1000,
expected.model.size = 100)
## Should get a slope near 4 and an intercept near qnorm(.8).
PlotManyTs(model$beta[-(1:100),],
same.scale = TRUE,
truth = c(qnorm(.8), 4))
Run the code above in your browser using DataLab