bhs(X, y, T = 1000, thin = NULL, RJ = TRUE, M = NULL, beta = NULL,
lambda2 = 1, s2 = var(y-mean(y)), mprior = 0, ab = NULL,
theta = 0, rao.s2 = TRUE, icept = TRUE, normalize = TRUE, verb = 1)
bridge(X, y, T = 1000, thin = NULL, RJ = TRUE, M = NULL,
beta = NULL, lambda2 = 1, s2 = var(y-mean(y)), mprior = 0,
rd = NULL, ab = NULL, theta=0, rao.s2 = TRUE, icept = TRUE,
normalize = TRUE, verb = 1)
blasso(X, y, T = 1000, thin = NULL, RJ = TRUE, M = NULL,
beta = NULL, lambda2 = 1, s2 = var(y-mean(y)),
case = c("default", "ridge", "hs", "ng"), mprior = 0, rd = NULL,
ab = NULL, theta = 0, rao.s2 = TRUE, icept = TRUE,
normalize = TRUE, verb = 1)
Arguments

X: data.frame, matrix, or vector of inputs X

y: vector of output responses y of length equal to the leading dimension (rows) of X, i.e., length(y) == nrow(X)
T: total number of MCMC samples to be collected

thin: thinning level for the MCMC chain; if NULL (default), then thin is determined based on the regression model implied by RJ, lambda2, and ncol(X)
RJ: if TRUE then model selection on the columns of the design matrix (and thus the parameter beta in the model) is performed by Reversible Jump (RJ) MCMC. The initial model is specified by the beta input, described below, and the maximal number of covariates (columns of X) in the model is specified by M

M: the maximal number of allowed covariates (columns of X) in the model. If input lambda2 > 0 then any M <= ncol(X) is allowed; otherwise it must be that M <= min(ncol(X), length(y)-1), which is the default value when M = NULL is given
beta: initial setting of the regression coefficients. Any zero-components imply that the corresponding covariate (column of X) is not in the initial model. When input RJ = FALSE (no RJ) and lambda2 > 0, the vector provides starting values only and no columns are excluded from the model

lambda2: square of the initial lasso penalty parameter; when lambda2 = 0 is supplied, the unpenalized hierarchical linear model described in the Details below is used

s2: initial setting of the variance parameter; the default is var(y-mean(y))

case: specifies the type of penalty prior: the lasso ("default"), ridge regression ("ridge"), the horseshoe ("hs"), or the Normal-Gamma ("ng") prior
mprior: prior on the number of non-zero regression coefficients (and therefore covariates) m in the model. The default (mprior = 0) encodes the uniform prior on 0 <= m <= M. A scalar value 0 < mprior < 1 implies a Binomial prior on m with proportion p = mprior, while a 2-vector mprior = c(g, h) of positive values places a Beta prior on the Binomial proportion p (see the pi output below)
rd: =c(r, delta), the $\alpha$ (shape) parameter and $\beta$ (rate) parameter to the gamma distribution prior G(r,delta) for the $\lambda^2$ parameter under the lasso model; or, the $\alpha$ (shape) parameter and $\beta$ (rate) parameter of the analogous gamma prior on the penalty under the other case settings (ridge, horseshoe, Normal-Gamma)

ab: =c(a, b), the $\alpha$ (shape) parameter and the $\beta$ (scale) parameter for the inverse-gamma distribution prior IG(a,b) for the variance parameter s2. A default of NULL generates appropriate non-informative values

theta: the rate parameter (> 0) of the exponential prior on the degrees of freedom parameter nu under a model with Student-t errors implemented by a scale-mixture prior. The default setting of theta = 0 turns off this prior, so that Normal errors are assumed

rao.s2: indicates whether Rao-Blackwellized samples of the variance parameter s2 should be used (default TRUE); see below for more details

icept: if TRUE, an implicit intercept term is fit in the model, otherwise the intercept is zero; default is TRUE

normalize: if TRUE, each variable is standardized to have unit L2-norm, otherwise it is left alone; default is TRUE
verb: verbosity level; currently only verb = 0 and verb = 1 are supported
Value

blasso returns an object of class "blasso", which is a list containing a copy of all of the input arguments as well as the components listed below.
mu: T samples of the (un-penalized) intercept parameter

beta: T*ncol(X) matrix of T samples from the (penalized) regression coefficients

m: T samples of the number of non-zero entries of beta

s2: T samples of the variance parameter

lambda2: T samples of the penalty parameter

gamma: T samples of the gamma parameter when case = "ng"

tau2i: T*ncol(X) matrix of T samples from the (latent) inverse diagonal of the prior covariance matrix for beta, obtained for Lasso regressions

omega2: T*nrow(X) matrix of T samples from the (latent) diagonal of the covariance matrix of the response, providing a scale-mixture implementation of Student-t errors with degrees of freedom nu when active (input theta > 0)

nu: T samples of the degrees of freedom parameter to the Student-t errors model when active (input theta > 0)

pi: T samples of the Binomial proportion p that was given a Beta prior, as described above for the 2-vector version of the mprior input

llik: T samples of the log likelihood of the data under the fitted model

llik.norm: T samples of the log likelihood under a Normal errors model, recorded alongside llik when input theta > 0 so that the two error models can be compared (e.g., via the Bayes factor in the examples below)
Details

When input lambda2 = 0 is supplied, the model is a simple hierarchical linear model where $(\beta,\sigma^2)$ is given a Jeffrey's prior.

Specifying RJ = TRUE
causes Bayesian model selection and
averaging to commence for choosing which of the columns of the
design matrix X
(and thus parameters beta
) should be
included in the model. The zero-components of the beta
input
specify which columns are in the initial model, and
M
specifies the maximal number of columns.
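For illustration, a call along the following lines starts RJ from a small initial model (a sketch only: X and y stand for a generic design matrix and response as in the Arguments above, and the object names are hypothetical, not part of the package):

## sketch: start RJ from an initial model containing only the first two
## columns of X (zero-components of beta exclude a column), and allow at
## most 5 active columns
init.beta <- rep(0, ncol(X))
init.beta[1:2] <- 0.1
fit.rj <- blasso(X, y, T=1000, RJ=TRUE, M=5, beta=init.beta)
plot(fit.rj, burnin=100, which="m")   ## sizes of the models visited

## sketch: lambda2 = 0 instead gives the unpenalized hierarchical
## linear model with the Jeffrey's prior described above
fit.lm <- blasso(X, y, T=1000, RJ=FALSE, lambda2=0)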
The RJ mechanism implemented here for the Bayesian lasso model selection differs from the one described by Hans (2009), which is based on an idea from Geweke (1996). Those methods require departing from the Park & Casella (2008) latent-variable model and sampling from each conditional $\beta_i | \beta_{(-i)}, \dots$ for all $i$, since a mixture prior with a point-mass at zero is placed on each $\beta_i$. Our implementation requires no such special prior and retains the joint sampling from the full $\beta$ vector of non-zero entries, which we believe yields better mixing in the Markov chain. RJ proposals to increase/decrease the number of non-zero entries do proceed component-wise, but the acceptance rates are high due to marginalized between-model moves (Troughton & Godsill, 1997).
When the lasso prior or RJ is used, the automatic thinning level
(unless thin != NULL
) is determined by the number of columns
of X
since this many latent variables are introduced.
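For example (an illustrative sketch under the same assumptions as above), the automatic level can be overridden by supplying thin directly:

## sketch: set the thinning level explicitly rather than letting it
## default to (roughly) ncol(X) under the lasso prior
fit.thin  <- blasso(X, y, T=500, thin=ncol(X))
fit.quick <- blasso(X, y, T=500, thin=1)   ## minimal thinning: faster, but more correlated samples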
Bayesian ridge regression is implemented as a special case via the
bridge
function. This essentially calls blasso
with case = "ridge"
. A default setting of rd = c(0,0)
is
implied by rd = NULL
, giving the Jeffrey's prior for the
penalty parameter $\lambda^2$ unless ncol(X) >=
length(y)
in which case the proper specification of rd =
c(5,10)
is used instead.
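As a brief sketch (same X and y assumptions as above; object names are hypothetical), the ridge model can be requested either way, and rd can be supplied explicitly:

## sketch: Bayesian ridge regression via the wrapper ...
fit.ridge  <- bridge(X, y, T=1000)
## ... or, essentially equivalently, via blasso directly
fit.ridge2 <- blasso(X, y, T=1000, case="ridge")
## an explicit (proper) gamma prior on the penalty, overriding rd = NULL
fit.ridge3 <- bridge(X, y, T=1000, rd=c(5, 10))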
The Normal--Gamma prior (Griffin & Brown, 2009) is implemented as
an extension to the Bayesian lasso with case = "ng"
. Many
thanks to James Scott for providing the code needed to extend the
method(s) to use the horseshoe prior (Carvalho, Polson, Scott, 2010).
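For illustration (again a sketch, not one of the package's shipped examples; bhs presumably wraps blasso with case = "hs" in the same way that bridge wraps case = "ridge"):

## sketch: Normal-Gamma prior via the case argument
fit.ng <- blasso(X, y, T=1000, case="ng")
## sketch: horseshoe prior via the bhs wrapper
fit.hs <- bhs(X, y, T=1000)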
When theta > 0, the Student-t errors model via scale mixtures (and thereby extra latent variables omega2) of Geweke (1993) is applied as an extension to the Bayesian lasso/ridge model. If Student-t errors are used, the automatic thinning level is augmented (unless thin != NULL) by the number of rows in X, since this many latent variables are introduced.
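A minimal sketch (same assumptions as above; the examples below use theta = 0.1 on the diabetes data), with the thinning level capped to keep the run time manageable:

## sketch: Student-t errors via theta > 0; thin is set explicitly
## because the automatic level grows with nrow(X)
fit.t <- blasso(X, y, T=1000, theta=0.1, thin=ncol(X))
quantile(fit.t$nu, c(0.05, 0.95))   ## posterior spread of the degrees of freedom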
References

Griffin, J.E. and Brown, P.J. (2009). Inference with Normal-Gamma prior distributions in regression problems. Tech. rep., University of Kent.
Carvalho, C.M., Polson, N.G., and Scott, J.G. (2010) The
horseshoe estimator for sparse signals. Biometrika 97(2):
pp. 465-480.
Geweke, J. (1996). Variable selection and model comparison in regression. In Bayesian Statistics 5. Editors: J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, 609-620. Oxford Press.
Troughton, P.T. and Godsill, S.J. (1997). A reversible jump sampler for autoregressive time series, employing full conditionals to achieve efficient model space moves. Technical Report CUED/F-INFENG/TR.304, Cambridge University Engineering Department.
Geweke, J. (1993) Bayesian treatment of the independent
Student-t linear model. Journal of Applied Econometrics, Vol. 8,
S19-S40.
See Also

lm, lars in the lars package, regress, lm.ridge in the MASS package

Examples

## following the lars diabetes example
data(diabetes)
attach(diabetes)
## Ordinary Least Squares regression
reg.ols <- regress(x, y)
## Lasso regression
reg.las <- regress(x, y, method="lasso")
## Bayesian Lasso regression
reg.blas <- blasso(x, y)
## summarize the beta (regression coefficients) estimates
plot(reg.blas, burnin=200)
points(drop(reg.las$b), col=2, pch=20)
points(drop(reg.ols$b), col=3, pch=18)
legend("topleft", c("blasso-map", "lasso", "lsr"),
col=c(2,2,3), pch=c(21,20,18))
## plot the size of different models visited
plot(reg.blas, burnin=200, which="m")
## get the summary
s <- summary(reg.blas, burnin=200)
## calculate the probability that each beta coef != zero
s$bn0
## summarize s2
plot(reg.blas, burnin=200, which="s2")
s$s2
## summarize lambda2
plot(reg.blas, burnin=200, which="lambda2")
s$lambda2
## fit with Student-t errors
## (~400-times slower due to automatic thinning level)
regt.blas <- blasso(x, y, theta=0.1)
## plotting some information about nu, and quantiles
plot(regt.blas, "nu", burnin=200)
quantile(regt.blas$nu[-(1:200)], c(0.05, 0.95))
## Bayes Factor shows strong evidence for Student-t model
mean(exp(regt.blas$llik[-(1:200)] - regt.blas$llik.norm[-(1:200)]))
## clean up
detach(diabetes)