Inference for ordinary least squares, lasso/NG, horseshoe and ridge regression models by (Gibbs) sampling from the Bayesian posterior distribution, augmented with Reversible Jump for model selection
bhs(X, y, T=1000, thin=NULL, RJ=TRUE, M=NULL, beta=NULL,
         lambda2=1, s2=var(y-mean(y)), mprior=0, ab=NULL,
         theta=0, rao.s2=TRUE, icept=TRUE, normalize=TRUE, verb=1)
bridge(X, y, T = 1000, thin = NULL, RJ = TRUE, M = NULL,
       beta = NULL, lambda2 = 1, s2 = var(y-mean(y)), mprior = 0,
       rd = NULL, ab = NULL, theta=0, rao.s2 = TRUE, icept = TRUE,
       normalize = TRUE, verb = 1)
blasso(X, y, T = 1000, thin = NULL, RJ = TRUE, M = NULL,
       beta = NULL, lambda2 = 1, s2 = var(y-mean(y)),
       case = c("default", "ridge", "hs", "ng"), mprior = 0, rd = NULL,
       ab = NULL, theta = 0, rao.s2 = TRUE, icept = TRUE, 
       normalize = TRUE, verb = 1)blasso returns an object of class "blasso", which is a
list containing a copy of all of the input arguments as well as
  of the components listed below.
a copy of the function call as used
a vector of T samples of the (un-penalized)
    “intercept” parameter
a T*ncol(X) matrix of T samples from
    the (penalized) regression coefficients
the number of non-zero entries in each vector of T
    samples of beta
a vector of T samples of the variance parameter
a vector of T samples of the penalty
    parameter
a vector of T with the gamma parameter
    when case = "ng"
a T*ncol(X) matrix of T samples from
    the (latent) inverse diagonal of the prior covariance matrix for
    beta, obtained for Lasso regressions
a T*nrow(X) matrix of T samples
    from the (latent) diagonal of the covariance matrix of the response
    providing a scale-mixture implementation of Student-t errors with
    degrees of freedom nu when active (input theta > 0)
a vector of T samples of the degrees of freedom
    parameter to the Student-t errors mode when active
    (input theta > 0)
a vector of T samples of the Binomial proportion
    p that was given a Beta prior, as described above for the
    2-vector version of the mprior input
the log posterior probability of each (saved) sample of the joint parameters
the log likelihood of each (saved) sample of the parameters
the log likelihood of each (saved) sample of the
    parameters under the Normal errors model when sampling under the
    Student-t model; i.e., it is not present
    unless theta > 0
data.frame, matrix, or vector of inputs X
vector of output responses y of length equal to the
    leading dimension (rows) of X, i.e., length(y) == nrow(X)
total number of MCMC samples to be collected
number of MCMC samples to skip before a sample is
    collected (via thinning).  If NULL (default), then
    thin is determined based on the regression model implied
    by RJ, lambda2, and ncol(X); and also
    on the errors model implied by theta and nrow(X)
if TRUE then model selection on the columns of the
    design matrix (and thus the parameter beta in the model) is
    performed by Reversible Jump (RJ) MCMC.  The initial model is
    specified by the beta input, described below, and the maximal
    number of covariates in the model is specified by M
the maximal number of allowed covariates (columns of
    X) in the model.  If input lambda2 > 0 then any
    M <= ncol(X) is allowed.  Otherwise it must be that
    M <= min(ncol(X), length(y)-1), which is default value
    when a NULL argument is given
initial setting of the regression coefficients.  Any
    zero-components will imply that the corresponding covariate (column
    of X) is not in the initial model.  When input RJ =
      FALSE (no RJ) and lambda2 > 0 (use lasso) then no
    components are allowed to be exactly zero.  The default setting is
    therefore contextual; see below for details
square of the initial lasso penalty parameter. If zero, then least squares regressions are used
initial variance parameter
specifies if ridge regression, the
    Normal-Gamma, or the horseshoe prior should be done instead
    of the lasso; only meaningful when lambda2 > 0
prior on the number of non-zero regression coefficients
    (and therefore covariates) m in the model. The default
    (mprior = 0) encodes the uniform prior on 0 <= m <= M.
    A scalar value 0 < mprior < 1 implies a Binomial prior
    Bin(m|n=M,p=mprior). A 2-vector mprior=c(g,h)
    of positive values g and h represents
    gives Bin(m|n=M,p) prior where p~Beta(g,h)
=c(r, delta), the alpha (shape) parameter and
    \(\beta\) (rate) parameter to the gamma distribution prior
    G(r,delta) for the \(\lambda^2\) parameter under
    the lasso model; or, the \(\alpha\) (shape) parameter and
    \(\beta\) (scale) parameter to the
    inverse-gamma distribution IG(r/2, delta/2) prior for
    the \(\lambda^2\)
    parameter under the ridge regression model. A default of NULL
   generates appropriate non-informative values depending on the
   nature of the regression.  Specifying rd=FALSE causes
  lambda2 values to be fixed at their starting value, i.e., not
  sampled.  See the details below for information
   on the special settings for ridge regression
=c(a, b), the \(\alpha\) (shape)
   parameter and the \(\beta\) (scale) parameter for the
   inverse-gamma distribution prior IG(a,b) for the variance
   parameter s2.  A default of NULL generates appropriate
   non-informative values depending on the nature of the regression
the rate parameter (> 0) to the exponential prior
   on the degrees of freedom paramter nu under a model with
   Student-t errors implemented by a scale-mixture prior.
   The default setting of theta = 0 turns off this prior,
   defaulting to a normal errors prior
indicates whether Rao-Blackwellized samples for
    \(\sigma^2\) should be used (default TRUE); see
    below for more details
if TRUE, an implicit intercept term is fit
    in the model, otherwise the the intercept is zero; default is
    TRUE
if TRUE, each variable is standardized
    to have unit L2-norm, otherwise it is left alone; default is
    TRUE
verbosity level; currently only verb = 0 and
    verb = 1 are supported
Robert B. Gramacy rbg@vt.edu
The Bayesian lasso model and Gibbs Sampling algorithm is described
  in detail in Park & Casella (2008).  The algorithm implemented
  by this function is identical to that described therein, with
  the exception of an added “option” to use a Rao-Blackwellized
  sample of \(\sigma^2\) (with \(\beta\) integrated out)
  for improved mixing, and the model selections by RJ described below.
  When input argument lambda2 = 0 is
  supplied, the model is a simple hierarchical linear model where
  \((\beta,\sigma^2)\) is given a Jeffrey's prior
Specifying RJ = TRUE causes Bayesian model selection and
  averaging to commence for choosing which of the columns of the
  design matrix X (and thus parameters beta) should be
  included in the model.  The zero-components of the beta input
  specify which columns are in the initial model, and
  M specifies the maximal number of columns.
The RJ mechanism implemented here for the Bayesian lasso model selection differs from the one described by Hans (2009), which is based on an idea from Geweke (1996). Those methods require departing from the Park & Casella (2008) latent-variable model and requires sampling from each conditional \(\beta_i | \beta_{(-i)}, \dots\) for all \(i\), since a mixture prior with a point-mass at zero is placed on each \(\beta_i\). Out implementation here requires no such special prior and retains the joint sampling from the full \(\beta\) vector of non-zero entries, which we believe yields better mixing in the Markov chain. RJ proposals to increase/decrease the number of non-zero entries does proceed component-wise, but the acceptance rates are high due due to marginalized between-model moves (Troughton & Godsill, 1997).
When the lasso prior or RJ is used, the automatic thinning level
  (unless thin != NULL) is determined by the number of columns
  of X since this many latent variables are introduced
Bayesian ridge regression is implemented as a special case via the
  bridge function.  This essentially calls blasso
  with case = "ridge". A default setting of rd = c(0,0) is
  implied by rd = NULL, giving the Jeffery's prior for the
  penalty parameter \(\lambda^2\) unless ncol(X) >=
    length(y) in which case the proper specification of rd =
    c(5,10) is used instead.
The Normal--Gamma prior (Griffin & Brown, 2009) is implemented as
  an extension to the Bayesian lasso with case = "ng".  Many
  thanks to James Scott for providing the code needed to extend the
  method(s) to use the horseshoe prior (Carvalho, Polson, Scott, 2010).
When theta > 0 then the Student-t errors via scale mixtures
  (and thereby extra latent variables omega2) of Geweke (1993)
  is applied as an extension to the Bayesian lasso/ridge model.
  If Student-t errors are used the automatic thinning level
  is augmented (unless thin != NULL) by the number of rows
  in X since this many latent variables are introduced
Park, T., Casella, G. (2008). The Bayesian Lasso.
  Journal of the American Statistical Association, 103(482),
  June 2008, pp. 681-686
  tools:::Rd_expr_doi("10.1198/016214508000000337")
Griffin, J.E. and Brown, P.J. (2009).
  Inference with Normal-Gamma prior distributions in
    regression problems. Bayesian Analysis, 5, pp. 171-188.
  tools:::Rd_expr_doi("10.1214/10-BA507")
Hans, C. (2009). Bayesian Lasso regression.
  Biometrika 96, pp. 835-845.
  tools:::Rd_expr_doi("10.1093/biomet/asp047")
Carvalho, C.M., Polson, N.G., and Scott, J.G. (2010) The
 horseshoe estimator for sparse signals. Biometrika 97(2):
 pp. 465-480.
  https://faculty.chicagobooth.edu/nicholas.polson/research/papers/Horse.pdf
Geweke, J. (1996). Variable selection and model comparison in regression. In Bayesian Statistics 5. Editors: J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith, 609-620. Oxford Press.
Paul T. Troughton and Simon J. Godsill (1997). A reversible jump sampler for autoregressive time series, employing full conditionals to achieve efficient model space moves. Technical Report CUED/F-INFENG/TR.304, Cambridge University Engineering Department.
Geweke, J. (1993) Bayesian treatment of the independent Student-t linear model. Journal of Applied Econometrics, Vol. 8, S19-S40
## following the lars diabetes example
data(diabetes)
attach(diabetes)
## Ordinary Least Squares regression
reg.ols <- regress(x, y)
## Lasso regression
reg.las <- regress(x, y, method="lasso")
## Bayesian Lasso regression
reg.blas <- blasso(x, y)
## summarize the beta (regression coefficients) estimates
plot(reg.blas, burnin=200)
points(drop(reg.las$b), col=2, pch=20)
points(drop(reg.ols$b), col=3, pch=18)
legend("topleft", c("blasso-map", "lasso", "lsr"),
       col=c(2,2,3), pch=c(21,20,18))
## plot the size of different models visited
plot(reg.blas, burnin=200, which="m")
## get the summary
s <- summary(reg.blas, burnin=200)
## calculate the probability that each beta coef != zero
s$bn0
## summarize s2
plot(reg.blas, burnin=200, which="s2")
s$s2
## summarize lambda2
plot(reg.blas, burnin=200, which="lambda2")
s$lambda2
if (FALSE) {
## fit with Student-t errors
## (~400-times slower due to automatic thinning level)
regt.blas <- blasso(x, y, theta=0.1)
## plotting some information about nu, and quantiles
plot(regt.blas, "nu", burnin=200)
quantile(regt.blas$nu[-(1:200)], c(0.05, 0.95))
## Bayes Factor shows strong evidence for Student-t model
mean(exp(regt.blas$llik[-(1:200)] - regt.blas$llik.norm[-(1:200)]))
}
## clean up
detach(diabetes)
Run the code above in your browser using DataLab