Description

Approximate computation of summaries of the posterior distribution using a Gibbs sampling algorithm to explore the model space, with the frequency of "visits" used to construct the estimates.

Usage
GibbsBvs(
formula,
data,
null.model = paste(as.formula(formula)[[2]], " ~ 1", sep = ""),
prior.betas = "Robust",
prior.models = "ScottBerger",
n.iter = 10000,
init.model = "Full",
n.burnin = 500,
n.thin = 1,
time.test = TRUE,
priorprobs = NULL,
seed = runif(1, 0, 16091956)
)
Value

GibbsBvs returns an object of class Bvs with the following elements:

- The internal time consumed in solving the problem.
- The lm class object that results when the model defined by formula is fitted by lm.
- The lm class object that results when the model defined by fixed.cov is fitted by lm.
- The names of all the potential explanatory variables.
- The number of observations.
- The number of explanatory variables to select from.
- The number of fixed variables.
- The binary expression of the most probable model found.
- A named vector with the estimates of the inclusion probabilities of all the variables.
- A data.frame with the estimates of the joint inclusion probabilities of all the variables.
- Estimates of the posterior probabilities of the dimension of the true model.
- A matrix with both the binary representation of the models visited after the burn-in period and the Bayes factor (on the log scale) of each such model to the null model (illustrated in the sketch after this list).
- If prior.models = "User", the vector of user-defined prior probabilities is stored here; otherwise, the type of prior as defined in prior.models.
- The call to the function.
- An estimate of the normalizing constant (C = sum B_i Pr(M_i), for M_i in the model space) using the method in George and McCulloch (1997).
- The method used to explore the model space ("gibbs") and the prior.betas, prior.models and priorprobs arguments, as used in the call.
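To make the frequency-of-visits construction concrete, here is a minimal, self-contained sketch (not package internals): visited is a made-up stand-in for the binary part of the matrix of visited models described above, with one row per kept Gibbs draw and one column per candidate covariate.

# Made-up visits: three covariates with inclusion frequencies near 0.9, 0.5, 0.1
set.seed(1)
visited <- matrix(rbinom(5000 * 3, 1, prob = c(0.9, 0.5, 0.1)),
                  ncol = 3, byrow = TRUE)
colMeans(visited)                     # estimated inclusion probabilities
prop.table(table(rowSums(visited)))   # estimated posterior of the model dimension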
Arguments

formula: Formula defining the most complex regression model in the analysis. See details.

data: Data frame containing the data.

null.model: A formula defining the simplest (null) model. It should be nested in the full model. By default, the null model is the one with just the intercept.

prior.betas: Prior distribution for the regression parameters within each model (to be specified literally). Possible choices include "Robust", "Robust.G", "Liangetal", "gZellner", "ZellnerSiow", "FLS", "intrinsic.MGC" and "IHG" (see details).

prior.models: Prior distribution over the model space (to be specified literally). Possible choices are "Constant", "ScottBerger" and "User" (see details).

n.iter: The total number of iterations performed after the burn-in process.

init.model: The model at which the simulation process starts. Options include "Null" (the model with only the covariates specified in fixed.cov), "Full" (the model defined by formula), "Random" (a randomly selected model) and a vector of p (the number of covariates to select from) zeros and ones defining a model; see the sketch after this list. When p > n, the dimension of init.model must be smaller than n; otherwise the function produces an error.

n.burnin: Length of the burn-in, i.e. the number of iterations to discard at the beginning.

n.thin: Thinning rate; must be a positive integer. Set n.thin > 1 to save memory and computation time if n.iter is large. Default is 1. This parameter, jointly with n.iter, sets the number of simulations kept and used to construct the estimates, so keep in mind that a large value of n.thin can reduce the precision of the results.

time.test: If TRUE and the number of variables is large (>= 21), a preliminary test to estimate the computational time is performed.

priorprobs: A p+1 dimensional vector defining the prior probabilities Pr(M_i) (to be used when prior.models = "User"; see the details in Bvs).

seed: A seed to initialize the random number generator.
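As a sketch of how init.model and n.thin can be used together, under the assumption of a simulated data frame mydata with covariates x1, ..., x5 (these names are placeholders, not data shipped with the package):

set.seed(2)
mydata <- data.frame(y = rnorm(40), x1 = rnorm(40), x2 = rnorm(40),
                     x3 = rnorm(40), x4 = rnorm(40), x5 = rnorm(40))
start <- c(1, 0, 1, 0, 0)  # hypothetical starting model: only x1 and x3 included
fit <- GibbsBvs(formula = y ~ x1 + x2 + x3 + x4 + x5, data = mydata,
                init.model = start, n.iter = 20000, n.burnin = 2000,
                n.thin = 10)  # roughly n.iter / n.thin = 2000 kept draws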
Author(s)

Gonzalo Garcia-Donato and Anabel Forte
Details

This is a heuristic approximation to the function Bvs, so the details there also apply here.

The algorithm implemented is a Gibbs sampling-based search algorithm originally proposed by George and McCulloch (1997). Garcia-Donato and Martinez-Beneito (2013) have shown that this simple sampling strategy, combined with estimates based on the frequency of visits (the one implemented here), provides very reliable results.
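To illustrate the kind of scan involved, the following is a conceptual sketch of a Gibbs sampler over binary inclusion indicators in the spirit of George and McCulloch (1997); the target post is a toy unnormalized posterior, not the package's internal code or its actual posterior.

# Toy unnormalized posterior over inclusion vectors, peaked at (1, 1, 0)
post <- function(gamma) exp(-4 * sum((gamma - c(1, 1, 0))^2))
p <- 3
gamma <- rep(1, p)                       # start from the full model
draws <- matrix(NA, nrow = 1000, ncol = p)
for (s in seq_len(nrow(draws))) {
  for (j in seq_len(p)) {
    g1 <- replace(gamma, j, 1)
    g0 <- replace(gamma, j, 0)
    # full conditional probability that variable j is included
    pr1 <- post(g1) / (post(g1) + post(g0))
    gamma[j] <- rbinom(1, 1, pr1)
  }
  draws[s, ] <- gamma
}
colMeans(draws)  # frequency-of-visits estimates of inclusion probabilities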
References

Garcia-Donato, G. and Martinez-Beneito, M.A. (2013). On sampling strategies in Bayesian variable selection problems with large model spaces. Journal of the American Statistical Association, 108: 340-352. DOI:10.1080/01621459.2012.742443

George, E. and McCulloch, R. (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7, 339-372.
See Also

plot.Bvs for several plots of the result, BMAcoeff for obtaining model-averaged simulations of the regression coefficients and predict.Bvs for predictions.

See GibbsBvsF if there are factors among the explanatory variables.

See pltltn for corrections to the estimates in the situation where p >> n, and the help in pltltn for an application in this situation.

Consider Bvs for the exact version obtained by enumerating all entertained models (recommended when p < 20); a small cross-check sketch follows.
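As a hedged cross-check sketch on simulated data with few covariates (the element name inclprob is assumed here for the vector of estimated inclusion probabilities listed under Value):

set.seed(1234)
smalldata <- data.frame(y = rnorm(60), x1 = rnorm(60),
                        x2 = rnorm(60), x3 = rnorm(60))
exact  <- Bvs(formula = y ~ ., data = smalldata)        # exact enumeration
approx <- GibbsBvs(formula = y ~ ., data = smalldata)   # Gibbs approximation
cbind(exact = exact$inclprob, gibbs = approx$inclprob)  # should be close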
Examples

if (FALSE) {
# Analysis of the Ozone35 data
data(Ozone35)

# We use here the (Zellner) g-prior for the regression parameters
# and a constant prior over the model space.
# In this Gibbs sampling scheme we perform 10000 iterations after a
# burn-in of 100 draws, all of which are kept (n.thin defaults to 1).
# As the initial model we use the full model.
Oz35.GibbsBvs <- GibbsBvs(formula = y ~ ., data = Ozone35,
                          prior.betas = "gZellner", prior.models = "Constant",
                          n.iter = 10000, init.model = "Full", n.burnin = 100,
                          time.test = FALSE)
# Note: this is a heuristic approach and the results are estimates
# of the exact answer.

# With print we can see the most probable model among those visited:
Oz35.GibbsBvs

# The estimates of the inclusion probabilities and
# the model-averaged estimation of the parameters:
summary(Oz35.GibbsBvs)

# Plots:
plot(Oz35.GibbsBvs, option = "conditional")
}
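A possible follow-up, shown as a commented sketch, uses the functions listed in See Also on the object fitted above; the newdata argument name for predict.Bvs is an assumption here.

# Model-averaged simulations of the regression coefficients:
# bma <- BMAcoeff(Oz35.GibbsBvs)
# Model-averaged predictions at, say, the first five observations
# ('newdata' is assumed to be the argument name of predict.Bvs):
# pred <- predict(Oz35.GibbsBvs, newdata = Ozone35[1:5, ])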