BAS (version 1.2.1)

bas.glm: Bayesian Adaptive Sampling Without Replacement for Variable Selection in Generalized Linear Models

Description

Sample with or without replacement from a posterior distribution on GLMs

Usage

bas.glm(formula, data, 
 family = binomial(link = "logit"), 
 n.models = NULL, betaprior=CCH(alpha=.5, beta=nrow(data), s=0),
 modelprior = beta.binomial(1,1), 
 initprobs = "Uniform", method = "MCMC", update = NULL,
 bestmodel = NULL, prob.rw = 0.5, 
 MCMC.iterations = NULL, control = glm.control(), 
 offset = rep(0, nobs), weights = rep(1, nobs), laplace=FALSE)

Arguments

formula
generalized linear model formula for the full model with all predictors, Y ~ X. All code assumes that an intercept will be included in each model.
data
data frame
family
a description of the error distribution and link function for the exponential family; currently only binomial() with the logistic link is available in this version.
n.models
number of unique models to keep. If NULL, BAS will attempt to enumerate unless p > 35 or method="MCMC". For any of the methods using MCMC algorithms that sample with replacement, sampling will stop when the number of iterations exceeds the min of n.models and MCMC.iterations.
betaprior
Prior distribution on the model coefficients (except the intercept). Options include CCH, robust, beta-prime, AIC, and BIC.
modelprior
Family of prior distribution on the models. Choices include uniform, Bernoulli or beta.binomial.
initprobs
vector of length p with the initial inclusion probabilities used for sampling without replacement (the intercept will be included with probability one and does not need to be added here), or a character string giving the method used to construct the sampling probabilities.
method
A character variable indicating which sampling method to use: method="BAS" uses Bayesian Adaptive Sampling (without replacement) using the sampling probabilities given in initprobs and updates using the marginal inclusion probabilities to direct the search; method="MCMC" samples models with replacement using a Markov chain Monte Carlo algorithm (see prob.rw and MCMC.iterations).
update
number of iterations between potential updates of the sampling probabilities in the "BAS" method. If NULL do not update, otherwise the algorithm will update using the marginal inclusion probabilities as they change while sampling takes place.
bestmodel
optional binary vector representing a model to initialize the sampling. If NULL, sampling starts with the null model.
prob.rw
For any of the MCMC methods, probability of using the random-walk proposal; otherwise use a random "flip" move to propose a new model.
MCMC.iterations
Number of models to sample when using any of the MCMC options; should be greater than 'n.models'.
control
a list of parameters that control convergence in the fitting process. See the documentation for glm.control()
offset
a priori known component to be included in the linear predictor
weights
optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector.
laplace
logical variable for whether to use a Laplace approximation for integration with respect to g to obtain the marginal likelihood. If FALSE the Cephes library is used, which may be inaccurate for large n or large values of the Wald Chi-squared statistic.

Value

  • bas.glm returns an object of class BMA

    An object of class BMA is a list containing at least the following components:

  • postprobs: the posterior probabilities of the models selected
  • priorprobs: the prior probabilities of the models selected
  • logmarg: values of the log of the marginal likelihood for the models
  • n.vars: total number of independent variables in the full model, including the intercept
  • size: the number of independent variables in each of the models; includes the intercept
  • which: a list of lists, with one list per model, of the variables included in that model
  • probne0: the posterior probability that each variable is non-zero
  • coefficients: list of lists, with one list per model, giving the GLM estimate of each (nonzero) coefficient for each model
  • se: list of lists, with one list per model, giving the GLM standard error of each coefficient for each model
  • deviance: the GLM deviance for each model
  • modelprior: the prior distribution on models that created the BMA object
  • Q: the Q statistic for each model used in the marginal likelihood approximation
  • Y: response
  • X: matrix of predictors
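A minimal sketch of inspecting these components on a fitted object (assumes the BAS and MASS packages are installed; the data set and prior settings echo the Examples section below and are illustrative, not recommendations):

```r
library(BAS)
library(MASS)
data(Pima.tr)

# Fit a small model space; with 7 predictors, method = "BAS" enumerates
fit <- bas.glm(type ~ ., data = Pima.tr, method = "BAS",
               betaprior = CCH(alpha = 1, beta = nrow(Pima.tr), s = 0),
               family = binomial(),
               modelprior = beta.binomial(1, 1))

fit$probne0    # marginal posterior inclusion probability of each variable
fit$postprobs  # posterior probabilities of the sampled models
fit$n.vars     # number of variables in the full model, intercept included
```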

Details

BAS provides several search algorithms to find high probability models for use in Bayesian Model Averaging or Bayesian model selection. For p less than 20-25, BAS can enumerate all models depending on memory availability; for larger p, BAS samples without replacement using random or deterministic sampling. The Bayesian Adaptive Sampling algorithm of Clyde, Ghosh, and Littman (2010) samples models without replacement using the initial sampling probabilities, and will optionally update the sampling probabilities every "update" models using the estimated marginal inclusion probabilities. BAS uses different methods to obtain the initprobs, which may impact the results in high-dimensional problems. The deterministic sampler provides a list of the top models in order of an approximation of independence using the provided initprobs. This may be effective after running the other algorithms to identify high probability models, and works well if the correlations of variables are small to modest. The priors on coefficients are mixtures of g-priors that provide approximations to the power prior.
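The two sampling strategies described above can be sketched as follows (a hedged illustration, assuming BAS and MASS are installed; the update interval, iteration count, and prior hyperparameters are arbitrary choices for demonstration, not recommendations):

```r
library(BAS)
library(MASS)
data(Pima.tr)

# Adaptive sampling without replacement, updating the sampling
# probabilities every 50 models from the estimated marginal
# inclusion probabilities
fit_bas <- bas.glm(type ~ ., data = Pima.tr, method = "BAS",
                   update = 50,
                   betaprior = CCH(alpha = 1, beta = nrow(Pima.tr), s = 0),
                   family = binomial(),
                   modelprior = beta.binomial(1, 1))

# MCMC sampling with replacement; stops once the iteration count
# exceeds the smaller of n.models and MCMC.iterations
fit_mcmc <- bas.glm(type ~ ., data = Pima.tr, method = "MCMC",
                    MCMC.iterations = 5000,
                    betaprior = CCH(alpha = 1, beta = nrow(Pima.tr), s = 0),
                    family = binomial(),
                    modelprior = beta.binomial(1, 1))
```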

References

Li, Y. and Clyde, M. (2015) Mixtures of g-priors in Generalized Linear Models. http://arxiv.org/abs/1503.06913

Clyde, M., Ghosh, J. and Littman, M. (2010) Bayesian Adaptive Sampling for Variable Selection and Model Averaging. Journal of Computational and Graphical Statistics. 20:80-101 http://dx.doi.org/10.1198/jcgs.2010.09049

Raftery, A.E, Madigan, D. and Hoeting, J.A. (1997) Bayesian Model Averaging for Linear Regression Models. Journal of the American Statistical Association.

Examples

library(BAS)
library(MASS)
data(Pima.tr)

out <- bas.glm(type ~ ., data = Pima.tr, n.models = 2^7, method = "BAS",
               betaprior = CCH(alpha = 1, beta = 532/2, s = 0),
               family = binomial(),
               modelprior = beta.binomial(1, 1), laplace = FALSE)

summary(out)
image(out)
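For comparison, the same model space can be fit under a different coefficient prior. The sketch below uses robust(), one of the betaprior options listed in the Arguments section; treating it as a drop-in replacement for CCH() here is an assumption:

```r
library(BAS)
library(MASS)
data(Pima.tr)

# Same data and model prior, but with the robust mixture-of-g prior
# on the coefficients instead of CCH
out_robust <- bas.glm(type ~ ., data = Pima.tr, n.models = 2^7,
                      method = "BAS", betaprior = robust(),
                      family = binomial(),
                      modelprior = beta.binomial(1, 1))
summary(out_robust)
```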
