Given data and prior information, this function samples models via MC3 or enumerates all possible covariate combinations, and returns aggregate results.
bms(
X.data,
burn = 1000,
iter = NA,
nmodel = 500,
mcmc = "bd",
g = "UIP",
mprior = "random",
mprior.size = NA,
user.int = TRUE,
start.value = NA,
g.stats = TRUE,
logfile = FALSE,
logstep = 10000,
force.full.ols = FALSE,
fixed.reg = numeric(0)
)
A list of class bma that may be displayed using, e.g., summary.bma or coef.bma. The list contains the following elements:
info: a list of aggregate statistics: iter is the number of iterations, burn the number of burn-ins. The following have to be divided by cumsumweights to obtain posterior expected values: inccount are the posterior inclusion probabilities, b1mo and b2mo the first and second moments of the coefficients, add.otherstats other statistics of interest (typically the moments of the shrinkage factor), msize is the posterior expected model size, k.vec the posterior model size distribution, pos.sign the unconditional posterior probability of positive coefficients, and corr.pmp the correlation between the best models' MCMC frequencies and their marginal likelihoods. timed is the time needed for MCMC sampling, cons the posterior expected value of the constant. K and N are the maximum number of covariates and the sample size, respectively.
arg.list: a list of the evaluated function arguments provided to bms (see above)
topmod: a 'topmod' object containing the best drawn models; see topmod for more details
start.pos: the positions of the starting model. If bmao is a 'bma' object, this corresponds to the covariates bmao$reg.names[bmao$start.pos]. If bmao is a chain that resulted from several starting models (cf. c.bma), then start.pos is a list detailing all of them.
gprior.info: a list of class gprior-class, detailing information on the g-prior: gtype corresponds to argument g above; is.constant is FALSE if gtype is either "hyper" or "EBL"; return.g.stats corresponds to argument g.stats above; shrinkage.moments contains the first and second moments of the shrinkage factor (only if return.g.stats==TRUE); g details the fixed g (if is.constant==TRUE); hyper.parameter corresponds to the hyper-g parameter \(a\) as in Liang et al. (2008)
mprior.info: a list of class mprior-class, detailing information on the model prior: origargs lists the original arguments to mprior and mprior.size above; mp.msize denotes the prior model size; mp.Kdist is a (K+1) vector with the prior model size distribution from 0 to K
X.data: data.frame or matrix corresponding to argument X.data above, possibly cleaned of NAs
reg.names: character vector with the covariate names used for X.data (corresponds to variable.names.bma)
call: the original call to the bms function
a data frame or a matrix, with the dependent variable in the first column, followed by the covariates (alternatively, X.data can also be provided as a formula). Note that bms automatically estimates a constant, therefore including constant terms is not necessary.
The (positive integer) number of burn-in draws for the MC3 sampler, defaults to 1000. (Not taken into account if mcmc="enumerate")
If mcmc is set to an MC3 sampler, then this is the number of iteration draws to be sampled (excluding burn-ins), default 3000 draws. If mcmc="enumerate", then iter is the number of models to be sampled, starting from 0 (defaults to \(2^K-1\)); cf. start.value.
the number of best models for which information is stored
(default 500). Best models are used for convergence analysis between
likelihoods and MCMC frequencies, as well as likelihood-based inference.
Note that a very high value for nmodel
slows down the sampler
significantly. Set nmodel=0 to speed up sampling (if best model information
is not needed).
a character denoting the model sampler to be used. The MC3 sampler mcmc="bd" corresponds to a birth/death MCMC algorithm. mcmc="rev.jump" enacts a reversible-jump algorithm adding a "swap" step to the birth/death steps from "bd". Alternatively, the entire model space may be fully enumerated by setting mcmc="enumerate", which will iterate over all possible regressor combinations. (Note: consider that this means \(2^K\) iterations, where K is the number of covariates.) Default is full enumeration (mcmc="enumerate") with fewer than 15 covariates, and the birth-death MC3 sampler (mcmc="bd") with 15 covariates or more. Cf. section 'Details' for more options.
the hyperparameter on Zellner's g-prior for the regression coefficients.
g="UIP" corresponds to \(g=N\), the number of observations (default);
g="BRIC" corresponds to the benchmark prior suggested by Fernandez, Ley and Steel (2001), i.e. \(g=max(N, K^2)\), where K is the total number of covariates;
g="RIC" sets \(g=K^2\) and conforms to the risk inflation criterion by George and Foster (1994);
g="HQ" sets \(g=log(N)^3\) and asymptotically mimics the Hannan-Quinn criterion with \(C_{HQ}=3\) (cf. Fernandez, Ley and Steel, 2001, p.395);
g="EBL" estimates a local empirical Bayes g-parameter (as in Liang et al., 2008);
g="hyper" takes the 'hyper-g' prior distribution (as in Liang et al., 2008) with the default hyper-parameter \(a\) set such that the prior expected shrinkage factor conforms to 'UIP'. This hyperparameter \(a\) can be adjusted (within \(2<a<=4\)) by setting g="hyper=2.9", for instance. Alternatively, g="hyper=UIP" sets the prior expected value of the shrinkage factor equal to that of UIP (default), and g="hyper=BRIC" sets it according to BRIC.
Cf. section 'Details' for more on the hyper-g prior.
a character denoting the model prior choice, defaulting to "random":
mprior="fixed" denotes fixed common prior inclusion probabilities for each regressor as e.g. in Sala-i-Martin, Doppelhofer, and Miller (2004); for their fine-tuning, cf. mprior.size. Preferable to mprior="random" if strong prior information on model size exists;
mprior="random" (default) triggers the 'random theta' prior by Ley and Steel (2008), who suggest a binomial-beta hyperprior on the a priori inclusion probability;
mprior="uniform" employs the uniform model prior;
mprior="customk" allows for custom model size priors (cf. mprior.size);
mprior="pip" allows for custom prior inclusion probabilities (cf. mprior.size).
Note that the prior on models with more than N-3 regressors is automatically zero: these models will not be sampled.
if mprior is "fixed" or "random", mprior.size is a scalar that denotes the prior expected model size (default K/2). If mprior="customk", then a custom model size prior can be provided as a K+1 vector detailing the prior weights from model size 0 to K (e.g. rep(1,K+1) for the uniform model prior). If mprior="pip", then custom prior inclusion probabilities can be provided as a vector of size K, with elements in the interval (0,1).
'interactive mode': if TRUE (default), prints results to the console and plots a chart after the routine finishes.
specifies the starting model of the iteration chain. For instance, a specific model can be given by the corresponding column indices: e.g. start.value=c(3,6) for a starting model including only covariates 3 and 6, or start.value=numeric(K) to start from the null model including solely a constant term.
If start.value is set to an integer (e.g. start.value=15), then that number of covariates (here: 15 covariates) is randomly chosen, and the starting model is identified by those regressors with an OLS t-statistic>0.2.
The default value start.value=NA corresponds to start.value=min(ncol(X.data),nrow(X.data)-3). Note that start.value=0 or start.value=NULL starts from the null model.
If mcmc="enumerate", then start.value is the index at which to start the enumeration (default: 0, the null model). Any number between 0 and \(2^K-1\) is admissible.
TRUE if statistics on the shrinkage factor g/(1+g) should be collected, defaulting to TRUE. (Note: set g.stats=FALSE for faster iteration.)
setting logfile=TRUE produces a logfile named "test.log" in your current working directory, in order to keep track of the sampling procedure. Setting logfile equal to some filepath (like logfile="subfolder/log.txt") puts the logfile into that specified position (default: logfile=FALSE). Note that logfile="" implies log printouts on the console.
specifies after how many posterior draws information is written to the log file (default: every 10,000 iterations)
default FALSE. If force.full.ols=TRUE, the OLS estimation part of the sampling procedure relies on slower matrix inversion instead of streamlined routines. This can slow down sampling but may deal better with highly collinear data.
indices or variable names of X.data
that are fixed
regressors to be always included in every sampled model. Note: the parameter
mprior.size
refers to prior model size including these fixed
regressors.
The models analyzed are Bayesian normal-gamma conjugate models with improper constant and variance priors akin to Fernandez, Ley and Steel (2001): A model \(M\) can be described as follows, with \(\epsilon \sim N(0,\sigma^2 I)\): $$y = \alpha + X \beta + \epsilon$$ $$\beta | \sigma, M, g \sim N(0, g \sigma^2 (X'X)^{-1})$$
Moreover, the (improper) prior on the constant \(f(\alpha)\) is put proportional to 1. Similarly, the variance prior \(f(\sigma)\) is proportional to \(1/\sigma\).
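A standard consequence of this conjugate setup (implied by, though not stated in, the description above) is that the posterior coefficient mean under a model \(M\) shrinks the OLS estimate toward the prior mean of zero:

```latex
% Posterior mean of \beta under Zellner's g-prior, conditional on model M:
% the shrinkage factor g/(1+g) scales the OLS estimate toward zero.
E(\beta \mid y, \sigma, M, g)
  = \frac{g}{1+g}\,\hat{\beta}_{OLS}
  = \frac{g}{1+g}\,(X'X)^{-1}X'y
```

This is why the quantity g/(1+g) is called the shrinkage factor, and why its moments are worth collecting via the g.stats argument.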
Martin Feldkircher, Paul Hofmarcher, and Stefan Zeugner
Ad mcmc
:
Interaction sampler: adding an ".int" to an MC3 sampler (e.g. mcmc="bd.int") provides for special treatment of interaction terms.
Interaction terms will only be sampled along with their component variables:
In the column names of X.data, interaction terms need to be denoted by names consisting of the base terms separated by # (e.g. an
interaction term of base variables "A"
, "B"
and "C"
needs column name "A#B#C"
). Then variable "A#B#C"
will only be
included in a model if all of the component variables ("A", "B", and "C")
are included.
The MC3 samplers "bd", "rev.jump", "bd.int" and "rev.jump.int" iterate away from a starting model by adding, dropping or swapping (only in the case of rev.jump) covariates.
In an MCMC fashion, they thus randomly draw a candidate model and then move to it in case its marginal likelihood (marg.lik.) is superior to the marg.lik. of the current model.
In case the candidate's marg.lik is inferior, it is randomly accepted or rejected according to a probability formed by the ratio of candidate marg.lik over current marg.lik. Over time, the sampler should thus converge to a sensible distribution. For aggregate results based on these MC3 frequencies, the first few iterations are typically disregarded (the 'burn-ins').
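The acceptance step described above has the usual Metropolis form. As a sketch (writing \(p(y \mid M)\) for a model's marginal likelihood), the probability of moving from the current model \(M_i\) to the candidate \(M_j\) is:

```latex
% Metropolis acceptance probability for a proposed move M_i -> M_j.
% Sketch: with a non-uniform model prior p(M), the prior ratio
% p(M_j)/p(M_i) multiplies the marginal-likelihood ratio.
P(\text{accept } M_j) = \min\!\left(1,\ \frac{p(y \mid M_j)}{p(y \mid M_i)}\right)
```

A superior candidate (ratio above one) is thus always accepted, an inferior one only with the stated probability.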
Ad g and the hyper-g prior: The hyper-g prior introduced by Liang et al. (2008) puts a prior distribution on the shrinkage factor \(g/(1+g)\), namely a Beta distribution \(Beta(1, a/2-1)\) that is governed by the parameter \(a\). \(a=4\) implies a uniform prior distribution of the shrinkage factor, while values of \(a\) close to 2 concentrate the prior shrinkage factor close to one. The prior expected value is \(E(g/(1+g)) = 2/a\). In this sense g="hyper=UIP" and g="hyper=BRIC" set the prior expected shrinkage such that it conforms to a fixed UIP-g (\(g=N\)) or BRIC-g (\(g=max(K^2,N)\)).
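The prior expectation quoted above follows directly from the mean of a Beta distribution: for the shrinkage factor \(s = g/(1+g) \sim Beta(1, a/2-1)\),

```latex
% Mean of Beta(\alpha, \beta) is \alpha/(\alpha+\beta);
% here \alpha = 1 and \beta = a/2 - 1.
E(s) = \frac{\alpha}{\alpha + \beta}
     = \frac{1}{1 + (a/2 - 1)}
     = \frac{2}{a}
```

so \(a=4\) gives \(E(s)=1/2\) (the uniform case), while \(a \to 2\) pushes the prior expected shrinkage toward one.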
http://bms.zeugner.eu: BMS package homepage with help and tutorials
Feldkircher, M. and S. Zeugner (2015): Bayesian Model Averaging Employing Fixed and Flexible Priors: The BMS Package for R, Journal of Statistical Software 68(4).
Feldkircher, M. and S. Zeugner (2009): Benchmark Priors Revisited: On Adaptive Shrinkage and the Supermodel Effect in Bayesian Model Averaging, IMF Working Paper 09/202.
Fernandez, C. E. Ley and M. Steel (2001): Benchmark priors for Bayesian model averaging. Journal of Econometrics 100(2), 381--427
Ley, E. and M. Steel (2008): On the Effect of Prior Assumptions in Bayesian Model Averaging with Applications to Growth Regressions. working paper
Liang, F., Paulo, R., Molina, G., Clyde, M. A., and Berger, J. O. (2008). Mixtures of g Priors for Bayesian Variable Selection. Journal of the American Statistical Association 103, 410-423.
Sala-i-Martin, X. and G. Doppelhofer and R.I. Miller (2004): Determinants of long-term growth: a Bayesian averaging of classical estimates (BACE) approach. American Economic Review 94(4), 813--835
coef.bma
, plotModelsize
and
density.bma
for some operations on the resulting 'bma' object,
c.bma
for integrating separate MC3 chains and splitting of
sampling over several runs.
Check http://bms.zeugner.eu for additional help.
data(datafls)
#estimating a standard MC3 chain with 1000 burn-ins and 2000 iterations and uniform model priors
bma1 = bms(datafls,burn=1000, iter=2000, mprior="uniform")
##standard coefficients based on exact likelihoods of the 100 best models:
coef(bma1,exact=TRUE, std.coefs=TRUE)
#suppressing user-interactive output, using a customized starting value, and not saving the best
# ...models for only 19 observations (but 41 covariates)
bma2 = bms(datafls[20:39,],burn=1000, iter=2000, nmodel=0, start.value=c(1,4,7,30),
user.int=FALSE)
coef(bma2)
#MC3 chain with a hyper-g prior (custom coefficient a=2.1), saving only the 20 best models,
# ...and an alternative sampling procedure; putting a log entry to console every 1000th step
bma3 = bms(datafls,burn=1000, iter=5000, nmodel=20, g="hyper=2.1", mcmc="rev.jump",
logfile="",logstep=1000)
image(bma3) #showing the coefficient signs of the 20 best models
#enumerating with 10 covariates (= 1024 models), keeping the shrinkage factors
# ...of the best 200 models
bma4 = bms(datafls[,1:11],mcmc="enumerate",nmodel=200,g.stats=TRUE)
#using an interaction sampler for two interaction terms
dataint=datafls
dataint=cbind(datafls,datafls$LifeExp*datafls$Abslat/1000,
datafls$Protestants*datafls$Brit-datafls$Muslim)
names(dataint)[ncol(dataint)-1]="LifeExp#Abslat"
names(dataint)[ncol(dataint)]="Protestants#Brit#Muslim"
bma5 = bms(X.data=dataint,burn=1000,iter=9000,start.value=0,mcmc="bd.int")
density(bma5,reg="English") # plot posterior density for covariate "English"
# a matrix as X.data argument
bms(matrix(rnorm(1000),100,10))
# keeping a set of fixed regressors:
bms(datafls, mprior.size=7, fixed.reg = c("PrScEnroll", "LifeExp", "GDP60"))
# Note that mprior.size=7 means prior model size of 3 fixed to 4 'uncertain' regressors