Given a formula and a data frame, computes the maximum a posteriori (MAP) model and the median probability model (MPM) for different choices of prior on the model parameters and on the model space. Normal linear models are assumed for the data, with the prior distribution on the model parameters being one or more of the following: PEP, intrinsic, Zellner’s \(g\)-prior, Zellner and Siow, benchmark, robust, hyper-\(g\) and the related hyper-\(g\)-\(n\). The prior distribution on the model space can be either the uniform prior on models or the uniform prior on model dimension (a special case of the beta-binomial prior). The model space consists of all possible models that include an intercept term. Model selection is performed either by full enumeration and evaluation of all models (for model spaces of small to moderate dimension) or by a Markov chain Monte Carlo (MCMC) scheme (for model spaces of large dimension).
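The MAP model and the MPM need not coincide. The base R sketch below makes this concrete. It uses made-up posterior probabilities for the four models over two hypothetical predictors; these are not package output. The MAP model is the single most probable model. The MPM includes every predictor whose posterior inclusion probability exceeds 0.5.

```r
# Four candidate models over two predictors (all include the intercept);
# each row holds the variable inclusion indicators of one model.
models <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1))
# Made-up posterior model probabilities, for illustration only
post <- c(0.10, 0.35, 0.25, 0.30)

# MAP model: the single model with the highest posterior probability
map_model <- models[which.max(post), ]       # c(1, 0)

# Posterior inclusion probability of each predictor: the sum of the
# probabilities of the models that contain it
incl_prob <- colSums(models * post)          # c(0.65, 0.55)

# MPM: keep every predictor with inclusion probability above 0.5
mpm_model <- as.integer(incl_prob > 0.5)     # c(1, 1)
```

In this toy case the MAP model contains only the first predictor, while the MPM contains both. This illustrates why the two summaries are reported separately.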
comparepriors.lm(
formula,
data,
algorithmic.choice = "automatic",
priorbetacoeff = c("PEP", "intrinsic", "Robust", "gZellner", "ZellnerSiow", "FLS",
"hyper-g", "hyper-g-n"),
reference.prior = c(TRUE, FALSE),
priormodels = c("beta-binomial", "uniform"),
burnin = 1000,
itermcmc = 11000
)
comparepriors.lm returns a list with two elements:
A data frame containing the MAP models for all combinations of prior on the model parameters and prior on the model space. Each row gives the prior on the model parameters, the prior on the model space, the hyperparameter value, the MAP model for that combination of priors (represented by variable inclusion indicators), and the R package used. When an MCMC scheme has been used, there are two additional columns: one giving the specific algorithm used and one with the MC standard error (to assess convergence). With an MCMC scheme, the reported MAP model is the most frequently `visited' model.
A data frame with the same structure as the first element, containing the MPM models instead.
formula: A formula defining the full model.
data: A data frame (of numeric values) containing the data.
algorithmic.choice: A character string giving the type of algorithm to be used
for selection: full enumeration and evaluation of all models, or an MCMC scheme.
One of ``automatic'' (the choice is made automatically based on the number
of explanatory variables in the full model), ``full enumeration''
or ``MCMC''. Default value="automatic".
priorbetacoeff: A character vector containing the different priors on the model
parameters. Each element can be one of ``PEP'', ``intrinsic'', ``Robust'', ``gZellner'',
``ZellnerSiow'', ``FLS'', ``hyper-g'' and ``hyper-g-n''.
Default value=
c("PEP","intrinsic","Robust","gZellner","ZellnerSiow",
"FLS","hyper-g","hyper-g-n"),
i.e., all supported priors are tested.
reference.prior: A logical vector indicating the baseline prior used for
PEP/intrinsic. It can be TRUE (the reference prior is used), FALSE (the dependence Jeffreys prior
is used) or both. Default value=c(TRUE,FALSE), i.e., both baseline priors are tested.
priormodels: A character vector containing the different priors on the model
space. Each element can be one of ``beta-binomial'' and ``uniform''.
Default value=c("beta-binomial","uniform"), i.e., both supported priors are tested.
burnin: Non-negative integer, the burn-in period for the MCMC scheme. Default value=1000.
itermcmc: Positive integer (larger than burnin),
the total number of iterations for the MCMC scheme. Default value=11000.
The different priors on the model parameters are implemented using different packages. For PEP and intrinsic, the current package is used. For the hyper-\(g\) and related hyper-\(g\)-\(n\) priors, the R package BAS is used. Finally, for Zellner’s \(g\)-prior (``gZellner''), the Zellner and Siow prior (``ZellnerSiow''), the robust prior (``Robust'') and the benchmark prior (``FLS''), the results are obtained using the R package BayesVarSel.
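When reading the output, it can help to write the mapping from each priorbetacoeff value to the package behind its results as a named lookup vector. The vector below only restates the paragraph above, taking ``the current package'' to be PEPBVS; it is not an object exported by any package. The has_mcse flag, also hypothetical, reflects the note that MC standard errors are unavailable for the cases handled by BAS.

```r
# priorbetacoeff value -> R package producing its results (restated from the text)
backend <- c(PEP = "PEPBVS", intrinsic = "PEPBVS",
             "hyper-g" = "BAS", "hyper-g-n" = "BAS",
             gZellner = "BayesVarSel", ZellnerSiow = "BayesVarSel",
             Robust = "BayesVarSel", FLS = "BayesVarSel")

# Under MCMC, an MC standard error is reported unless BAS is the backend
has_mcse <- backend != "BAS"

backend[c("PEP", "hyper-g-n")]
```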
The prior distribution on the model space can be either the uniform prior on models or the beta-binomial prior. For the beta-binomial prior, the special case with both shape parameters equal to 1 is used, which gives a uniform prior on model dimension.
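The two model-space priors differ in how much prior mass each model dimension receives. The base R sketch below uses \(p = 4\) predictors, and the function names are hypothetical. Under the uniform prior on model dimension, each dimension \(k = 0, \dots, p\) gets mass \(1/(p+1)\), shared equally among its \(\binom{p}{k}\) models. Under the uniform prior on models, each of the \(2^p\) models gets \(1/2^p\), so dimensions near \(p/2\) receive the most mass.

```r
# Prior probability of one particular model containing k of p predictors
# (hypothetical helper names, for illustration)
prior_unif_dim <- function(k, p) 1 / ((p + 1) * choose(p, k))  # beta-binomial(1, 1)
prior_unif_models <- function(k, p) rep(1 / 2^p, length(k))    # uniform on models

p <- 4
k <- 0:p
# Total prior mass assigned to each model dimension k
mass_dim <- choose(p, k) * prior_unif_dim(k, p)     # 0.2 0.2 0.2 0.2 0.2
mass_mod <- choose(p, k) * prior_unif_models(k, p)  # 0.0625 0.25 0.375 0.25 0.0625
sum(mass_dim); sum(mass_mod)                        # both equal 1
```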
When an MCMC scheme is used, the R package BAS uses the birth/death random walk of Raftery et al. (1997) combined with a random swap move, BayesVarSel uses Gibbs sampling, and PEPBVS implements the MC3 algorithm described in the Appendix of Fouskakis and Ntzoufras (2022).
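The sketch below illustrates an MC3-type sampler. At each iteration it proposes to add or delete one randomly chosen predictor and accepts the move with the Metropolis-Hastings ratio. It is not the PEPBVS implementation. To stay self-contained, it swaps in exp(-BIC/2) as an approximation to the marginal likelihood and assumes a uniform prior on models. The helper names log_post_bic and mc3_sketch are hypothetical.

```r
# ASSUMPTION: log marginal likelihood approximated by -BIC/2 (not PEP/intrinsic)
log_post_bic <- function(incl, X, y) {
  if (any(incl == 1L)) {
    fit <- lm(y ~ X[, incl == 1L, drop = FALSE])
  } else {
    fit <- lm(y ~ 1)                   # intercept-only model
  }
  -BIC(fit) / 2
}

mc3_sketch <- function(X, y, burnin = 1000, itermcmc = 11000) {
  p <- ncol(X)
  incl <- rep(0L, p)                   # start at the intercept-only model
  cur <- log_post_bic(incl, X, y)
  draws <- matrix(NA_integer_, nrow = itermcmc - burnin, ncol = p)
  for (it in seq_len(itermcmc)) {
    j <- sample.int(p, 1)              # pick one predictor at random
    prop <- incl
    prop[j] <- 1L - prop[j]            # add it if absent, delete it if present
    new <- log_post_bic(prop, X, y)
    # Symmetric proposal and uniform model prior: accept with prob min(1, ratio)
    if (log(runif(1)) < new - cur) {
      incl <- prop
      cur <- new
    }
    if (it > burnin) draws[it - burnin, ] <- incl  # keep post-burn-in states
  }
  draws
}

set.seed(1)
n <- 100
X <- matrix(rnorm(n * 3), n, 3)
y <- 2 + 3 * X[, 1] + rnorm(n)         # only the first predictor matters
draws <- mc3_sketch(X, y, burnin = 200, itermcmc = 2200)  # short run for speed
colMeans(draws)                        # estimated posterior inclusion probabilities
# Most frequently visited model (how the MAP model is reported under MCMC)
visits <- table(apply(draws, 1, paste, collapse = ""))
names(which.max(visits))
```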
To assess MCMC convergence, the Monte Carlo (MC) standard error is
computed using the batch means estimator (implemented in the R package mcmcse).
Computing a standard error requires itermcmc - burnin
to be larger than 100.
This quantity cannot be computed for the cases treated by BAS,
since not all `visited' models are available in the function output; for those cases
NA is shown in the relevant column instead.
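The batch means idea fits in a few lines of base R. Split the post-burn-in chain of length \(n\) into \(a\) non-overlapping batches of size \(b\). Estimate the asymptotic variance as \(\hat\sigma^2 = \frac{b}{a-1}\sum_{k=1}^{a}(\bar Y_k - \bar Y)^2\). The standard error of the chain mean is then \(\sqrt{\hat\sigma^2/(ab)}\). This sketch is a plain (non-lugsail) version with batch size \(\lfloor\sqrt{n}\rfloor\), a common choice. Whether it matches mcmcse's defaults is an assumption, so its numbers may differ from the package's. The function name is hypothetical, and the NA return mirrors the itermcmc - burnin > 100 requirement.

```r
# Plain batch means estimate of the MC standard error of mean(x)
batch_means_se <- function(x) {
  n <- length(x)
  if (n <= 100) return(NA_real_)       # mirrors itermcmc - burnin > 100
  b <- floor(sqrt(n))                  # batch size
  a <- floor(n / b)                    # number of complete batches
  x <- x[seq_len(a * b)]               # drop any incomplete final batch
  batch_means <- colMeans(matrix(x, nrow = b))  # one column per batch
  sigma2 <- b * var(batch_means)       # asymptotic variance estimate
  sqrt(sigma2 / (a * b))               # standard error of the chain mean
}

# Example: indicator that the chain sits at a given model at each iteration
set.seed(42)
chain <- rbinom(10000, 1, 0.3)         # iid stand-in for an MCMC indicator chain
batch_means_se(chain)                  # close to sqrt(0.3 * 0.7 / 10000), about 0.0046
```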
As in pep.lm, if algorithmic.choice equals ``automatic'', then
model selection is implemented as follows: if \(p < 20\) (where \(p\) is the
number of explanatory variables in the full model, excluding the intercept), full enumeration
and evaluation of all models is performed; otherwise an MCMC scheme is used.
To avoid potential memory or time constraints, if algorithmic.choice
equals ``full enumeration'' but \(p \geq 20\), a warning message is issued and
an MCMC scheme is used instead.
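The selection rule can be restated as a small R function. choose_algorithm is a hypothetical helper written only to make the rule explicit; it is not part of PEPBVS. The threshold reflects the size of the model space: with \(p = 20\) there are already \(2^{20} = 1{,}048{,}576\) models to enumerate.

```r
# Hypothetical restatement of the documented rule (not a package function)
choose_algorithm <- function(p, algorithmic.choice = "automatic") {
  if (algorithmic.choice == "automatic") {
    if (p < 20) "full enumeration" else "MCMC"
  } else if (algorithmic.choice == "full enumeration" && p >= 20) {
    warning("p >= 20: using an MCMC scheme instead of full enumeration")
    "MCMC"
  } else {
    algorithmic.choice                 # explicit choice respected otherwise
  }
}

choose_algorithm(15)                   # "full enumeration"
choose_algorithm(25)                   # "MCMC"
```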
The same constraints on the data as for pep.lm hold, i.e.,
missing data are not currently supported, the explanatory
variables need to be quantitative and cannot have an exact linear relationship,
and \(p\leq n-2\) (\(n\) being the sample size).
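These constraints can be checked before calling the function. check_lm_data is a hypothetical helper that sketches the checks in base R; it is not the package's own validation code. An exact linear relationship is detected by a rank-deficient design matrix once an intercept column is added.

```r
# Hypothetical pre-checks mirroring the documented data constraints
check_lm_data <- function(formula, data) {
  if (anyNA(data)) stop("missing values are not supported")
  mf <- model.frame(formula, data)
  preds <- mf[, -1, drop = FALSE]      # explanatory variables (response is column 1)
  if (!all(vapply(preds, is.numeric, logical(1))))
    stop("explanatory variables must be quantitative")
  X <- cbind(1, as.matrix(preds))      # design matrix with intercept column
  n <- nrow(X)
  p <- ncol(preds)
  if (p > n - 2) stop("need p <= n - 2")
  if (qr(X)$rank < ncol(X)) stop("exact linear relationship among predictors")
  invisible(TRUE)
}

set.seed(1)
d <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30))
check_lm_data(y ~ ., d)                # passes silently
```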
Bayarri, M., Berger, J., Forte, A. and Garcia-Donato, G. (2012) Criteria for Bayesian Model Choice with Application to Variable Selection. The Annals of Statistics, 40(3): 1550–1577. doi:10.1214/12-AOS1013
Fouskakis, D. and Ntzoufras, I. (2022) Power-Expected-Posterior Priors as Mixtures of g-Priors in Normal Linear Models. Bayesian Analysis, 17(4): 1073–1099. doi:10.1214/21-BA1288
Ley, E. and Steel, M. (2012) Mixtures of g-Priors for Bayesian Model Averaging with Economic Applications. Journal of Econometrics, 171(2): 251–266. doi:10.1016/j.jeconom.2012.06.009
Liang, F., Paulo, R., Molina, G., Clyde, M. and Berger, J. (2008) Mixtures of g Priors for Bayesian Variable Selection. Journal of the American Statistical Association, 103(481): 410–423. doi:10.1198/016214507000001337
Raftery, A., Madigan, D. and Hoeting, J. (1997) Bayesian Model Averaging for Linear Regression Models. Journal of the American Statistical Association, 92(437): 179–191. doi:10.1080/01621459.1997.10473615
Zellner, A. (1976) Bayesian and Non-Bayesian Analysis of the Regression Model with Multivariate Student-t Error Terms. Journal of the American Statistical Association, 71(354): 400–405. doi:10.1080/01621459.1976.10480357
Zellner, A. and Siow, A. (1980) Posterior Odds Ratios for Selected Regression Hypotheses. Trabajos de Estadística y de Investigación Operativa, 31: 585–603. doi:10.1007/BF02888369
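library(PEPBVS)  # package providing comparepriors.lm
data(UScrime_data)
# PEP (reference baseline) vs hyper-g-n, beta-binomial prior on models
resc <- comparepriors.lm(y ~ ., UScrime_data,
  priorbetacoeff = c("PEP", "hyper-g-n"),
  reference.prior = TRUE, priormodels = "beta-binomial")
resc[[1]]  # MAP models
resc[[2]]  # MPM models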