
projpred is an R package for performing a projection predictive variable (or "feature") selection for generalized linear models (GLMs), generalized linear multilevel (or "mixed") models (GLMMs), generalized additive models (GAMs), and generalized additive multilevel (or "mixed") models (GAMMs), with the support for additive models still being experimental. Note that the term "generalized" includes the Gaussian family as well.
The package is compatible with rstanarm and brms, but developers of other packages are welcome to add new get_refmodel() methods (which enable the compatibility of their packages with projpred). Custom reference models can also be used via init_refmodel(). It is via custom reference models that projpred supports the projection onto candidate models whose predictor terms are not a subset of the reference model's predictor terms. However, for rstanarm and brms reference models, projpred only supports the projection onto submodels of the reference model. For the sake of simplicity, throughout this package, we use the term "submodel" for all kinds of candidate models onto which the reference model is projected, even though this term is not always appropriate for custom reference models.
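To illustrate the rstanarm route, the following sketch fits a small reference model and then builds the reference model object explicitly with get_refmodel(). The simulated data set `dat`, its column names, and the sampler settings are assumptions made for this example only. In most workflows the fitted model can be passed to varsel() or cv_varsel() directly, so this explicit step is rarely needed.

```r
library(rstanarm)
library(projpred)

# Hypothetical toy data: Gaussian response with three candidate predictors.
set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

# Reference model fitted with rstanarm (short chains keep the sketch fast).
fit <- stan_glm(y ~ x1 + x2 + x3, family = gaussian(), data = dat,
                chains = 2, iter = 1000, seed = 1, refresh = 0)

# Explicit construction of the reference model object.
refm <- get_refmodel(fit)
class(refm)
```

The resulting object can be supplied to the variable selection functions in place of `fit`.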
Currently, the supported families are gaussian(), binomial() (and, via brms::get_refmodel.brmsfit(), also brms::bernoulli()), as well as poisson().
The projection of the reference model onto a submodel can be run on multiple CPU cores in parallel (across the projected draws). This is powered by the foreach package. Thus, you can use any parallel (or sequential) backend compatible with foreach, e.g., the backends from packages doParallel, doMPI, or doFuture. Using the global option projpred.prll_prj_trigger, you can modify the number of projected draws below which no parallelization is used (even if a parallel backend is registered). Such a "trigger" threshold exists because parallelization incurs computational overhead, which makes it useful only for a sufficiently large number of projected draws. By default, parallelization is turned off, which can also be achieved by supplying Inf (or NULL) to option projpred.prll_prj_trigger. Note that we cannot recommend parallelizing the projection on Windows because in our experience, the parallelization overhead is larger there, causing a parallel run to take longer than a sequential run. Also note that the parallelization works well for GLMs, but for GLMMs, GAMs, and GAMMs, the fitted model objects are quite big, which, when running in parallel, may lead to excessive memory usage, which in turn may crash the R session. Thus, we currently cannot recommend the parallelization for GLMMs, GAMs, and GAMMs.
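As a rough sketch of how the trigger option and a foreach backend might be combined, the code below registers a doParallel backend (if that package is installed) and lowers the threshold so that projections with at least 200 projected draws would run in parallel. The value 200, the number of workers, and the variable names are assumptions chosen for illustration, not recommendations.

```r
# Set the trigger threshold; options() returns the previous value so that it
# can be restored later. The value 200 is an arbitrary illustrative choice.
old_opts <- options(projpred.prll_prj_trigger = 200)
trigger_now <- getOption("projpred.prll_prj_trigger")

# Assumption: doParallel is the chosen backend; any other foreach-compatible
# backend could be registered here instead.
have_backend <- requireNamespace("doParallel", quietly = TRUE)
if (have_backend) {
  doParallel::registerDoParallel(2)  # 2 workers, purely illustrative
}

# ... calls such as project(), varsel(), or cv_varsel() would go here ...

# Clean up: stop the workers, fall back to the sequential backend, and
# restore the previous option value.
if (have_backend) {
  doParallel::stopImplicitCluster()
  foreach::registerDoSEQ()
}
options(old_opts)
```

Restoring the previous option value at the end returns projpred to its default behaviour, in which no parallelization is used.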
The vignettes (currently, there is only a single one) illustrate how to use the projpred functions in conjunction. Shorter examples are included here in the documentation.
Some references relevant for this package are given in section "References" below. See citation(package = "projpred") for details on citing projpred.
init_refmodel(), get_refmodel()
For setting up a reference model (only rarely needed explicitly).
varsel(), cv_varsel()
For variable selection, possibly with cross-validation (CV).
summary.vsel(), print.vsel(), plot.vsel(), suggest_size.vsel(), solution_terms.vsel()
For post-processing the results from the variable selection.
project()
For projecting the reference model onto submodel(s). Typically, this follows the variable selection, but it can also be applied directly (without a variable selection).
as.matrix.projection()
For extracting projected parameter draws.
proj_linpred(), proj_predict()
For making predictions from a submodel (after projecting the reference model onto it).
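The functions listed above are typically used in sequence. The following is a minimal sketch of such a sequence, assuming an rstanarm reference model fitted to simulated data. The data set `dat`, the submodel size of 2, and the number of projected draws are hypothetical choices for this example, and the function names follow the version of projpred documented here.

```r
library(rstanarm)
library(projpred)

# Hypothetical toy data: Gaussian response with three candidate predictors.
set.seed(1)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

# Reference model (short chains keep the sketch fast).
fit <- stan_glm(y ~ x1 + x2 + x3, family = gaussian(), data = dat,
                chains = 2, iter = 1000, seed = 1, refresh = 0)

# Variable selection without cross-validation; cv_varsel() would add CV.
vs <- varsel(fit)

# Post-processing: performance summary, suggested size, and the order in
# which the predictor terms entered the solution path.
print(summary(vs))
print(suggest_size(vs))
print(solution_terms(vs))

# Project the reference model onto the submodel with the first 2 terms of
# the solution path (size chosen by hand here, for illustration only).
prj <- project(vs, nterms = 2, ndraws = 100)

# Extract the projected parameter draws as a matrix.
prj_mat <- as.matrix(prj)

# Predictions from the submodel for a few (here: re-used) observations.
new_obs <- dat[1:5, ]
lp <- proj_linpred(prj, newdata = new_obs)   # linear predictor draws
pp <- proj_predict(prj, newdata = new_obs)   # predictive draws
```

In practice, the submodel size would usually be chosen from the summary or plot of the variable selection results rather than fixed in advance.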
Maintainer: Frank Weber fweber144@protonmail.com
Authors:
Juho Piironen juho.t.piironen@gmail.com
Markus Paasiniemi
Alejandro Catalina alecatfel@gmail.com
Aki Vehtari
Other contributors:
Jonah Gabry [contributor]
Marco Colombo [contributor]
Paul-Christian Bürkner [contributor]
Hamada S. Badr [contributor]
Goutis, C. and Robert, C. P. (1998). Model choice in generalised linear models: A Bayesian approach via Kullback–Leibler projections. Biometrika, 85(1):29–37.
Dupuis, J. A. and Robert, C. P. (2003). Variable selection in qualitative models via an entropic explanatory power. Journal of Statistical Planning and Inference, 111(1-2):77–94. doi:10.1016/S0378-3758(02)00286-0.
Piironen, J. and Vehtari, A. (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3):711–735. doi:10.1007/s11222-016-9649-y.
Piironen, J., Paasiniemi, M., and Vehtari, A. (2020). Projective inference in high-dimensional problems: Prediction and feature selection. Electronic Journal of Statistics, 14(1):2155–2197. doi:10.1214/20-EJS1711.
Catalina, A., Bürkner, P.-C., and Vehtari, A. (2020). Projection predictive inference for generalized linear and additive multilevel models. arXiv:2010.06994. URL: https://arxiv.org/abs/2010.06994.