loo.stanreg: Leave-one-out cross-validation (LOO)

Description

For models fit using MCMC, compute approximate leave-one-out cross-validation (LOO) or, less preferably, the Widely Applicable Information Criterion (WAIC) using the loo package. Compare two or more models using the compare function.

Usage

## S3 method for class 'stanreg':
loo(x, ...)
## S3 method for class 'stanreg':
waic(x, ...)

Arguments

x

A fitted model object returned by one of the rstanarm modeling functions. See stanreg-objects.

...

Optional arguments to pass to psislw. See the psislw documentation in the loo package for the available arguments and their defaults.

We recommend using the default values for the psislw arguments.

Value

An object of class 'loo'. See the 'Value' section in loo and waic for details on the structure of these objects.

Details

The LOO Information Criterion (LOOIC) has the same purpose as the Akaike Information Criterion (AIC) used by frequentists. Both are intended to estimate the expected log predictive density (ELPD) for a new dataset. However, the AIC ignores priors and assumes that the posterior distribution is multivariate normal, whereas the functions from the loo package do not make this distributional assumption and integrate over uncertainty in the parameters. The LOO approximation assumes only that any one observation can be omitted without having a major effect on the posterior distribution, which can be judged using the diagnostic plot provided by the plot.loo method. The How to Use the rstanarm Package vignette has an example of this entire process.
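
For reference, the quantities discussed above can be written out as follows. The notation is an assumption introduced here and is not taken from this help page: y_i is the i-th of n observations, y_{-i} is the data with observation i removed, and theta denotes the model parameters. This sketch follows the usual definitions in the reference listed below.

```latex
\begin{aligned}
p(y_i \mid y_{-i}) &= \int p(y_i \mid \theta)\, p(\theta \mid y_{-i})\, d\theta \\
\mathrm{elpd}_{\mathrm{loo}} &= \sum_{i=1}^{n} \log p(y_i \mid y_{-i}) \\
\mathrm{LOOIC} &= -2\, \mathrm{elpd}_{\mathrm{loo}}
\end{aligned}
```

The integral over theta is what "integrate over uncertainty in the parameters" refers to. The loo function approximates each leave-one-out term from a single fit to the full data rather than refitting the model n times.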

References

Vehtari, A., Gelman, A., and Gabry, J. (2016). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. http://arxiv.org/abs/1507.04544/ (preprint)

Examples

library(rstanarm)

SEED <- 42024
set.seed(SEED)

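# Fit two nested models to the mtcars data
fit1 <- stan_glm(mpg ~ wt, data = mtcars, seed = SEED)
fit2 <- update(fit1, formula = . ~ . + cyl)
# Compute LOO for each model; the outer parentheses also print the result
(loo1 <- loo(fit1))
loo2 <- loo(fit2)
# Compare the two models, then inspect the diagnostic plot
compare(loo1, loo2)
plot(loo2)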


# dataset description at help("lalonde", package = "arm")
data(lalonde, package = "arm") 
t7 <- student_t(df = 7) # prior for coefficients

f1 <- treat ~ re74 + re75 + educ + black + hisp + married + 
   nodegr + u74 + u75
lalonde1 <- stan_glm(f1, data = lalonde, family = binomial(link="logit"), 
                     prior = t7, cores = 4, seed = SEED)
                 
f2 <- treat ~ age + I(age^2) + educ + I(educ^2) + black + hisp + 
   married + nodegr + re74  + I(re74^2) + re75 + I(re75^2) + u74 + u75   
lalonde2 <- update(lalonde1, formula = f2)

(loo_lalonde1 <- loo(lalonde1))
(loo_lalonde2 <- loo(lalonde2))
plot(loo_lalonde2, label_points = TRUE)
compare(loo_lalonde1, loo_lalonde2)
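
The examples above only exercise loo. As a minimal sketch, the waic method listed under Usage can be called the same way. The object names `fit` and `waic_fit` are illustrative, and the printed output will depend on the installed versions of rstanarm and loo.

```r
library(rstanarm)

SEED <- 42024
# Same simple model as the first example above
fit <- stan_glm(mpg ~ wt, data = mtcars, seed = SEED)

# WAIC via the stanreg method; the outer parentheses also print the result
(waic_fit <- waic(fit))
```

As the Description notes, LOO is generally preferred over WAIC, so this is mainly useful for comparing the two estimates on the same fit.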
