loo (version 1.0.0)

loo: Leave-one-out cross-validation (LOO)

Description

Efficient approximate leave-one-out cross-validation for Bayesian models.

Usage

loo(x, ...)

# S3 method for matrix
loo(x, ...)

# S3 method for function
loo(x, ..., args)

Arguments

x
A log-likelihood matrix or function. See the Methods (by class) section below for a detailed description.
...
Optional arguments to pass to psislw; see psislw for the possible arguments and their defaults. We recommend using the default values for the psislw arguments unless there are problems (e.g. NA or NaN results). A sketch of forwarding an optional argument through ... follows the argument list below.

args
Only required if x is a function. A list containing the data required to specify the arguments to the function. See the Methods (by class) section below for how args should be specified.
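
For example, a minimal sketch of forwarding an optional argument through ... (log_lik here is a placeholder for an S by N log-likelihood matrix, such as one returned by extract_log_lik; cores is the same optional argument used in the Examples below):

# optional arguments such as cores are forwarded through ... to psislw
loo1 <- loo(log_lik, cores = 2)
print(loo1, digits = 3)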

Value

A named list with class 'loo', with components including the estimates elpd_loo (expected log pointwise predictive density), p_loo (effective number of parameters), and looic (-2 * elpd_loo, i.e. converted to the deviance scale), their standard errors, a matrix of pointwise contributions, and the estimated Pareto k shape parameters used as PSIS diagnostics.

Methods (by class)

  • matrix: An $S$ by $N$ matrix, where $S$ is the size of the posterior sample (the number of simulations) and $N$ is the number of data points. Typically (but not restricted to be) the object returned by extract_log_lik.
  • function: A function $f$ that takes arguments i, data, and draws and returns a vector containing the log-likelihood for the ith observation evaluated at each posterior draw (a compact sketch of this interface follows the list below). The args argument must also be specified and should be a named list with the following components:
    • draws: An object containing the posterior draws for any parameters needed to compute the pointwise log-likelihood.
    • data: An object containing data (e.g. observed outcome and predictors) needed to compute the pointwise log-likelihood. data should be in an appropriate form so that $f$(i=i, data=data[i,,drop=FALSE], draws=draws) returns the S-vector of log-likelihoods for the ith observation.
    • N: The number of observations.
    • S: The size of the posterior sample.
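
As referenced above, a compact sketch of the function-method interface (the names llfun, draws, data, N, and S are placeholders and the body of llfun is only illustrative; the Examples section below gives a full worked version):

llfun <- function(i, data, draws) {
  # data is already the ith row, so the log-likelihood is computed directly
  dbinom(data$y, size = data$K, prob = draws, log = TRUE)
}
loo_fn <- loo(llfun, args = list(draws = draws, data = data, N = N, S = S))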

Details

PSIS-LOO: We approximate LOO using Pareto Smoothed Importance Sampling (PSIS). See loo-package for details.
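
For instance (a hedged sketch; loo1 stands in for the object returned by a loo() call, and pareto_k refers to the diagnostic component described under Value):

# the print method reports the LOO estimates and flags problematic Pareto k values
print(loo1, digits = 3)
# per-observation Pareto shape estimates; large values indicate observations
# for which the approximation may be unreliable (see pareto-k-diagnostic)
head(loo1$pareto_k)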

Memory: For models fit to very large datasets we recommend the loo.function method, which is much more memory-efficient than the loo.matrix method. However, the loo.matrix method is typically more convenient, so it is usually worth trying loo.matrix first and switching to loo.function if memory becomes an issue.
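
A rough sketch of that workflow (stanfit, llfun, and the objects in args are placeholders; the full function-method setup is spelled out in the Examples section below):

# convenient but memory-hungry: store the full S x N log-likelihood matrix
log_lik <- extract_log_lik(stanfit)
loo_mat <- loo(log_lik)

# memory-efficient alternative: compute the log-likelihood one observation
# at a time via a function plus an args list
loo_fn <- loo(llfun, args = nlist(data, draws, N, S))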

References

Vehtari, A., Gelman, A., and Gabry, J. (2016a). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. Advance online publication. doi:10.1007/s11222-016-9696-4. arXiv preprint: http://arxiv.org/abs/1507.04544

Vehtari, A., Gelman, A., and Gabry, J. (2016b). Pareto smoothed importance sampling. arXiv preprint: http://arxiv.org/abs/1507.02646

See Also

loo-package for details on the Pareto Smoothed Importance Sampling (PSIS) procedure used for approximating LOO.

compare for model comparison.

pareto-k-diagnostic for convenience functions for looking at diagnostics.

print.loo for the print method for 'loo' objects.

Examples

## Not run: 
# ### Usage with stanfit objects
# log_lik1 <- extract_log_lik(stanfit1) # see ?extract_log_lik
# loo1 <- loo(log_lik1)
# print(loo1, digits = 3)
# 
# log_lik2 <- extract_log_lik(stanfit2)
# (loo2 <- loo(log_lik2))
# compare(loo1, loo2)
# ## End(Not run)

### Using log-likelihood function instead of matrix
set.seed(024)

# Simulate data and draw from posterior
N <- 50; K <- 10; S <- 100; a0 <- 3; b0 <- 2
p <- rbeta(1, a0, b0)
y <- rbinom(N, size = K, prob = p)
a <- a0 + sum(y); b <- b0 + N * K - sum(y)
draws <- rbeta(S, a, b)
data <- data.frame(y,K)

llfun <- function(i, data, draws) {
  # i is unused because data is already subset to the ith observation
  # (see the data component description in Methods (by class))
  dbinom(data$y, size = data$K, prob = draws, log = TRUE)
}
loo_with_fn <- loo(llfun, args = nlist(data, draws, N, S), cores = 1)

# Check that we get same answer if using log-likelihood matrix
log_lik_mat <- sapply(1:N, function(i) llfun(i, data[i,, drop=FALSE], draws))
loo_with_mat <- loo(log_lik_mat, cores = 1)
all.equal(loo_with_mat, loo_with_fn)
