loo.stanfit: Approximate leave-one-out cross-validation

Description

A loo method that is customized for stanfit objects. The loo method for stanfit objects ---a wrapper around the loo.array (loo package)--- computes PSIS-LOO CV, approximate leave-one-out cross-validation using Pareto smoothed importance sampling (Vehtari, Gelman, and Gabry, 2017a,2017b).

Usage

# S3 method for stanfit
loo(x, pars = "log_lik", 
    …, save_psis = FALSE,
    cores = getOption("mc.cores", 1))

Arguments

An object of S4 class stanfit.

pars

Name of transformed parameter or generated quantity in the Stan program corresponding to the pointwise log-likelihood. If not specified the default behavior is to look for "log_lik".

cores

Number of cores to use for parallelization. The default is 1 unless cores is specified or the mc.cores option has been set.

save_psis

Should the intermediate results from psis be saved in the returned object? The default is FALSE. This can be useful to avoid repeated computation when using other functions in the loo and bayesplot packages.

…

Ignored.

Value

A list with class c("psis_loo", "loo"), as detailed in the loo.array documentation.

Details

Stan does not automatically compute and store the log-likelihood. It is up to the user to incorporate it into the Stan program if it is to be extracted after fitting the model. In a Stan program, the pointwise log likelihood can be coded as a vector in the transformed parameters block (and then summed up in the model block) or it can be coded entirely in the generated quantities block. We recommend using the generated quantities block so that the computations are carried out only once per iteration rather than once per HMC leapfrog step.

For example, the following is the generated quantities block for computing and saving the log-likelihood for a linear regression model with N data points, outcome y, predictor matrix X (including column of 1s for intercept), coefficients beta, and standard deviation sigma:

vector[N] log_lik;

for (n in 1:N) log_lik[n] = normal_lpdf(y[n] | X[n, ] * beta, sigma);

References

Vehtari, A., Gelman, A., and Gabry, J. (2017a). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413-1432. doi:10.1007/s11222-016-9696-4. http://arxiv.org/abs/1507.04544, http://link.springer.com/article/10.1007%2Fs11222-016-9696-4

Vehtari, A., Gelman, A., and Gabry, J. (2017b). Pareto smoothed importance sampling. arXiv preprint: http://arxiv.org/abs/1507.02646/

Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018). Using stacking to average Bayesian predictive distributions. Bayesian Analysis, advance publication, doi:10.1214/17-BA1091. https://projecteuclid.org/euclid.ba/1516093227.

Examples

Run this code

# NOT RUN {
# Generate a dataset from N(0,1)
N <- 100
y <- rnorm(N, 0, 1)

# Suppose we have three models for y: 
#  1) y ~ N(-1, sigma) 
#  2) y ~ N(0.5, sigma)
#  3) y ~ N(0.6,sigma)
# 
stan_code <- "
data {
  int N;
  vector[N] y;
  real mu_fixed;
}
  parameters {
  real<lower=0> sigma;
}
model { 
  sigma ~ exponential(1);
  y ~ normal(mu_fixed, sigma);
}
generated quantities {
  vector[N] log_lik;
  for (n in 1:N) log_lik[n] = normal_lpdf(y[n]| mu_fixed, sigma);
}"

mod <- stan_model(model_code = stan_code)
fit1 <- sampling(mod, data=list(N=N, y=y, mu_fixed=-1))
fit2 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.5))
fit3 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.6))

# use the loo method for stanfit objects
loo1 <- loo(fit1, pars = "log_lik")
print(loo1)

# which is equivalent to 
LL <- as.array(fit1, pars = "log_lik")
r_eff <- loo::relative_eff(exp(LL))
loo1b <- loo::loo.array(LL, r_eff = r_eff)
print(loo1b)

# compute loo for the other models
loo2 <- loo(fit2)
loo3 <- loo(fit3)

# stacking weights
wts <- loo::loo_model_weights(list(loo1, loo2, loo3), method = "stacking")
print(wts)
# }

Run the code above in your browser using DataLab