psislw: Pareto smoothed importance sampling (PSIS)

Description

Implementation of Pareto smoothed importance sampling, a method for stabilizing importance weights. For full details about the algorithm see Vehtari, Gelman and Gabry (2016a, 2016b). For diagnostics see the pareto-k-diagnostic page.

Usage

psislw(lw, wcp = 0.2, wtrunc = 3/4, cores = getOption("loo.cores",
  parallel::detectCores()), llfun = NULL, llargs = NULL, ...)

Arguments

A matrix or vector of log weights. For computing LOO, lw = -log_lik, the negative of an $S$ (simulations) by $N$ (data points) pointwise log-likelihood matrix.

wcp

The proportion of importance weights to use for the generalized Pareto fit. The 100*wcp% largest weights are used as the sample from which to estimate the parameters of the generalized Pareto distribution.

wtrunc

For truncating very large weights to $S$^wtrunc. Set to zero for no truncation.

cores

The number of cores to use for parallelization. This can be set for an entire R session by options(loo.cores = NUMBER). The default is detectCores().

llfun, llargs

See loo.function.

...

Ignored when psislw is called directly. The ... is only used internally when psislw is called by the loo function.

Value

A named list with components lw_smooth (modified log weights) and pareto_k (estimated generalized Pareto shape parameter(s) k).

PSIS-LOO

The distribution of the importance weights used in LOO may have a long right tail. We use the empirical Bayes estimate of Zhang and Stephens (2009) to fit a generalized Pareto distribution (gPd) to the tail (20% largest importance ratios). By examining the shape parameter $k$ of the fitted gPd, we are able to obtain sample based estimates of the existance of the moments (Koopman et al, 2009). This extends the diagnostic approach of Peruggia (1997) and Epifani et al. (2008) to be used routinely with importance-sampling LOO for any model with a factorizing likelihood.

Epifani et al. (2008) show that when estimating the leave-one-out predictive density, the central limit theorem holds if the variance of the weight distribution is finite. These results can be extended using the generalized central limit theorem for stable distributions. Thus, even if the variance of the importance weight distribution is infinite, if the mean exists the estimate's accuracy improves when additional draws are obtained. When the tail of the weight distribution is long, a direct use of importance sampling is sensitive to one or few largest values. By fitting a gPd to the upper tail of the importance weights we smooth these values. The procedure (implemented in the psislw function) goes as follows:

Fit the gPd to the 20% largest importance ratios $r_s$. The computation is done separately for each held-out data point $i$. In simulation experiments with thousands and tens of thousands of draws, we have found that the fit is not sensitive to the specific cutoff value (for a consistent estimation the proportion of the samples above the cutoff should get smaller when the number of draws increases).
Stabilize the importance ratios by replacing the $M$ largest ratios by the expected values of the order statistics of the fitted gPd $$G((z - 0.5)/M), z = 1,...,M,$$ where $M$ is the number of simulation draws used to fit the Pareto (in this case, $M = 0.2*S$) and $G$ is the inverse-CDF of the gPd.
To guarantee finite variance of the estimate, truncate the smoothed ratios with $$S^{3/4}\bar{w},$$ where $\bar{w}$ is the average of the smoothed weights.

The above steps must be performed for each data point $i$. The result is a vector of weights $w_{i}^{s}, s = 1,...,S$, for each $i$, which in general should be better behaved than the raw importance ratios $r_{i}^{s}$ from which they were constructed.

The results can be then combined to compute the desired LOO estimates.

References

Vehtari, A., Gelman, A., and Gabry, J. (2016a). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. Advance online publication. doi:10.1007/s11222-016-9696-4. (published version, arXiv preprint).