Print a diagnostic table summarizing the estimated Pareto shape parameters, find the indexes of observations for which the estimated Pareto shape parameter \(k\) is larger than some threshold value, or plot observation indexes vs \(k\) estimates.
pareto_k_table(x)
pareto_k_ids(x, threshold = 0.5)
# S3 method for loo
plot(x, ..., label_points = FALSE)
The threshold value for \(k\).
For the plot method, if label_points is TRUE, the observation numbers corresponding to any values of \(k\) greater than 0.5 will be displayed in the plot. Any arguments specified in ... will be passed to text and can be used to control the appearance of the labels.
pareto_k_table returns an object of class "pareto_k_table", which is a matrix with columns "Count" and "Proportion" and has its own print method.
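As a rough illustration of the Count/Proportion layout, the following base-R sketch builds a similar matrix from a plain numeric vector of \(k\) estimates. The helper name k_table_sketch, the made-up k values, and the cut points 0.5, 0.7 and 1 (taken from the thresholds discussed on this page) are assumptions for illustration; this is not the package's implementation and it does not set the "pareto_k_table" class.

```r
# Hypothetical helper: summarise a numeric vector of Pareto k estimates.
# The cut points 0.5, 0.7 and 1 are an assumption based on the thresholds
# discussed on this page, not necessarily the bins pareto_k_table uses.
k_table_sketch <- function(k) {
  bins <- cut(k, breaks = c(-Inf, 0.5, 0.7, 1, Inf), right = TRUE)
  counts <- table(bins)
  out <- cbind(Count = as.vector(counts),
               Proportion = as.vector(counts) / length(k))
  rownames(out) <- names(counts)
  out
}

k <- c(0.12, 0.31, 0.55, 0.68, 0.93, 1.40)  # made-up k estimates
tab <- k_table_sketch(k)
print(tab)
```

Each row corresponds to one range of \(k\), so the proportions sum to 1 across rows.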
pareto_k_ids returns an integer vector indicating which observations have Pareto \(k\) estimates above threshold.
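The selection described here amounts to indexing the \(k\) estimates by a strict comparison. A minimal base-R sketch, assuming the estimates are available as a plain numeric vector (the helper k_ids_sketch and the values are hypothetical, not part of the package):

```r
# Hypothetical stand-in for the selection step: indexes of k above a threshold.
k_ids_sketch <- function(k, threshold = 0.5) {
  which(k > threshold)  # integer positions of the flagged observations
}

k <- c(0.12, 0.31, 0.55, 0.68, 0.93, 1.40)  # made-up k estimates

ids <- k_ids_sketch(k)                          # default threshold of 0.5
ids_strict <- k_ids_sketch(k, threshold = 0.7)  # a more lenient cut-off
print(ids)
print(ids_strict)
```

The returned indexes can then be used to pick out the corresponding observations for closer inspection or refitting.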
The plot method is called for its side effect and does not return anything. If x is the result of a call to loo, plot(x) produces a plot of the estimates of the Pareto shape parameter \(k\). There is no plot method for objects generated by a call to waic.
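To make the label_points and ... behaviour concrete, here is a base-graphics sketch of a plot of this kind. It assumes a plain numeric vector of \(k\) estimates; the function plot_k_sketch, the axis labels, and the plotting symbols are illustrative choices rather than the package's actual method.

```r
# Hypothetical sketch of an index-vs-k plot using base graphics.
plot_k_sketch <- function(k, label_points = FALSE, ...) {
  idx <- seq_along(k)
  plot(idx, k, xlab = "Data point", ylab = "Shape parameter k", pch = 3)
  abline(h = 0.5, lty = 2)  # reference line at the 0.5 threshold
  if (label_points) {
    big <- which(k > 0.5)
    if (length(big) > 0) {
      # Extra arguments in ... are forwarded to text() to style the labels.
      text(idx[big], k[big], labels = big, pos = 3, ...)
    }
  }
  invisible(NULL)  # called for its side effect only
}

k <- c(0.12, 0.31, 0.55, 0.68, 0.93, 1.40)  # made-up k estimates

# Draw to a temporary PDF so the sketch also runs non-interactively.
pdf(tempfile(fileext = ".pdf"))
plot_k_sketch(k, label_points = TRUE, col = "red", cex = 0.8)
dev.off()
```

In an interactive session the pdf() and dev.off() calls can be dropped so the plot appears on the default device.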
The reliability of the PSIS-based estimates can be assessed using the estimates for the shape parameter \(k\) of the generalized Pareto distribution.
If \(k < 1/2\), the variance of the raw importance ratios is finite, the central limit theorem holds, and the estimate converges quickly.
If \(k\) is between 1/2 and 1, the variance of the raw importance ratios is infinite but the mean exists, the generalized central limit theorem for stable distributions holds, and the convergence of the estimate is slower. The variance of the PSIS estimate is finite but may be large.
If \(k > 1\), neither the variance nor the mean of the distribution of the raw ratios exists. The variance of the PSIS estimate is finite but may be large.
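The cut points 1/2 and 1 follow from a standard property of the generalized Pareto distribution. As a brief sketch, writing the density with location \(u\), scale \(\sigma\), and shape \(k > 0\) (notation chosen here for illustration):

```latex
\[
  p(y \mid u, \sigma, k) = \frac{1}{\sigma}
    \left( 1 + k \, \frac{y - u}{\sigma} \right)^{-\frac{1}{k} - 1},
  \qquad y \ge u ,
\]
\[
  \mathrm{E}\left[ Y^{r} \right] < \infty
  \quad \Longleftrightarrow \quad r < \frac{1}{k} .
\]
```

Taking \(r = 2\) gives a finite variance only when \(k < 1/2\), and taking \(r = 1\) gives a finite mean only when \(k < 1\), which matches the three cases above.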
If the estimated tail shape parameter \(k\) exceeds \(0.5\), the user should be warned, although in practice we have observed good performance for values of \(k\) up to 0.7. Even if the PSIS estimate has a finite variance, the user should consider sampling directly from \(p(\theta^s | y_{-i})\) for the problematic \(i\), using \(k\)-fold cross-validation, or using a more robust model.
Importance sampling is likely to work less well if the marginal posterior \(p(\theta^s | y)\) and the LOO posterior \(p(\theta^s | y_{-i})\) are very different, which is more likely to happen with a non-robust model and highly influential observations. A robust model may reduce the sensitivity to highly influential observations.
Vehtari, A., Gelman, A., and Gabry, J. (2016a). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. Advance online publication. doi:10.1007/s11222-016-9696-4.
Vehtari, A., Gelman, A., and Gabry, J. (2016b). Pareto smoothed importance sampling. arXiv preprint: http://arxiv.org/abs/1507.02646/
See psislw for the implementation of the PSIS algorithm.