pareto_pit: Pareto-smoothed probability integral transform

Description

Compute PIT values using the empirical CDF (ECDF), then refine the values in the tails by fitting a generalized Pareto distribution (GPD) to the tail draws. This gives smoother, more accurate PIT values in the tails, where the ECDF is coarse, and avoids PIT values of exactly 0 and 1. Because the GPD CDF is used in the tails, the PIT values are no longer rank based, so a continuous uniformity test is appropriate.

Usage

pareto_pit(x, y, ...)
# S3 method for default
pareto_pit(x, y, weights = NULL, log = FALSE, ndraws_tail = NULL, ...)
# S3 method for draws_matrix
pareto_pit(x, y, weights = NULL, log = FALSE, ndraws_tail = NULL, ...)
# S3 method for rvar
pareto_pit(x, y, weights = NULL, log = FALSE, ndraws_tail = NULL, ...)

Value

A numeric vector of length length(y) containing the PIT values, or an array of shape dim(y), if x is an rvar.

Arguments

x: (draws) A draws_matrix object or one coercible to a draws_matrix object, or an rvar object.
y: (observations) A 1D vector, or an array with dimensions dim(x) if x is an rvar. Each element of y corresponds to a variable in x.
...: Arguments passed to individual methods (if applicable).
weights: A matrix of weights for each draw and variable. weights should have one column per variable in x, and ndraws(x) rows.
log: (logical) Are the weights passed already on the log scale? The default is FALSE, that is, expecting weights to be on the standard (non-log) scale.
ndraws_tail: (integer) Number of tail draws to use for GPD fitting. If NULL (the default), computed using ps_tail_length().

Details

The function first computes raw PIT values identically to pit() (including support for weighted draws). It then fits a GPD to both tails of the draws (using the same approach as pareto_smooth()) and replaces PIT values for observations falling in the tail regions:

For a right-tail observation $y_i > c_R$ (where $c_R$ is the right-tail cutoff):

$$PIT(y_i) = 1 - p_{tail}(1 - F_{GPD}(y_i; c_R, \sigma_R, k_R))$$

For a left-tail observation $y_i < c_L$:

$$PIT(y_i) = p_{tail}(1 - F_{GPD}(-y_i; -c_L, \sigma_L, k_L))$$

where $p_{tail}$ is the proportion of (weighted) mass in the tail, and $\sigma$ and $k$ are the scale and shape parameters of the GPD fitted to that tail.
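
The two tail formulas can be checked numerically with a few lines of base R. This is only a sketch, not the package's implementation: gpd_cdf(), pit_right() and pit_left() are hypothetical helpers, and the cutoffs, scales, shapes and tail proportion are made-up illustrative values rather than the result of an actual GPD fit.

```r
# Hypothetical helper: CDF of a generalized Pareto distribution with
# location mu, scale sigma > 0 and shape k, evaluated at q.
gpd_cdf <- function(q, mu, sigma, k) {
  z <- pmax((q - mu) / sigma, 0)
  if (abs(k) < 1e-12) {
    1 - exp(-z)                      # limit k -> 0: exponential tail
  } else {
    1 - pmax(1 + k * z, 0)^(-1 / k)  # pmax() covers the bounded support for k < 0
  }
}

# Illustrative values only (assumptions, not output of a fit)
p_tail <- 0.1                            # proportion of mass in each tail
c_R <- 2;  sigma_R <- 0.8; k_R <- 0.2    # right-tail cutoff, scale, shape
c_L <- -2; sigma_L <- 0.8; k_L <- 0.2    # left-tail cutoff, scale, shape

# Right tail: y > c_R
pit_right <- function(y) 1 - p_tail * (1 - gpd_cdf(y, c_R, sigma_R, k_R))

# Left tail: y < c_L, handled by negating both y and the cutoff
pit_left <- function(y) p_tail * (1 - gpd_cdf(-y, -c_L, sigma_L, k_L))

pit_right(c(2.5, 4, 10))    # increases towards 1 without reaching it
pit_left(c(-2.5, -4, -10))  # decreases towards 0 without reaching it
```

At the cutoff itself the GPD CDF is 0, so pit_right(c_R) equals 1 - p_tail and pit_left(c_L) equals p_tail, which is where the tail values meet the ECDF-based values.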

When weights (or log-weights, if log = TRUE) are provided, they are used both for the raw PIT computation (as in pit()) and for the GPD fit.
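
To illustrate how weights enter the raw PIT, the following base R sketch evaluates a weighted ECDF at one observation. It is a simplified assumption about the computation, not the code used by pit(): weighted_pit() is a hypothetical helper, it handles one variable at a time, and it ignores any special treatment of ties.

```r
# Hypothetical helper: weighted ECDF of `draws` evaluated at observation y.
# w holds weights on the standard scale, or log-weights when log = TRUE.
weighted_pit <- function(draws, y, w = NULL, log = FALSE) {
  if (is.null(w)) {
    return(mean(draws <= y))   # unweighted ECDF
  }
  if (log) {
    w <- exp(w - max(w))       # subtract the maximum for numerical stability
  }
  sum(w[draws <= y]) / sum(w)  # normalized weighted mass at or below y
}

draws <- c(-1.2, -0.3, 0.1, 0.8, 2.4)
w <- c(1, 1, 2, 1, 5)

weighted_pit(draws, 0.5)                          # 0.6: 3 of 5 draws are <= 0.5
weighted_pit(draws, 0.5, w = w)                   # 0.4: weighted mass 4 out of 10
weighted_pit(draws, 0.5, w = log(w), log = TRUE)  # 0.4: same result from log-weights
weighted_pit(draws, 5)                            # 1: y lies beyond all draws
```

The last call shows the ECDF returning exactly 1 for an observation beyond all draws, which is the situation the GPD tail refinement is designed to avoid.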

Examples

library(posterior)
x <- example_draws()
# simulate one observation per variable in x
y <- rnorm(nvariables(x), 5, 5)
pareto_pit(x, y)