kgaps_stat: Sufficient statistics for the $K$-gaps model

Description

Calculates sufficient statistics for the $K$-gaps model for the extremal index $\theta$. Called by kgaps.

Usage

kgaps_stat(data, u, q_u, k = 1, inc_cens = TRUE)

Value

A list containing the sufficient statistics, with components

N0: the number of zero $K$-gaps.
N1: contribution from non-zero $K$-gaps (see Details).
sum_qs: the sum of the (scaled) $K$-gaps, that is, $q (S_0 + \cdots + S_N)$, where $q$ is estimated by the proportion of threshold exceedances.
n_kgaps: the number of $K$-gaps that contribute to the log-likelihood.

Arguments

data: A numeric vector of raw data.
u: A numeric scalar. Extreme value threshold applied to data.
q_u: A numeric scalar. An estimate of the probability with which the threshold u is exceeded. If q_u is missing then it is calculated using mean(data > u, na.rm = TRUE).
k: A numeric scalar. Run parameter $K$, as defined in Suveges and Davison (2010). Threshold inter-exceedance times that are not larger than k units are assigned to the same cluster, resulting in a $K$-gap equal to zero. Specifically, the $K$-gap $S$ corresponding to an inter-exceedance time of $T$ is given by $S = \max(T - K, 0)$.
inc_cens: A logical scalar indicating whether or not to include contributions from the right-censored inter-exceedance times relating to the first and last observations. These times are known only to be greater than or equal to the time observed. See Attalides (2015) for details.
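
As a minimal sketch of the definition $S = \max(T - K, 0)$ given for k, the base R code below computes the inter-exceedance times and $K$-gaps for a small made-up series. The vector x and the values of u and k are illustrative assumptions, not taken from the package.

```r
# Made-up series (an assumption for illustration only)
x <- c(0, 2, 3, 0, 0, 0, 5, 4, 0, 0, 0, 6)
u <- 1   # threshold
k <- 1   # run parameter K

exc_times <- which(x > u)          # times of threshold exceedances
T_gaps <- diff(exc_times)          # inter-exceedance times T
S_gaps <- pmax(T_gaps - k, 0)      # K-gaps S = max(T - K, 0)

print(T_gaps)
print(S_gaps)
```

Here the exceedances at times 2 and 3, and at times 7 and 8, are one time unit apart, so with k = 1 they fall in the same cluster and give $K$-gaps of zero.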

Details

The sample $K$-gaps are $S_0, S_1, ..., S_{N-1}, S_N$, where $S_1, ..., S_{N-1}$ are uncensored and $S_0$ and $S_N$ are right-censored. Under the assumption that the $K$-gaps are independent, the log-likelihood of the $K$-gaps model is given by $$l(\theta; S_0, \ldots, S_N) = N_0 \log(1 - \theta) + 2 N_1 \log \theta - \theta q (S_0 + \cdots + S_N),$$ where

$q$ is the threshold exceedance probability, estimated by the proportion of threshold exceedances,
$N_0$ is the number of uncensored sample $K$-gaps that are equal to zero,
$N_1$ is the number of positive sample $K$-gaps, apart from an adjustment for the contributions of $S_0$ and $S_N$. Specifically, if inc_cens = TRUE then $N_1$ is equal to the number of $S_1, ..., S_{N-1}$ that are positive plus $(I_0 + I_N) / 2$, where $I_0 = 1$ if $S_0$ is greater than zero and $I_0 = 0$ otherwise, and similarly for $I_N$. If inc_cens = FALSE then $S_0$ and $S_N$ are excluded, so $N_1$ counts only the positive uncensored $K$-gaps.

The differing treatment of uncensored and right-censored $K$-gaps reflects differing contributions to the likelihood. Right-censored $K$-gaps that are equal to zero add no information to the likelihood. For full details see Suveges and Davison (2010) and Attalides (2015).
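
The quantities $N_0$, $N_1$ and $q (S_0 + \cdots + S_N)$ can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the function name kgaps_stat_sketch is hypothetical, and the censored inter-exceedance times are assumed to be the number of observations before the first exceedance and after the last exceedance. The sketch also assumes no missing values and at least one exceedance.

```r
# Hypothetical helper, written for illustration; not the exdex implementation
kgaps_stat_sketch <- function(data, u, k = 1, inc_cens = TRUE) {
  n <- length(data)
  exc_times <- which(data > u)            # times of threshold exceedances
  q_u <- length(exc_times) / n            # proportion of exceedances
  # Uncensored K-gaps S_1, ..., S_{N-1}
  S <- pmax(diff(exc_times) - k, 0)
  N0 <- sum(S == 0)
  N1 <- sum(S > 0)
  sum_qs <- q_u * sum(S)
  if (inc_cens) {
    # Assumed right-censored times: before the first and after the last exceedance
    T_cens <- c(exc_times[1] - 1, n - exc_times[length(exc_times)])
    S_cens <- pmax(T_cens - k, 0)
    # Each positive censored K-gap adds 1/2 to N1; zero ones add nothing
    N1 <- N1 + sum(S_cens > 0) / 2
    sum_qs <- sum_qs + q_u * sum(S_cens)
  }
  list(N0 = N0, N1 = N1, sum_qs = sum_qs)
}

# Made-up series with exceedances of u = 1 at times 2, 3, 7, 8 and 12
x <- c(0, 2, 3, 0, 0, 0, 5, 4, 0, 0, 0, 6, 0, 0, 0)
st_cens <- kgaps_stat_sketch(x, u = 1, k = 1, inc_cens = TRUE)
st_unc <- kgaps_stat_sketch(x, u = 1, k = 1, inc_cens = FALSE)
str(st_cens)
str(st_unc)
```

In this series only the final censored $K$-gap is positive, so including the censored contributions raises $N_1$ by one half.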

If $N_1 = 0$ then we are in the degenerate case where there is one cluster (all $K$-gaps are zero) and the likelihood is maximized at $\theta = 0$.

If $N_0 = 0$ then all exceedances occur singly (all $K$-gaps are positive) and the likelihood is maximized at $\theta = 1$.
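
Between these two boundary cases the maximizer is available in closed form: setting the derivative of the log-likelihood to zero and multiplying through by $\theta (1 - \theta)$ gives a quadratic in $\theta$ whose smaller root lies in $(0, 1)$. The sketch below computes that root and checks it against a numerical maximization with optimize(). The names kgaps_loglik and kgaps_mle_sketch are hypothetical and the values of the statistics are made up.

```r
# Log-likelihood from the Details section, as a function of theta
kgaps_loglik <- function(theta, N0, N1, sum_qs) {
  N0 * log(1 - theta) + 2 * N1 * log(theta) - theta * sum_qs
}

# Hypothetical helper: maximizer of kgaps_loglik over [0, 1]
kgaps_mle_sketch <- function(N0, N1, sum_qs) {
  if (N1 == 0) return(0)   # boundary case: all K-gaps are zero
  if (N0 == 0) return(1)   # boundary case: all K-gaps are positive
  # Stationary points solve sum_qs * theta^2 - b * theta + 2 * N1 = 0
  b <- N0 + 2 * N1 + sum_qs
  (b - sqrt(b^2 - 8 * N1 * sum_qs)) / (2 * sum_qs)  # smaller root, in (0, 1)
}

# Made-up sufficient statistics (an assumption for illustration only)
N0 <- 2; N1 <- 2.5; sum_qs <- 8 / 3
theta_hat <- kgaps_mle_sketch(N0, N1, sum_qs)

# Numerical check of the closed form
num <- optimize(kgaps_loglik, interval = c(0, 1), maximum = TRUE,
                N0 = N0, N1 = N1, sum_qs = sum_qs, tol = 1e-8)
print(c(closed_form = theta_hat, numerical = num$maximum))
```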

References

Suveges, M. and Davison, A. C. (2010) Model misspecification in peaks over threshold analysis, Annals of Applied Statistics, 4(1), 203-221. doi:10.1214/09-AOAS292

Attalides, N. (2015) Threshold-based extreme value modelling, PhD thesis, University College London. https://discovery.ucl.ac.uk/1471121/1/Nicolas_Attalides_Thesis.pdf

Examples

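library(exdex)  # provides kgaps_stat() and the newlyn data
# Use the 90% sample quantile as the threshold
u <- quantile(newlyn, probs = 0.90)
kgaps_stat(newlyn, u)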
