The information matrix test (IMT), proposed by Suveges and Davison (2010), is based
on the difference between the expected squared score and the expected negative second
derivative of the log-likelihood, which coincide when the model is correctly specified.
For each threshold u and gap K, the test statistic is asymptotically \(\chi^2\) with one
degree of freedom. The approximation is good for \(N>80\) and conservative for smaller
sample sizes. The test assumes independence between gaps.
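As a sketch of the statistic, in notation assumed here rather than taken from this page: write \(\ell(\theta; c_i)\) for the log-likelihood contribution of the \(i\)th observation, \(\hat{\theta}\) for the maximum likelihood estimate, and \(N\) for the number of observations entering the likelihood. The sample version of the difference described above is then
\[
D_N(\hat{\theta}) = \frac{1}{N} \sum_{i=1}^{N} \left\{ \ell'(\hat{\theta}; c_i)^2 + \ell''(\hat{\theta}; c_i) \right\},
\]
and the test statistic takes the form
\[
T_N = \frac{N \, D_N(\hat{\theta})^2}{V_N(\hat{\theta})},
\]
where \(V_N(\hat{\theta})\) denotes an estimate of the asymptotic variance of \(\sqrt{N} D_N(\hat{\theta})\). The exact form of \(V_N\) is given in Suveges and Davison (2010) and is not reproduced here.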
thselect.sdinfo(xdat, thresh, qlev, plot = FALSE, kmax = 1, k = 1)
an invisible list with a class attribute and the following elements
thresh a vector of thresholds based on empirical quantiles at supplied levels.
stat a matrix of test statistics
pval a matrix of approximate p-values (corresponding to probabilities under a \(\chi^2_1\) distribution)
mle a matrix of maximum likelihood estimates for each given pair of thresholds and gaps
loglik a matrix of log-likelihood values at the MLE for each given pair of elements in thresh and gap in \(0, \ldots, \texttt{kmax}\)
quantile quantile levels for thresholds, if supplied by the user
kmax the largest gap number
xdat [vector] vector of observations
thresh [vector] candidate thresholds
qlev [vector] probability levels used to define the thresholds if thresh is missing
plot [logical] should the graphical diagnostic be plotted?
kmax [int] the largest K-gap under consideration for clusters
k [int] the K-gap for automatic threshold selection
Leo Belzile
The procedure proposed in Suveges & Davison (2010) has been corrected for errata. The maximum likelihood estimator is based on the limiting mixture distribution of the intervals between exceedances (an exponential distribution with a point mass at zero). The condition \(D^{(K)}(u_n)\) should be checked by the user.
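The limiting mixture can be written out as a sketch, with notation assumed here rather than taken from this page. Let \(\theta\) be the extremal index, \(c_i = q S_i^{(K)}\) the K-gaps \(S_i^{(K)} = \max(T_i - K, 0)\) computed from the interexceedance times \(T_i\) and scaled by the proportion \(q\) of exceedances, \(N\) the number of K-gaps and \(N_C\) the number of nonzero K-gaps. A scaled K-gap is zero with probability \(1-\theta\) and otherwise has density \(\theta^2 \exp(-\theta c)\) for \(c > 0\), so that under independence of the gaps
\[
\ell(\theta) = (N - N_C) \log(1 - \theta) + 2 N_C \log \theta - \theta \sum_{i=1}^{N} c_i .
\]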
Fukutome et al. (2015) propose an ad hoc automated procedure:
1. Calculate the interexceedance times for each K-gap and each threshold, along with the number of clusters.
2. Select the (u, K) pairs for which IMT < 0.05 (corresponding to a P-value of 0.82).
3. Among those, select the pair (u, K) for which the number of clusters is the largest.
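The selection rule above can be sketched in base R. This is an illustration under stated assumptions, not the code used by the package: n_clusters and select_pair are hypothetical helpers, the matrices are assumed to have thresholds in rows and values of K in columns, and the matrix of IMT statistics is taken as given rather than computed.

```r
# Hypothetical helper (not part of mev): number of clusters of exceedances
# of threshold u when exceedances at most K time steps apart are merged.
n_clusters <- function(x, u, K) {
  exc <- which(x > u)                  # time indices of exceedances
  if (length(exc) < 2L) return(length(exc))
  kgaps <- pmax(diff(exc) - K, 0)      # K-gaps: interexceedance times minus K
  sum(kgaps > 0) + 1L                  # each nonzero K-gap separates two clusters
}

# Hypothetical helper: stat and nclust are matrices of identical dimension
# (assumed layout: rows = thresholds, columns = values of K).
select_pair <- function(stat, nclust, cutoff = 0.05) {
  ok <- !is.na(stat) & stat < cutoff   # step 2: keep pairs with IMT < cutoff
  if (!any(ok)) return(NULL)           # no admissible pair
  nclust[!ok] <- -Inf                  # exclude rejected pairs from the maximum
  # step 3: largest number of clusters; ties resolved in column-major order
  which(nclust == max(nclust), arr.ind = TRUE)[1, ]
}

# Toy inputs, made up for illustration only
x <- c(0, 5, 6, 0, 0, 7, 0, 8)
n_clusters(x, u = 1, K = 1)
stat <- matrix(c(0.01, 0.20, 0.03, 0.04), nrow = 2)
nclust <- matrix(c(10, 50, 30, 20), nrow = 2)
best <- select_pair(stat, nclust)
best
```

The returned vector holds the row and column index of the selected pair under the assumed layout. The cutoff defaults to the value 0.05 quoted in step 2.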
Fukutome, Liniger and Suveges (2015), Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland. Theoretical and Applied Climatology, 120(3), 403-416.
Suveges and Davison (2010), Model misspecification in peaks over threshold analysis. Annals of Applied Statistics, 4(1), 203-221.
White (1982), Maximum Likelihood Estimation of Misspecified Models. Econometrica, 50(1), 1-25.
library(mev)
# Simulated generalized Pareto data, thresholds at ten quantile levels
thselect.sdinfo(
  xdat = rgp(n = 10000),
  qlev = seq(0.1, 0.9, length.out = 10),
  kmax = 3)