An implementation of the sequential testing procedure proposed in Thompson et al. (2009) for automated threshold selection
TH(data, thresholds)
data: vector of sample data
thresholds: a sequence of predefined thresholds at which to check the GPD assumption
the threshold used for the test
the number of observations above the given threshold
raw p-values for the thresholds tested
transformed p-values according to the ForwardStop criterion. See G'Sell et al. (2016) for more information
transformed p-values according to the StrongStop criterion. See G'Sell et al. (2016) for more information
estimated scale parameter for the given threshold
estimated shape parameter for the given threshold
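The ForwardStop and StrongStop transformations listed above can be sketched in base R as follows. This is a minimal illustration of the formulas in G'Sell et al. (2016), not the package's own code. The raw p-values and the names p, m, k, forward_stop and strong_stop are made up for the example, and it is an assumption that TH computes the transformed values in exactly this way.

```r
# Hypothetical raw p-values, ordered as the thresholds are tested
p <- c(0.01, 0.04, 0.20, 0.55, 0.80)
m <- length(p)
k <- seq_len(m)

# ForwardStop: running mean of -log(1 - p_i) over the first k tests
forward_stop <- cumsum(-log(1 - p)) / k

# StrongStop: exp(sum over j >= k of log(p_j) / j), scaled by m / k
strong_stop <- rev(exp(cumsum(rev(log(p) / k)))) * m / k

print(data.frame(p = p, ForwardStop = forward_stop, StrongStop = strong_stop))
```

Both transformed sequences are then compared with a chosen significance level to decide where to stop; see G'Sell et al. (2016) for the exact stopping rules.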
The procedure proposed in Thompson et al. (2009) is based on sequential goodness-of-fit testing. First, one chooses an equally spaced grid of possible thresholds. The authors recommend 100 thresholds between the 50 percent and 98 percent quantiles of the data, provided enough observations remain (about 100 observations above the last predefined threshold). Then the parameters of a GPD are estimated for each threshold. One can show that the differences between successive scale parameter estimates are approximately normally distributed. A Pearson chi-squared test for normality is therefore applied to the differences, and the smallest thresholds are removed one at a time until the test no longer rejects normality.
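The recommended grid can be set up in base R roughly as follows. This is a minimal sketch and not part of the package: the simulated sample x, the seed and the names u, n_exceed and enough are illustrative assumptions.

```r
set.seed(42)       # illustrative seed, only for reproducibility
x <- rexp(5000)    # illustrative sample; any numeric vector could be used

# 100 equally spaced thresholds between the 50% and 98% sample quantiles
u <- seq(quantile(x, 0.50, names = FALSE),
         quantile(x, 0.98, names = FALSE),
         length.out = 100)

# Number of observations strictly above each threshold
n_exceed <- vapply(u, function(t) sum(x > t), integer(1))

# Check the rule of thumb: about 100 observations above the last threshold
enough <- n_exceed[length(u)] >= 100
print(c(last_threshold = u[length(u)], exceedances = n_exceed[length(u)]))
```

A grid built this way could then be passed as the thresholds argument of TH. If enough is FALSE, the sample is probably too small for the recommended upper quantile.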
Thompson, P., Cai, Y. and Reeve, D. (2009). Automated threshold selection methods for extreme wave analysis. Coastal Engineering, 56(10), 1013-1021.
G'Sell, M.G., Wager, S., Chouldechova, A. and Tibshirani, R. (2016). Sequential selection procedures and false discovery rate control. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(2), 423-444.
# NOT RUN {
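data <- rexp(1000)  # simulated standard exponential sample
# 100 equally spaced thresholds between the 10% and 90% sample quantiles
u <- seq(quantile(data, 0.1), quantile(data, 0.9), length.out = 100)
A <- TH(data, u)  # run the sequential test
A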
# }