An implementation of the procedure proposed in Danielsson et al. (2016) for selecting the optimal threshold in extreme value analysis.
mindist(data, ts = 0.15, method = "mad")
data: vector of sample data
ts: size of the upper tail the procedure is applied to; default is 15 percent of the data
method: one of "ks" for the Kolmogorov-Smirnov distance metric or "mad" for the mean absolute deviation (default)
optimal number of upper order statistics, i.e. the number of exceedances or data points in the tail
the corresponding threshold
the corresponding tail index, obtained by plugging k0 into the Hill estimator
The procedure proposed in Danielsson et al. (2016) minimizes the distance between the largest upper order statistics of the dataset, i.e. the empirical tail, and the theoretical tail of a Pareto distribution. The parameters of this distribution are estimated using Hill's estimator, which itself requires a number of upper order statistics k. The distance is then minimized with respect to this k. The optimal number, denoted k0 here, is equivalent to the number of extreme values or, if you wish, the number of exceedances in the context of a POT model such as the generalized Pareto distribution. k0 can then be associated with the unknown threshold u of the GPD by taking u to be the (n-k0)-th upper order statistic. As distance metric one can choose either the mean absolute deviation, called "mad" here, or the maximum absolute deviation, also known as the Kolmogorov-Smirnov distance metric ("ks"). For more information see the references.
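As a rough illustration of the idea above, the minimization can be sketched in plain R. This is a sketch, not the package's internal implementation: the function name mindist_sketch is hypothetical, the deviations are computed between the upper order statistics and the fitted Pareto quantiles on the original scale, and the search is restricted to the upper ts fraction of the data; the paper's exact criterion may differ in these details.

```r
# Sketch of quantile-driven threshold selection (NOT the package's code).
mindist_sketch <- function(x, ts = 0.15, method = c("mad", "ks")) {
  method <- match.arg(method)
  x <- sort(x, decreasing = TRUE)       # descending order statistics
  n <- length(x)
  kmax <- floor(ts * n)                 # restrict the search to the upper tail
  dist <- rep(NA_real_, kmax)
  for (k in 2:kmax) {
    # Hill estimate of the extreme value index gamma from k exceedances
    gamma_hat <- mean(log(x[1:k])) - log(x[k + 1])
    # Pareto quantiles implied by the fit above the (k+1)-th order statistic
    j <- 1:k
    q_pareto <- x[k + 1] * (k / j)^gamma_hat
    dev <- abs(x[j] - q_pareto)
    dist[k] <- if (method == "mad") mean(dev) else max(dev)
  }
  k0 <- which.min(dist)                 # NA entries are ignored
  gamma0 <- mean(log(x[1:k0])) - log(x[k0 + 1])
  list(k0 = k0, threshold = x[k0 + 1], tail.index = 1 / gamma0)
}

set.seed(42)
x <- (1 - runif(5000))^(-1 / 2)         # Pareto sample with tail index 2
res <- mindist_sketch(x, method = "mad")
```

The returned threshold is the (k0+1)-th largest observation, matching the interpretation of u as an upper order statistic described above.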
Danielsson, J., Ergun, L.M., de Haan, L. and de Vries, C.G. (2016). Tail Index Estimation: Quantile Driven Threshold Selection.
# NOT RUN {
data(danish)
mindist(danish, method = "mad")
# }