pot_estimator: Peaks-over-threshold (POT) Estimator

Description

This function estimates the parameters $\hat{\xi}$ and $\hat{\beta}$ that minimize the negative log likelihood of the Generalized Pareto Distribution (GPD) fitted to the excesses over the threshold u.

Usage

pot_estimator(data, u, start_xi = 0.1, start_beta = NULL, na.rm = FALSE)

Value

An unnamed numeric vector of length 2 containing the estimated Generalized Pareto Distribution (GPD) parameters that minimize the negative log likelihood: $\xi$ (shape/tail index) and $\beta$ (scale parameter).

Arguments

data: A numeric vector of i.i.d. observations.
u: A numeric scalar specifying the threshold above which excesses are calculated.
start_xi: Initial value of $\xi$ to pass to the optimizer. Defaults to 0.1.
start_beta: Initial value of $\beta$ to pass to the optimizer. Defaults to NULL.
na.rm: Logical. If TRUE, missing values (NA) are removed before analysis. Defaults to FALSE.
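
As a rough illustration of what "excesses over the threshold" means for the u argument, the sketch below computes them by hand. It assumes that only observations strictly greater than u contribute and that each excess is the observation minus u; whether pot_estimator uses a strict or non-strict comparison internally is not stated here. The helper `compute_excesses` is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): excesses over a threshold u.
# Assumption: only observations strictly greater than u are kept.
compute_excesses <- function(data, u, na.rm = FALSE) {
  if (na.rm) data <- data[!is.na(data)]  # drop missing values on request
  data[data > u] - u                     # amount by which each point exceeds u
}

x <- c(0.5, 1.2, 2.5, 3.0, 4.1)
excesses <- compute_excesses(x, u = 2)   # three of the five points exceed 2
excesses
```

A higher threshold leaves fewer excesses to fit, so the choice of u trades off how far into the tail the fit looks against how much data remains.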

Details

For $\xi \neq 0$, the PDF of an excess data point $x_i$ (with $\beta > 0$ and $1 + \xi x_i / \beta > 0$) is given by:

$$f(x_i;\xi, \beta) = \frac{1}{\beta} \left(1 + \xi \frac{x_i}{\beta}\right)^{-\left(\frac{1}{\xi} + 1\right)}$$

Taking the logarithm of the above equation gives the log likelihood of a single excess:

$$l(x_i;\xi, \beta) = -\log(\beta) - \left(\frac{1}{\xi} + 1\right) \log\left(1 + \xi \frac{x_i}{\beta}\right)$$

Summing over all $n$ excess data points:

$$l(\xi,\beta) = \sum_{i=1}^{n} \left[-\log(\beta) - \left(\frac{1}{\xi} + 1\right) \log\left(1 + \xi \frac{x_i}{\beta}\right)\right]$$

$$l(\xi,\beta) = -n\log(\beta) - \left(\frac{1}{\xi} + 1\right)\sum_{i=1}^{n} \log\left(1 + \xi \frac{x_i}{\beta}\right)$$

Maximizing $l(\xi,\beta)$ is equivalent to minimizing $-l(\xi,\beta)$, so the estimates are obtained by minimizing the negative log likelihood. Because the likelihood is built from the excesses only, the resulting $\xi$ and $\beta$ are the ones that best fit the tail of the data.
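
To make the derivation concrete, the following sketch evaluates the log likelihood in two ways, as a sum of per-point log densities and in the collapsed form, and checks that they agree. The names `gpd_logpdf` and `gpd_loglik` are illustrative and are not part of the package, the data are made up, and the sketch covers only the $\xi \neq 0$ case.

```r
# Illustrative helpers (hypothetical names, xi != 0 only).

# Log density of a single excess x under the GPD with shape xi and scale beta.
gpd_logpdf <- function(x, xi, beta) {
  -log(beta) - (1 / xi + 1) * log(1 + xi * x / beta)
}

# Collapsed form of the total log likelihood over all n excesses.
gpd_loglik <- function(x, xi, beta) {
  n <- length(x)
  -n * log(beta) - (1 / xi + 1) * sum(log(1 + xi * x / beta))
}

excesses  <- c(0.3, 1.1, 0.7, 2.4, 0.05)   # made-up excesses for illustration
summed    <- sum(gpd_logpdf(excesses, xi = 0.2, beta = 1.5))
collapsed <- gpd_loglik(excesses, xi = 0.2, beta = 1.5)
all.equal(summed, collapsed)  # the two forms agree up to floating-point error
```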

The case $\xi = 0$ must be handled separately: the GPD then reduces to an exponential distribution with mean $\beta$, and the total log likelihood is:

$$l(0, \beta) = -n \log(\beta) - \frac{1}{\beta} \sum_{i=1}^{n} x_i$$
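
One way the two cases could be combined into a single objective and minimized is sketched below. This is not the package's implementation: the function name `gpd_nll`, the cutoff `1e-8` for switching to the exponential form, the use of `optim` with its default Nelder-Mead method, and the choice of the sample mean as the starting value for $\beta$ are all assumptions made for illustration.

```r
# Hypothetical sketch (not the package's code): GPD negative log likelihood.
gpd_nll <- function(par, x) {
  xi   <- par[1]
  beta <- par[2]
  n    <- length(x)
  if (beta <= 0) return(Inf)               # scale must be positive
  if (abs(xi) < 1e-8) {                    # exponential case, xi = 0
    return(n * log(beta) + sum(x) / beta)
  }
  z <- 1 + xi * x / beta
  if (any(z <= 0)) return(Inf)             # outside the support of the GPD
  n * log(beta) + (1 / xi + 1) * sum(log(z))
}

# The general form approaches the exponential form as xi shrinks to 0.
excesses <- c(0.3, 1.1, 0.7, 2.4, 0.05)    # made-up excesses for illustration
gap <- abs(gpd_nll(c(1e-6, 1.5), excesses) - gpd_nll(c(0, 1.5), excesses))

# Fit simulated exponential excesses, for which xi = 0 and beta = 2.
set.seed(42)
sim <- rexp(1000, rate = 0.5)
fit <- optim(par = c(0.1, mean(sim)), fn = gpd_nll, x = sim)
fit$par                                    # estimates of (xi, beta)
```

Returning `Inf` for infeasible parameters keeps the optimizer inside the region where the density is defined without requiring a constrained method.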

References

Davison, A. C., & Smith, R. L. (1990). Models for exceedances over high thresholds. Journal of the Royal Statistical Society: Series B (Methodological), 52(3), 393-425. doi:10.1111/j.2517-6161.1990.tb01796.x

Balkema, A. A., & de Haan, L. (1974). Residual life time at great age. The Annals of Probability, 2(5), 792-804. doi:10.1214/aop/1176996548

Pickands, J. (1975). Statistical inference using extreme order statistics. The Annals of Statistics, 3(1), 119-131. http://www.jstor.org/stable/2958083

Nair, J., Wierman, A., & Zwart, B. (2022). The Fundamentals of Heavy Tails: Properties, Emergence, and Estimation. Cambridge University Press. (pp. 221-226) doi:10.1017/9781009053730

Examples


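# Simulate Weibull data; shape < 1 gives a heavier-than-exponential tail
x <- rweibull(n = 800, shape = 0.8, scale = 1)
# Fit the GPD to the excesses over u = 2; the result holds xi first, then beta
values <- pot_estimator(data = x, u = 2, start_xi = 0.1, start_beta = NULL)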
