
heavytails (version 0.1.1)

pot_estimator: Peaks-over-threshold (POT) Estimator

Description

This function estimates the shape \(\hat{\xi}\) and scale \(\hat{\beta}\) of the Generalized Pareto Distribution (GPD) by minimizing the negative log-likelihood of the excesses of the data over the threshold \(u\).

Usage

pot_estimator(data, u, start_xi = 0.1, start_beta = NULL, na.rm = FALSE)

Value

An unnamed numeric vector of length 2 containing the estimated Generalized Pareto Distribution (GPD) parameters that minimize the negative log likelihood: \(\xi\) (shape/tail index) and \(\beta\) (scale parameter).

Arguments

data

A numeric vector of i.i.d. observations.

u

A numeric scalar giving the threshold \(u\). The excesses are the amounts by which observations exceed \(u\).

start_xi

Initial value of \(\xi\) passed to the optimizer.

start_beta

Initial value of \(\beta\) passed to the optimizer.

na.rm

Logical. If TRUE, missing values (NA) are removed before analysis. Defaults to FALSE.

Details

For \(\xi \neq 0\), the PDF of an excess \(x_i\) is:

$$f(x_i;\xi, \beta) = \frac{1}{\beta} \left(1 + \xi \frac{x_i}{\beta}\right)^{-\left(\frac{1}{\xi} + 1\right)}$$
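To make the density concrete, here is a minimal R sketch that evaluates \(f(x;\xi,\beta)\) for \(\xi \neq 0\) and checks numerically that it integrates to 1. The helper name `gpd_density` is hypothetical and is not part of heavytails; the support handling (density zero for \(x < 0\), and for \(1 + \xi x/\beta \le 0\) when \(\xi < 0\)) follows the standard GPD definition.

```r
# Hypothetical helper (not exported by heavytails): GPD density for xi != 0.
gpd_density <- function(x, xi, beta) {
  stopifnot(beta > 0, xi != 0)
  z <- 1 + xi * x / beta
  # Zero outside the support: x < 0, or 1 + xi * x / beta <= 0 when xi < 0
  inside <- x >= 0 & z > 0
  out <- numeric(length(x))
  out[inside] <- (1 / beta) * z[inside]^(-(1 / xi + 1))
  out
}

# A heavy tail (xi > 0): the density should integrate to about 1 over [0, Inf)
integrate(gpd_density, lower = 0, upper = Inf, xi = 0.5, beta = 2)

# A bounded tail (xi < 0): the support ends at -beta / xi = 2
integrate(gpd_density, lower = 0, upper = 2, xi = -0.5, beta = 1)
```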

Taking the logarithm gives the log-likelihood of a single excess:

$$l(x_i;\xi, \beta)=-\log(\beta) - (\frac{1}{\xi} + 1) \log(1 + \xi \frac{x_i}{\beta})$$

Summing over all \(n\) excesses \(x_1, \dots, x_n\) gives the total log-likelihood:

$$l(\xi,\beta)=\sum_{i=1}^{n} (-\log(\beta) - (\frac{1}{\xi} + 1) \log(1 + \xi \frac{x_i}{\beta}))$$

$$l(\xi,\beta)=-n\log(\beta) - (\frac{1}{\xi} + 1)\sum_{i=1}^{n} \log(1 + \xi \frac{x_i}{\beta})$$
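The simplified form can be checked in a few lines of R: the closed-form total should equal the sum of the per-excess log-likelihoods. This is an illustrative sketch only; `gpd_loglik` is a hypothetical helper, and returning `-Inf` outside the support is a convention chosen here, not necessarily what heavytails does.

```r
# Hypothetical helper: total GPD log-likelihood l(xi, beta) of excesses x, xi != 0.
gpd_loglik <- function(x, xi, beta) {
  stopifnot(xi != 0)
  z <- 1 + xi * x / beta
  if (beta <= 0 || any(z <= 0)) return(-Inf)  # parameters outside the support
  n <- length(x)
  -n * log(beta) - (1 / xi + 1) * sum(log(z))
}

x <- c(0.3, 1.2, 0.7, 2.5)  # toy excesses, for illustration only
# Per-excess terms -log(beta) - (1/xi + 1) * log(1 + xi * x / beta)
per_obs <- -log(2) - (1 / 0.4 + 1) * log(1 + 0.4 * x / 2)
all.equal(gpd_loglik(x, xi = 0.4, beta = 2), sum(per_obs))  # TRUE
```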

The estimator minimizes the negative log-likelihood \(-l(\xi,\beta)\), which is equivalent to maximizing \(l(\xi,\beta)\). Fitting the GPD to the excesses in this way yields the \(\xi\) and \(\beta\) that best describe the tail of the data.
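A minimal sketch of this minimization in base R, using `optim` (Nelder-Mead by default) on the excesses. This is not the heavytails implementation: `fit_gpd_sketch` is hypothetical, it does not implement `na.rm`, and the fallback starting value for \(\beta\) (the standard deviation of the excesses when `start_beta = NULL`) is an assumption made here for illustration.

```r
# Hypothetical sketch of POT maximum likelihood; NOT the heavytails code.
fit_gpd_sketch <- function(data, u, start_xi = 0.1, start_beta = NULL) {
  y <- data[data > u] - u                 # excesses over the threshold
  if (is.null(start_beta)) start_beta <- sd(y)  # assumed default, for illustration
  nll <- function(par) {                  # negative log-likelihood -l(xi, beta)
    xi <- par[1]
    beta <- par[2]
    z <- 1 + xi * y / beta
    if (beta <= 0 || any(z <= 0)) return(Inf)  # optim tolerates Inf away from the start
    if (abs(xi) < 1e-8) return(length(y) * log(beta) + sum(y) / beta)  # xi = 0 case
    length(y) * log(beta) + (1 / xi + 1) * sum(log(z))
  }
  optim(c(start_xi, start_beta), nll)$par  # c(xi_hat, beta_hat)
}

set.seed(42)
x <- rexp(5000, rate = 1)
# Excesses of Exp(1) data over any u are again Exp(1), so expect xi near 0, beta near 1
fit_gpd_sketch(x, u = 1)
```

Exponential data make a convenient sanity check because, by memorylessness, the exact excess distribution is a GPD with \(\xi = 0\) and \(\beta = 1\).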

The case \(\xi = 0\) must be treated separately. In the limit \(\xi \to 0\) the GPD reduces to an exponential distribution with mean \(\beta\), and the total log-likelihood becomes:

$$l(0, \beta) = -n \log(\beta) - \frac{1}{\beta} \sum_{i=1}^{n} x_i$$
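Setting \(\partial l(0,\beta)/\partial\beta = -n/\beta + \sum_i x_i/\beta^2 = 0\) gives the closed-form maximizer \(\hat{\beta} = \bar{x}\), the mean of the excesses. The R sketch below checks this against a numerical maximizer and shows that the \(\xi \neq 0\) log-likelihood approaches \(l(0,\beta)\) as \(\xi \to 0\). The helper `exp_loglik` and the toy data are hypothetical, for illustration only.

```r
# Hypothetical helper: exponential (xi = 0) log-likelihood l(0, beta).
exp_loglik <- function(beta, x) -length(x) * log(beta) - sum(x) / beta

x <- c(0.3, 1.2, 0.7, 2.5)   # toy excesses, for illustration only
beta_hat <- mean(x)          # closed-form maximizer
opt <- optimize(exp_loglik, interval = c(0.01, 10), x = x, maximum = TRUE)
c(closed_form = beta_hat, numerical = opt$maximum)  # both approximately 1.175

# The xi != 0 log-likelihood with a tiny xi differs from l(0, beta) only by O(xi)
xi <- 1e-6
gpd_ll <- -length(x) * log(beta_hat) - (1 / xi + 1) * sum(log1p(xi * x / beta_hat))
gpd_ll - exp_loglik(beta_hat, x)
```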

References

Davison, A. C., & Smith, R. L. (1990). Models for exceedances over high thresholds. Journal of the Royal Statistical Society: Series B (Methodological), 52(3), 393-425. doi:10.1111/j.2517-6161.1990.tb01796.x

Balkema, A. A., & de Haan, L. (1974). Residual life time at great age. The Annals of Probability, 2(5), 792-804. doi:10.1214/aop/1176996548

Pickands, J. (1975). Statistical Inference Using Extreme Order Statistics. The Annals of Statistics, 3(1), 119-131. http://www.jstor.org/stable/2958083

Nair, J., Wierman, A., & Zwart, B. (2022). The Fundamentals of Heavy Tails: Properties, Emergence, and Estimation. Cambridge University Press, pp. 221-226. doi:10.1017/9781009053730

Examples

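set.seed(1)  # make the simulated sample reproducible
x <- rweibull(n = 800, shape = 0.8, scale = 1)  # Weibull sample; shape < 1 gives a tail heavier than exponential
values <- pot_estimator(data = x, u = 2, start_xi = 0.1, start_beta = NULL)
values  # estimated xi and beta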
