hillplot: Hill Plot

Description

Plots the Hill plot and some of its variants.

Usage

hillplot(data, orderlim = NULL, tlim = NULL, hill.type = "Hill",
  r = 2, x.theta = FALSE, y.alpha = FALSE, alpha = 0.05,
  ylim = NULL, legend.loc = "topright",
  try.thresh = quantile(data[data > 0], 0.9, na.rm = TRUE),
  main = paste(ifelse(x.theta, "Alt", ""), hill.type, " Plot", sep = ""),
  xlab = ifelse(x.theta, "theta", "order"),
  ylab = paste(ifelse(x.theta, "Alt", ""), hill.type, ifelse(y.alpha,
  " alpha", " xi"), ">0", sep = ""), ...)

Arguments

data

vector of sample data

orderlim

vector of (lower, upper) limits of order statistics to plot estimator, or NULL to use default values

tlim

vector of (lower, upper) limits of range of threshold to plot estimator, or NULL to use default values

hill.type

"Hill" or "SmooHill"

r

smoothing factor for "SmooHill" (integer > 1)

x.theta

logical, should order (FALSE) or theta (TRUE) be given on x-axis

y.alpha

logical, should shape xi (FALSE) or tail index alpha (TRUE) be given on y-axis

alpha

significance level over range (0, 1), or NULL for no CI

ylim

y-axis limits or NULL

legend.loc

location of legend (see legend) or NULL for no legend

try.thresh

vector of thresholds to consider

main

title of plot

xlab

x-axis label

ylab

y-axis label

...

further arguments to be passed to the plotting functions

Value

hillplot gives the Hill plot. It also returns a data frame containing columns of the order statistics, order, Hill estimator, its standard deviation and $100(1 - \alpha)\%$ confidence interval (when requested). When the SmooHill plot is selected, the corresponding SmooHill estimates are appended.

Acknowledgments

Thanks to Younes Mouatasim, Risk Dynamics, Brussels for reporting various bugs in these functions.

Details

Produces the Hill, AltHill, SmooHill and AltSmooHill plots, including confidence intervals.

For an ordered iid sequence $X_{(1)}\ge X_{(2)}\ge\cdots\ge X_{(n)} > 0$ the Hill (1975) estimator using $k$ order statistics is given by $$H_{k,n}=\frac{1}{k}\sum_{i=1}^{k} \log\left(\frac{X_{(i)}}{X_{(k+1)}}\right)$$ which is the pseudo-likelihood estimator of the reciprocal of the tail index, $\xi=1/\alpha>0$, for regularly varying tails (e.g. the Pareto distribution). The Hill estimator is defined on orders $k>2$, as when $k=1$ the estimator $H_{1,n}=0$. The function will nevertheless calculate the Hill estimator for $k\ge 1$. The simple Hill plot is shown for hill.type="Hill".
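As an illustration of the displayed formula, the following base R sketch computes $H_{k,n}$ for $k=1,\ldots,n-1$ from a simulated Pareto sample. The helper name hill_est and the simulated data are assumptions made for this example; this is not the package's own implementation.

```r
# Minimal sketch of the Hill estimator H_{k,n} using base R only.
# hill_est is a hypothetical helper, not a function from the package.
hill_est <- function(x) {
  x <- x[is.finite(x) & x > 0]        # keep finite, positive values only
  stopifnot(length(x) >= 2)           # need at least X_(1) and X_(2)
  x <- sort(x, decreasing = TRUE)     # X_(1) >= X_(2) >= ... >= X_(n)
  n <- length(x)
  k <- seq_len(n - 1)
  logx <- log(x)
  # mean of the k largest log-values minus the log of the (k+1)th largest
  H <- cumsum(logx)[k] / k - logx[k + 1]
  data.frame(k = k, H = H)
}

set.seed(1)
# Pareto sample with tail index alpha = 2 (xi = 0.5) by inverse transform
x <- runif(5000)^(-1 / 2)
est <- hill_est(x)
est$H[est$k == 500]                   # should lie close to xi = 0.5
```

For exact Pareto data the estimate at a moderate order such as $k=500$ should sit close to the true $\xi$, which is the flat region the Hill plot is meant to reveal.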

Once a sufficiently low order statistic is reached, the Hill estimator will be constant, up to sample uncertainty, for regularly varying tails. The Hill plot is a plot of $H_{k,n}$ against the order $k$. Symmetric asymptotic normal confidence intervals assuming Pareto tails are provided.
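One common way to build such a symmetric interval uses the asymptotic result $\sqrt{k}\,(H_{k,n}-\xi)\to N(0,\xi^2)$, giving $H_{k,n}\pm z_{1-\alpha/2}\,H_{k,n}/\sqrt{k}$. The sketch below assumes that form; the helper hill_ci is hypothetical and the package's exact computation may differ.

```r
# Sketch of a symmetric normal-approximation interval for the Hill estimator.
# Assumption: standard error H / sqrt(k); hill_ci is a hypothetical helper.
hill_ci <- function(H, k, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)           # two-sided normal critical value
  se <- H / sqrt(k)                   # assumed asymptotic standard error
  data.frame(k = k, H = H, se = se,
             lower = H - z * se, upper = H + z * se)
}

ci <- hill_ci(H = 0.5, k = 100)
ci
```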

These so-called Hill's horror plots can be difficult to interpret. A smoothed form of the Hill estimator was suggested by Resnick and Starica (1997): $$\mathrm{smooH}_{k,n}=\frac{1}{(r-1)k}\sum_{j=k+1}^{rk} H_{j,n}$$ giving the SmooHill plot, which is shown for hill.type="SmooHill". The smoothing factor is r=2 by default.
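The smoothing step can be sketched in base R as a moving average over $H_{k+1,n},\ldots,H_{rk,n}$. The helper smoo_hill is hypothetical and takes a vector whose $j$th element is $H_{j,n}$; it follows the displayed formula rather than the package source.

```r
# Sketch of the smoothed Hill estimator from a vector H with H[j] = H_{j,n}.
# smoo_hill is a hypothetical helper, not a function from the package.
smoo_hill <- function(H, r = 2) {
  stopifnot(r > 1, r == round(r))     # smoothing factor: integer > 1
  kmax <- floor(length(H) / r)        # largest k with r * k <= length(H)
  k <- seq_len(kmax)
  cs <- cumsum(H)
  # average of the (r - 1) * k values H_{k+1,n}, ..., H_{rk,n}
  smooH <- (cs[r * k] - cs[k]) / ((r - 1) * k)
  data.frame(k = k, smooH = smooH)
}

sm <- smoo_hill(H = c(1, 2, 3, 4, 5, 6), r = 2)
sm$smooH                              # 2.0 3.5 5.0
```

Larger values of r average over more orders, which gives a smoother curve at the cost of a shorter range of $k$.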

It has also been suggested to plot the order on a log scale, by plotting the points $(\theta, H_{\lceil n^\theta\rceil, n})$ for $0\le \theta \le 1$. This gives the so-called AltHill and AltSmooHill plots. The alternative x-axis scale is chosen by x.theta=TRUE.
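The mapping between $\theta$ and the order $k$ can be made concrete with a few lines of base R. The sample size n below is an arbitrary illustrative value, not taken from the package.

```r
# Sketch of the alternative x-axis scale: theta in [0, 1] <-> order k.
n <- 2167                             # illustrative sample size (assumption)
theta <- seq(0, 1, by = 0.25)
k <- ceiling(n^theta)                 # order plotted at each theta
data.frame(theta = theta, k = k)

# Conversely, an order k sits at theta = log(k) / log(n) on this axis
theta_of_k <- log(c(1, 10, 100, n)) / log(n)
theta_of_k
```

Equal steps in $\theta$ correspond to multiplicative steps in $k$, so the small orders that dominate the upper tail take up more of the plot.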

The Hill estimator estimates the GPD shape $\xi>0$, which is the reciprocal of the tail index $\alpha=1/\xi>0$. The shape is plotted by default using y.alpha=FALSE and the tail index is plotted when y.alpha=TRUE.

A pre-chosen threshold (or more than one) can be given in try.thresh. The estimated parameter ($\xi$ or $\alpha$) at each threshold is plotted as a horizontal solid line for all higher thresholds. The threshold should be set as low as possible, so a dashed line is shown below the pre-chosen threshold. If the Hill estimator is similar to the dashed line then a lower threshold may be chosen.
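To relate a threshold to an order, the sketch below computes the default try.thresh from the Usage section (the 90% quantile of the positive data) and counts the observations above it. Treating that count as the matching order $k$ is an assumption made for illustration, and the simulated data are arbitrary.

```r
# Sketch: default threshold from the Usage section and its exceedance count.
set.seed(42)
x <- rexp(1000)                               # illustrative positive data
u <- quantile(x[x > 0], 0.9, na.rm = TRUE)    # default try.thresh
k_u <- sum(x > u)                             # observations above the threshold
k_u
```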

If no order statistic (or threshold) limits are provided (orderlim = tlim = NULL), then the lowest order statistic is set to $X_{(3)}$ and the highest possible value to $X_{(n-1)}$. However, the Hill estimator is always output for all $k=1, \ldots, n-1$, and for $k=1, \ldots, \lfloor n/r \rfloor$ for the SmooHill estimator.

The missing (NA and NaN) and non-finite values are ignored. Non-positive data are ignored.
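The same filtering can be reproduced by hand before inspecting the data, for example to see how many observations will actually be used. This is a sketch of the rule described above, not the package's internal code.

```r
# Sketch: drop missing, non-finite and non-positive values.
x <- c(3.2, NA, -1, 0, Inf, NaN, 7.5, 1.1)
clean <- x[is.finite(x) & x > 0]      # is.finite() is FALSE for NA, NaN, Inf
clean                                 # 3.2 7.5 1.1
```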

The lower x-axis is the order $k$ or $\theta$, chosen by the option x.theta=FALSE and x.theta=TRUE respectively. The upper axis is for the corresponding threshold.

References

Hill, B.M. (1975). A simple general approach to inference about the tail of a distribution. Annals of Statistics 3, 1163-1174.

Resnick, S. and Starica, C. (1997). Smoothing the Hill estimator. Advances in Applied Probability 29, 271-293.

Resnick, S. (1997). Discussion of the Danish Data of Large Fire Insurance Losses. Astin Bulletin 27, 139-151.

Examples

# NOT RUN {
# Reproduce graphs from Figure 2.4 of Resnick (1997)
data(danish, package="evir")
par(mfrow = c(2, 2))

# Hill plot
hillplot(danish, y.alpha=TRUE, ylim=c(1.1, 2))

# AltHill plot
hillplot(danish, y.alpha=TRUE, x.theta=TRUE, ylim=c(1.1, 2))

# AltSmooHill plot
hillplot(danish, hill.type="SmooHill", r=3, y.alpha=TRUE, x.theta=TRUE, ylim=c(1.35, 1.85))

# AltSmooHill plot (no CIs or legend) with the AltHill estimate overlaid
hillout = hillplot(danish, hill.type="SmooHill", r=3, y.alpha=TRUE,
 x.theta=TRUE, try.thresh = c(), alpha=NULL, ylim=c(1.1, 2), legend.loc=NULL, lty=2)
n = length(danish)
# overlay the unsmoothed Hill estimate of alpha (1/H) on the theta scale
with(hillout[3:n,], lines(log(ks)/log(n), 1/H, type="s"))
# }
