gpd: Generalized Pareto Distribution Family Function

Description

Maximum likelihood estimation of the 2-parameter generalized Pareto distribution (GPD).

Usage

gpd(threshold = 0, lscale = "loge", lshape = "logoff",
    escale = list(),
    eshape = if(lshape=="logoff") list(offset=0.5) else
                if(lshape=="elogit") list(min=-0.5, max=0.5) else NULL,
    percentiles = c(90, 95), iscale = NULL, ishape = NULL,
    tshape0=0.001, method.init=1, zero = 2)

Arguments

threshold

Numeric of length 1. The threshold value. Only values of the response which are greater than this value are kept. The response actually worked on internally is the difference. Only those observations greater than the threshold value are returned

lscale

Parameter link function for the scale parameter $\sigma$. See Links for more choices.

lshape

Parameter link function for the shape parameter $\xi$. See Links for more choices. The default constrains the parameter to be greater than $-0.5$ (the negative of Offset). This is because

escale, eshape

Extra argument for the lscale and lshape arguments. See earg in Links for general information. For the shape parameter, if the

percentiles

Numeric vector of percentiles used for the fitted values. Values should be between 0 and 100. See the example below for illustration. However, if percentiles=NULL then the mean $\mu + \sigma / (1-\xi)$ is returned; this is only defi

iscale, ishape

Numeric. Optional initial values for $\sigma$ and $\xi$. The default is to use method.init and compute a value internally for each parameter. Values of ishape should be between $-0.5$ and $1$. Values of iscale

tshape0

Positive numeric. Threshold/tolerance value for resting whether $\xi$ is zero. If the absolute value of the estimate of $\xi$ is less than this value then it will be assumed zero and exponential distribution derivatives etc. will be used.

method.init

Method of initialization, either 1 or 2. The first is the method of moments, and the second is a variant of this. If neither work try assigning values to arguments ishape and/or iscale.

zero

An integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The value must be from the set {1,2} corresponding respectively to $\sigma$ and $\xi$. It is often a good idea for the $\sigma$ parameter only

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam. However, for this VGAM family function, vglm is probably preferred over vgam when there is smoothing.

Warning

Fitting the GPD by maximum likelihood estimation can be numerically fraught. If $1 + \xi (y-\mu)/ \sigma \leq 0$ then some crude evasive action is taken but the estimation process can still fail. This is particularly the case if vgam with s is used. Then smoothing is best done with vglm with regression splines (bs or ns) because vglm implements half-stepsizing whereas vgam doesn't. Half-stepsizing helps handle the problem of straying outside the parameter space.

Details

The distribution function of the GPD can be written $$G(y) = 1 - [1 + \xi (y-\mu) / \sigma ]_{+}^{- 1/ \xi}$$ where $\mu$ is the location parameter (known with value threshold), $\sigma > 0$ is the scale parameter, $\xi$ is the shape parameter, and $h_+ = \max(h,0)$. The function $1-G$ is known as the survivor function. The limit $\xi \rightarrow 0$ gives the shifted exponential as a special case: $$G(y) = 1 - \exp[-(y-\mu)/ \sigma].$$ The support is $y>\mu$ for $\xi>0$, and $\mu < y <\mu-\sigma \xi$="" for="" $\xi<0$.<="" p="">

Smith (1985) showed that if $\xi <= -0.5$="" then="" this="" is="" known="" as="" the="" nonregular="" case="" and="" problems="" difficulties="" can="" arise="" both="" theoretically="" numerically.="" for="" (regular)="" $\xi=""> -0.5$ the classical asymptotic theory of maximum likelihood estimators is applicable; this is the default because Offset=0.5.

Although for $\xi < -0.5$ the usual asymptotic properties do not apply, the maximum likelihood estimator generally exists and is superefficient for $-1 < \xi < -0.5$, so it is ``better'' than normal. To allow for $-1 < \xi < -0.5$ set Offset=1. When $\xi < -1$ the maximum likelihood estimator generally does not exist as it effectively becomes a two parameter problem.

The mean of $Y$ does not exist unless $\xi < 1$, and the variance does not exist unless $\xi < 0.5$. So if you want to fit a model with finite variance use lshape="elogit".

References

Coles, S. (2001) An Introduction to Statistical Modeling of Extreme Values. London: Springer-Verlag.

Smith, R. L. (1985) Maximum likelihood estimation in a class of nonregular cases. Biometrika, 72, 67--90.

Examples

Run this code

# Simulated data from an exponential distribution (xi=0)
y = rexp(n=3000, rate=2)
fit = vglm(y ~ 1, gpd(threshold=0.5), trace=TRUE)
fitted(fit)[1:5,]
coef(fit, matrix=TRUE)   # xi should be close to 0
Coef(fit)
summary(fit)

yt = y[y>fit@extra$threshold]  # Note the threshold is stored here
all.equal(c(yt), c(fit@y))     # TRUE
# Check the 90 percentile
i = yt < fitted(fit)[1,"90%"]
100*table(i)/sum(table(i))   # Should be 90%

# Check the 95 percentile
i = yt < fitted(fit)[1,"95%"]
100*table(i)/sum(table(i))   # Should be 95%

plot(yt, col="blue", las=1, main="Fitted 90% and 95% quantiles")
matlines(1:length(yt), fitted(fit), lty=2:3, lwd=2)


# Another example
nn = 2000; threshold = 0; x = runif(nn)
xi = exp(-0.8)-0.5
y = rgpd(nn, scale=exp(1+0.2*x), shape=xi)
fit = vglm(y ~ x, gpd(threshold), trace=TRUE)
Coef(fit)
coef(fit, matrix=TRUE)

Run the code above in your browser using DataLab