fpot: Peaks Over Threshold Modelling using the Generalized Pareto or Point Process Representation

Description

Maximum-likelihood fitting for peaks over threshold modelling, using the Generalized Pareto or Point Process representation, allowing any of the parameters to be held fixed if desired.

Usage

fpot(x, threshold, model = c("gpd", "pp"), start, npp = length(x),
    cmax = FALSE, r = 1, ulow = -Inf, rlow = 1, mper = NULL, ...,
    std.err = TRUE, corr = FALSE, method = "BFGS", warn.inf = TRUE)

Arguments

A numeric vector, which may contain missing values.

threshold

The threshold.

model

The model; either "gpd" (the default) or "pp", for the Generalized Pareto or Point Process representations respectively.

start

A named list giving the initial values for the parameters over which the likelihood is to be maximized. If start is omitted the routine attempts to find good starting values using moment estimators.

npp

The data should contain npp observations per ``period'', where the return level plot produced by plot.pot will represent return periods in units of ``periods''. By default npp = length(x), so that the

cmax

Logical; if FALSE (the default), the model is fitted using all exceedences over the threshold. If TRUE, the model is fitted using cluster maxima, using clusters of exceedences derived from clusters.

r, ulow, rlow

Arguments used for the identification of clusters of exceedences (see clusters). Ignored if cmax is FALSE (the default).

mper

Controls the parameterization of the generalized Pareto model. Should be either NULL (the default), or a positive number (see Details). If mper is not NULL and model = "pp", an err

...

Additional parameters, either for the model or for the optimization function optim. If parameters of the model are included they will be held fixed at the values given (see Examples).

std.err

Logical; if TRUE (the default), the standard errors are returned.

corr

Logical; if TRUE, the correlation matrix is returned.

method

The optimization method (see optim for details).

warn.inf

Logical; if TRUE (the default), a warning is given if the negative log-likelihood is infinite when evaluated at the starting values.

Value

Returns an object of class c("pot","uvevd","pot").
The generic accessor functions fitted (or fitted.values), std.errors, deviance, logLik and AIC extract various features of the returned object.
The function profile can be used to obtain deviance profiles for the model parameters. In particular, profiles of the $m$ period return level $z_m$ can be calculated and plotted when $\code{mper} = m$. The function anova compares nested models. The function plot produces diagnostic plots. An object of class c("pot","uvevd","pot") is a list containing the following components
estimateA vector containing the maximum likelihood estimates.
std.errA vector containing the standard errors.
fixedA vector containing the parameters of the model that have been held fixed.
paramA vector containing all parameters (optimized and fixed).
devianceThe deviance at the maximum likelihood estimates.
corrThe correlation matrix.
convergence, counts, messageComponents taken from the list returned by optim.
threshold, r, ulow, rlow, nppThe arguments of the same name.
nhighThe number of exceedences (if cmax is FALSE) or the number of clusters of exceedences (if cmax is TRUE).
nat, patThe number and proportion of exceedences.
extindThe estimate of the extremal index (i.e. nhigh divided by nat). If cmax is FALSE, this is NULL.
dataThe data passed to the argument x.
exceedancesThe exceedences, or the maxima of the clusters of exceedences.
mperThe argument mper.
scaleThe scale parameter for the fitted generalized Pareto distribution. If mper is NULL and model = "gpd" (the defaults), this will also be an element of param.
callThe call of the current function.

Warning

The standard errors and the correlation matrix in the returned object are taken from the observed information, calculated by a numerical approximation. They must be interpreted with caution when the shape parameter is less than $-0.5$, because the usual asymptotic properties of maximum likelihood estimators do not then hold (Smith, 1985).

Details

The exeedances over the threshold threshold (if cmax is FALSE) or the maxima of the clusters of exeedances (if cmax is TRUE) are (if model = "gpd") fitted to a generalized Pareto distribution (GPD) with location threshold. If model = "pp" the exceedances are fitted to a non-homogeneous Poisson process (Coles, 2001). If mper is NULL (the default), the parameters of the model (if model = "gpd") are scale and shape, for the scale and shape parameters of the GPD. If model = "pp" the parameters are loc, scale and shape. Under model = "pp" the parameters can be interpreted as parameters of the Generalized Extreme Value distribution, fitted to the maxima of npp random variables. In this case, the value of npp should be reasonably large.

For both characterizations, the shape parameters are equivalent. The scale parameter under the generalized Pareto characterization is equal to $b + s(u - a)$, where $a$, $b$ and $s$ are the location, scale and shape parameters under the Point Process characterization, and where $u$ is the threshold.

If $\code{mper} = m$ is a positive value, then the generalized Pareto model is reparameterized so that the parameters are rlevel and shape, where rlevel is the $m$ ``period'' return level, where ``period'' is defined via the argument npp.

The $m$ ``period'' return level is defined as follows. Let $G$ be the fitted generalized Pareto distribution function, with location $\code{threshold} = u$, so that $1 - G(z)$ is the fitted probability of an exceedance over $z > u$ given an exceedance over $u$. The fitted probability of an exceedance over $z > u$ is therefore $p(1 - G(z))$, where $p$ is the estimated probabilty of exceeding $u$, which is given by the empirical proportion of exceedances. The $m$ ``period'' return level $z_m$ satisfies $p(1 - G(z_m)) = 1/(mN)$, where $N$ is the number of points per period (multiplied by the estimate of the extremal index, if cluster maxima are fitted). In other words, $z_m$ is the quantile of the fitted model that corresponds to the upper tail probability $1/(mN)$. If mper is infinite, then $z_m$ is the upper end point, given by threshold minus $\code{scale}/\code{shape}$, and the shape parameter is then restricted to be negative.

References

Smith, R. L. (1985) Maximum likelihood estimation in a class of non-regular cases. Biometrika, 72, 67--90.

Examples

Run this code

uvdata <- rgpd(100, loc = 0, scale = 1.1, shape = 0.2)
M1 <- fpot(uvdata, 1)
M2 <- fpot(uvdata, 1, shape = 0)
anova(M1, M2)
par(mfrow = c(2,2))
plot(M1)
M1P <- profile(M1)
plot(M1P)

M1 <- fpot(uvdata, 1, mper = 10)
M2 <- fpot(uvdata, 1, mper = 100)
M1P <- profile(M1, which = "rlevel", conf=0.975, mesh=0.1)
M2P <- profile(M2, which = "rlevel", conf=0.975, mesh=0.1)
plot(M1P)
plot(M2P)

Run the code above in your browser using DataLab