fpot: Peaks Over Threshold Modelling using the Generalized Pareto or Point Process Representation

Description

Maximum-likelihood fitting for peaks over threshold modelling, using the Generalized Pareto or Point Process representation, allowing any of the parameters to be held fixed if desired.

Usage

fpot(x, threshold, model = c("gpd", "pp"), start, npp = length(x),
    cmax = FALSE, r = 1, ulow = -Inf, rlow = 1, mper = NULL, …,
    std.err = TRUE, corr = FALSE, method = "BFGS", warn.inf = TRUE)

Arguments

A numeric vector. If this contains missing values, those values are treated as if they fell below the threshold.

threshold

The threshold.

model

The model; either "gpd" (the default) or "pp", for the Generalized Pareto or Point Process representations respectively.

start

A named list giving the initial values for the parameters over which the likelihood is to be maximized. If start is omitted the routine attempts to find good starting values using moment estimators.

npp

The data should contain npp observations per ``period'', where the return level plot produced by plot.pot will represent return periods in units of ``periods''. By default npp = length(x), so that the ``period'' is the period of time over which the entire data set is collected. It may often be useful to change this default so that more sensible units are used. For example, if yearly periodic units are required, use npp = 365.25 for daily data and npp = 52.18 for weekly data. The argument only makes a difference to the actual fit if mper is not NULL or if model = "pp" (see Details).

cmax

Logical; if FALSE (the default), the model is fitted using all exceedences over the threshold. If TRUE, the model is fitted using cluster maxima, using clusters of exceedences derived from clusters.

r, ulow, rlow

Arguments used for the identification of clusters of exceedences (see clusters). Ignored if cmax is FALSE (the default).

mper

Controls the parameterization of the generalized Pareto model. Should be either NULL (the default), or a positive number (see Details). If mper is not NULL and model = "pp", an error is returned.

…

Additional parameters, either for the model or for the optimization function optim. If parameters of the model are included they will be held fixed at the values given (see Examples).

std.err

Logical; if TRUE (the default), the standard errors are returned.

corr

Logical; if TRUE, the correlation matrix is returned.

method

The optimization method (see optim for details).

warn.inf

Logical; if TRUE (the default), a warning is given if the negative log-likelihood is infinite when evaluated at the starting values.

Value

Returns an object of class c("pot","uvevd","pot").

The generic accessor functions fitted (or fitted.values), std.errors, deviance, logLik and AIC extract various features of the returned object.

The function profile can be used to obtain deviance profiles for the model parameters. In particular, profiles of the \(m\) period return level \(z_m\) can be calculated and plotted when \(\code{mper} = m\). The function anova compares nested models. The function plot produces diagnostic plots.

An object of class c("pot","uvevd","evd") is a list containing the following components

estimate

A vector containing the maximum likelihood estimates.

std.err

A vector containing the standard errors.

fixed

A vector containing the parameters of the model that have been held fixed.

param

A vector containing all parameters (optimized and fixed).

deviance

The deviance at the maximum likelihood estimates.

corr

The correlation matrix.

var.cov

The variance covariance matrix.

convergence, counts, message

Components taken from the list returned by optim.

threshold, r, ulow, rlow, npp

The arguments of the same name.

nhigh

The number of exceedences (if cmax is FALSE) or the number of clusters of exceedences (if cmax is TRUE).

nat, pat

The number and proportion of exceedences.

extind

The estimate of the extremal index (i.e. nhigh divided by nat). If cmax is FALSE, this is NULL.

data

The data passed to the argument x.

exceedances

The exceedences, or the maxima of the clusters of exceedences.

mper

The argument mper.

scale

The scale parameter for the fitted generalized Pareto distribution. If mper is NULL and model = "gpd" (the defaults), this will also be an element of param.

call

The call of the current function.

Warning

The standard errors and the correlation matrix in the returned object are taken from the observed information, calculated by a numerical approximation. They must be interpreted with caution when the shape parameter is less than \(-0.5\), because the usual asymptotic properties of maximum likelihood estimators do not then hold (Smith, 1985).

Details

The exeedances over the threshold threshold (if cmax is FALSE) or the maxima of the clusters of exeedances (if cmax is TRUE) are (if model = "gpd") fitted to a generalized Pareto distribution (GPD) with location threshold. If model = "pp" the exceedances are fitted to a non-homogeneous Poisson process (Coles, 2001).

If mper is NULL (the default), the parameters of the model (if model = "gpd") are scale and shape, for the scale and shape parameters of the GPD. If model = "pp" the parameters are loc, scale and shape. Under model = "pp" the parameters can be interpreted as parameters of the Generalized Extreme Value distribution, fitted to the maxima of npp random variables. In this case, the value of npp should be reasonably large.

For both characterizations, the shape parameters are equivalent. The scale parameter under the generalized Pareto characterization is equal to \(b + s(u - a)\), where \(a\), \(b\) and \(s\) are the location, scale and shape parameters under the Point Process characterization, and where \(u\) is the threshold.

If \(\code{mper} = m\) is a positive value, then the generalized Pareto model is reparameterized so that the parameters are rlevel and shape, where rlevel is the \(m\) ``period'' return level, where ``period'' is defined via the argument npp.

The \(m\) ``period'' return level is defined as follows. Let \(G\) be the fitted generalized Pareto distribution function, with location \(\code{threshold} = u\), so that \(1 - G(z)\) is the fitted probability of an exceedance over \(z > u\) given an exceedance over \(u\). The fitted probability of an exceedance over \(z > u\) is therefore \(p(1 - G(z))\), where \(p\) is the estimated probabilty of exceeding \(u\), which is given by the empirical proportion of exceedances. The \(m\) ``period'' return level \(z_m\) satisfies \(p(1 - G(z_m)) = 1/(mN)\), where \(N\) is the number of points per period (multiplied by the estimate of the extremal index, if cluster maxima are fitted). In other words, \(z_m\) is the quantile of the fitted model that corresponds to the upper tail probability \(1/(mN)\). If mper is infinite, then \(z_m\) is the upper end point, given by threshold minus \(\code{scale}/\code{shape}\), and the shape parameter is then restricted to be negative.

References

Smith, R. L. (1985) Maximum likelihood estimation in a class of non-regular cases. Biometrika, 72, 67--90.

Examples

Run this code

# NOT RUN {
uvdata <- rgpd(100, loc = 0, scale = 1.1, shape = 0.2)
M1 <- fpot(uvdata, 1)
M2 <- fpot(uvdata, 1, shape = 0)
anova(M1, M2)
par(mfrow = c(2,2))
plot(M1)
# }
# NOT RUN {
M1P <- profile(M1)
# }
# NOT RUN {
plot(M1P)
# }
# NOT RUN {
M1 <- fpot(uvdata, 1, mper = 10)
M2 <- fpot(uvdata, 1, mper = 100)
# }
# NOT RUN {
M1P <- profile(M1, which = "rlevel", conf=0.975, mesh=0.1)
# }
# NOT RUN {
M2P <- profile(M2, which = "rlevel", conf=0.975, mesh=0.1)
# }
# NOT RUN {
plot(M1P)
# }
# NOT RUN {
plot(M2P)
# }

Run the code above in your browser using DataLab