evm: Extreme value modelling

Description

Likelihood based modelling and inference for extreme value models, possibly with explanatory variables.

Usage

evm(y, data, family, ...)
"evm"(y, data, family=gpd, th=-Inf, qu, ..., penalty = NULL, prior = "gaussian", method = "optimize", cov="observed", start = NULL, priorParameters = NULL, maxit = 10000, trace = NULL, iter = 40500, burn = 500, thin = 4, proposal.dist = c("gaussian", "cauchy"), jump.cov, jump.const=NULL, R=100, verbose = TRUE)
"print"(x, digits=max(3, getOption("digits") - 3), ...)
"summary"(object, nsim=1000, alpha=0.05, ...)
"plot"(x, main=rep(NULL, 4), xlab=rep(NULL, 4), nsim=1000, alpha=0.05, ...)
"AIC"(object, penalized=FALSE, ..., k=2)
"print"(x, print.seed=FALSE, ...)
"plot"(x, which.plots=1:3, density.adjust=2, print.seed=FALSE, ...)
texmexMetropolis(x, log.lik, proposals, verbose, trace)
texmexJumpConst(j, map)

Arguments

Either a numeric vector or the name of a variable in data.

data

A data frame containing y and any covariates.

family

An object of class 'texmexFamily'. Defaults to family=gpd and a generalized Pareto distribution is fit to the data. Alternatively the family could be gev, resulting in a generalized extreme value distribution being fit. No other families are currently available in texmex, but users may write their own.

For threshold excess models (such as when family=gpd), the threshold for y, exceedances above which will be used to fit the upper tail model. Note that if you have already thresholded your data and want to model all of y, you still need to specify th.

An alternative to th, a probability defined such that quantile(y,qu) equals th.

...

In evm, formulae for the parameters in the family, e.g. phi = ~ x. If none are specified, they all default to ~1.

penalty

How to penalize the likelhood. Currently, either "none"", "gaussian"" or "lasso"" are the only allowed values. If penalty is "gaussian" or "lasso" then the parameters for the penalization are specified through the priorParameters argument. See below. Defaults to penalty=NULL and applies maximum likelihood estimation.

prior

If method = "optimize", just an alternative way of specifying the pentalty, and only one or neither of penalty and prior should be given. If method = "simulate", prior must be "gaussian" because no other prior distributions have been implemented.

method

Should be either "optimize" (the default), "simulate" or "bootstrap". The first letter or various abbreviations will do. If 'optimize' is used, the (penalized) likelihood is directly optimized using optim and point estimates (either ML or MAP estimates) are returned with other information. If "simulate", a Metropolis algorithm is used to simulate from the joint posterior distribution of the parameters. If "bootstrap", a parametric boostrap is performed.

cov

How to compute the covariance matrix of the parameters. Defaults to cov = "observed" in which case the observed information matrix is used, if the info element of the texmexFamily object is present. Note that currently, this is not implemented for gev. The only other option is cov = "numeric" in which case a numerical approximation of the Hessian is used (see the help for optim). In some cases, particularly with small samples, the numerical approximation can be quite different from the closed form (cov="observed") result, and the value derived from the observed information should be preferred. However, in either case, since the underlying log-likelihood may be far from quadratic for small samples, the resulting estimates of standard errors are liable to approximate poorly the true standard errors. Also see the comments in the Details section, below.

start

Starting values for the parameters, to be passed to optim. If not provided, the function will use the start element of the texmexFamily object if it exists.

priorParameters

A list with two components. The first should be a vector of means, the second should be a covariance matrix if the penalty/prior is "gaussian" or "quadratic" and a diagonal precision matrix if the penalty/prior is "lasso", "L1" or "Laplace". If method = "simulate" then these represent the parameters in the Gaussian prior distribution. If method = 'optimize' then these represent the parameters in the penalty function. If not supplied: all default prior means are zero; all default prior variances are $10^4$; all covariances are zero.

maxit

The number of iterations allowed in optim.

trace

Whether or not to print progress to screen. If method = "optimize", the argument is passed into optim -- see the help for that function. If method = "simulate", the argument determines at how many steps of the Markov chain the function should tell the user, and in this case it defaults to trace = 10000.

iter

Number of simulations to generate under method = "simulate". Defaults to 40500.

burn

The number of initial steps to be discarded. Defaults to 500.

thin

The degree of thinning of the resulting Markov chains. Defaults to 4 (one in every 4 steps is retained).

proposal.dist

The proposal distribution to use, either multivariate gaussian or a multivariate Cauchy.

jump.cov

Covariance matrix for proposal distribution of Metropolis algorithm. This is scaled by jump.const.

jump.const, j

Control parameter for the Metropolis algorithm.

verbose

Whether or not to print progress to screen. Defaults to verbose=TRUE.

x, object

Object of class evmOpt, evmSim, summary.evmOpt or summary.evmSim returned by evmSim or summary.evmSim. In texmexMetropolis, x is a matrix, the first row of which is used as the starting point of the simulation.

digits

Number of digits for printing.

main

In plot method for class evmSim, titles for diagnostic plots. Should be a vector of length 4, with values corresponding to the character strings to appear on the titles of the pp, qq, return level, and density estimate plots respectively.

xlab

As for main but labels for x-axes rather than titles.

nsim

In plot and summary methods for class evmSim. The number of replicates to be simulated to produce the simulated tolerance intervals. Defaults to nsim = 1000.

alpha

In plot and summary methods for class evmSim. A 100(1 - alpha)% simulation envelope is produced. Defaults to alpha = 0.05

penalized

Whether to compute AIC using the penalized log-likelihood or the true log-liklihood. Defaults to penalized=FALSE and uses the true log-likelihood.

Constant used in calculation of AIC=-2*loglik + k*p, defaults to k=2.

print.seed

Whether or not to print the seed used in the simulations, or to annotate the plots with it. Defaults to print.seed=FALSE.

which.plots

In plot method for class evmSim. Which plots to produce. Option 1 gives kernel density estimates, 2 gives traces of the Markov chains with superimposed cumulative means, 3 gives autocorrelation functions. Defaults to which.plots=1:3.

density.adjust

In plot method for class evmSim. Passed into density. Controls the amount of smoothing of the kernel density estimate. Defaults to density.adjust=2.

log.lik, proposals

The log-likelihood for the model, and a matrix of random numbers drawn from the proposal distribution.

map

In texmexJumpConst, a model fit by maximum (penalized) likelihood. The function is not intended to be called by the end user.

The number of parametric bootstrap samples to run when method = "bootstrap" is requested. Defaults to 100.

Value

call: The call to evmSim that produced the object.
data: The original data (above and below the threshold for fitting if a distribution for threshold excesses has been used). In detail, data is a list with elements y and D. y is the response variable and D is a list containing the design matrices implied by any formlae used in the call to evm.
convergence: Output from optim relating to whether or not the optimizer converged.
message: A message telling the user whether or not convergence was achieved.
threshold: The threshold of the data above which the evmSim model was fit.
penalty: The type of penalty function used, if any.
coefficients: The parameter estimates as computed under maximum likelihood or maximum penalized likelihood.
rate: The proportion of observations above the threshold. If the model is not a threshold exceedance model (e.g. the GEV model), the rate will be 1.
priorParameters: See above.
residuals: Residuals computed using the residual function in the texmexFamily object, if any.
ploglik: The value of the optimized penalized log-likelihood.
loglik: The value of the optimized (unpenalized) log-likelihood. If penalty='none' is used, this will be identical to ploglik, above.
cov: The estimated covariance of the parameters in the model.
se: The estimated standard errors of the parameters in the model.
xlevels: A named list containing a named list for each design matrix (main parameter) in the model. Each list contians an element named after each factor in the linear predictor for the respective design matrix. These are used by the predict method to ensure all factor levels are known, even if they don't appear in newdata.
call: The call to evmSim that produced the object.
threshold: The threshold above which the model was fit.
map: The point estimates found by maximum penalized likelihood and which were used as the starting point for the Markov chain. This is of class evmOpt and methods for this class (such as resid and plot) may be useful.
burn: The number of steps of the Markov chain that are to be treated as the burn-in and not used in inferences.
thin: The degree of thinning used.
chains: The entire Markov chain generated by the Metropolis algorithm.
y: The response data above the threshold for fitting.
seed: The seed used by the random number generator.
param: The remainder of the chain after deleting the burn-in and applying any thinning.
call: The call to evmBoot that produced the object.
replicates: The parameter estimates from the bootstrap fits.
map: The fit by by maximum penalized likelihood to the orginal data. This is of class evmOpt and methods for this class (such as resid and plot) may be useful.

Details

In previous versions of texmex, the main modelling function was gpd and models using the generalized Pareto distribution (GPD) were used. In this version, the main modelling function is now evm (extreme value model) and the distribution to be used is specified through the family argument.

Object gpd is now an object of class texmexFamily. Currently, the only other texmexFamily object available is gev which results in fitting a generalized extreme value (GEV) distribution to the data.

See Coles (2001) for an introduction to extreme value modelling and the GPD and GEV models.

For the GPD model, we use the following parameterisation of evm:

$$P(Y \le y) = 1 - \left(1 + \frac{\xi y}{\sigma}\right)^{-1/\xi}$$ for $y \ge 0$ and $1 + \xi y / \sigma \ge 0.$

For the GEV model, we use: $$P(Y \le y) = \exp \left\{ -\left[ 1 + \xi \left( \frac{y - \mu}{\sigma}\right) \right] ^{-1/\xi} \right\} $$

In each case, the scale parameter is sigma ($\sigma$) and the shape parameter is xi ($\xi$). The GEV distribution also has location parameter mu ($\mu$).

Working with the log of the scale parameter improves the stability of computations, makes a quadratic penalty more appropriate and enables the inclusion of covariates in the model for the scale parameter, which must remain positive. We therefore work with $\phi$=log($\sigma$). All specification of priors or penalty functions refer to $\phi$ rather than $\sigma$. A quadratic penalty can be thought of as a Gaussian prior distribution, whence the terminology of the function.

Parameters of the evm are estimated by using maximum (penalized) likelihood (method = "optimize"), or by simulating from the posterior distribution of the model parameters using a Metropolis algorithm (method = "simulate"). In the latter case, start is used as a starting value for the Metropolis algorithm; in its absence, the maximum penalized likelhood point estimates are computed and used.

A boostrap approach is also available (method = "bootstrap"). This runs a parametric bootstrap, simulating from the model fit by optimization.

When method = "optimize", the plot function produces diagnostic plots for the fitted model. These differ depending on whether or not there are covariates in the model. If there are no covariates then the diagnostic plots are PP- and QQ-plots, a return level plot (produced by plotrl.evmSim) and a histogram of the data with superimposed density estimate. These are all calculated using the data on the original scale. If there are covariates in the model then the diagnostics consist of PP- and QQ- plots calculated by using the model residuals (which will be standard exponential devaiates under the GPD model and standard Gumbel deviates under the GEV model), and plots of residuals versus fitted model parameters.

The PP- and QQ-plots show simulated pointwise tolerance intervals. The region is a 100(1 - alpha)% region based on nsim simulated samples.

When method = "simulate" the plot function produces diagnostic plots for the Markov chains used to simulate from the posterior distributions for the model parameters. If the chains have converged on the posterior distributions, the trace plots should look like "fat hairy caterpillars" and their cumulative means should converge rapidly. Moreover, the autocorrelation functions should converge quickly to zero.

When method = "bootstrap", the plot function plots the bootstrap distributions of the parameters.

When method = "simulate" the print and summary functions give posterior means and standard deviations. Posterior means are also returned by the coef method. Depending on what you want to do and what the posterior distributions look like (use plot method) you might want to work with quantiles of the posterior distributions instead of relying on standard errors.

When method = "bootstrap", summaries of the bootstrap distribution and the bootstrap estimate of bias are displayed.

References

S. Coles. An Introduction to Statistical Modelling of Extreme Values. Springer, 2001.

Examples

Run this code

  mod <- evm(rain, th=30)
  mod
  par(mfrow=c(2, 2))
  plot(mod)

#  mod <- evm(rain, th=30, method="sim")
#  par(mfrow=c(3, 2))
#  plot(mod)

  mod <- evm(SeaLevel, data=portpirie, family=gev)
  mod
  plot(mod)

#  mod <- evm(SeaLevel, data=portpirie, family=gev, method="sim")
#  par(mfrow=c(3, 3))
#  plot(mod)

Run the code above in your browser using DataLab