gpd: Generalized Pareto distribution modelling

Description

Likelihood based modelling and inference for the generalized Pareto distribution, possibly with explanatory variables.

Usage

gpd(y, data, ...)
## S3 method for class 'default':
gpd(y, data, th, qu, phi = ~1, xi = ~1, penalty = "gaussian",
    prior = "gaussian", method = "optimize", cov="observed", start = NULL, 
    priorParameters = NULL, maxit = 10000, trace = NULL,
    iter = 40500, burn = 500, thin = 4,
    proposal.dist = c("gaussian", "cauchy"),
    jump.cov, jump.const, verbose = TRUE,...)
## S3 method for class 'gpd':
print(x, digits=max(3, getOption("digits") - 3), ...)
## S3 method for class 'gpd':
summary(object, nsim=1000, alpha=0.05, ...)
## S3 method for class 'gpd':
show(x, digits=max(3, getOption("digits") - 3), ...)
## S3 method for class 'gpd':
plot(x, main=rep(NULL, 4), xlab=rep(NULL, 4), nsim=1000, alpha=0.05, ...)
## S3 method for class 'gpd':
AIC(object, ..., k=2)
## S3 method for class 'bgpd':
print(x, print.seed=FALSE, ...)
## S3 method for class 'bgpd':
plot(x, which.plots=1:3, density.adjust=2, print.seed=FALSE, ...)

Arguments

Either a numeric vector or the name of a variable in data.

data

A data frame containing y and any covariates.

The threshold for y, exceedances above which will be used to fit the GPD upper tail model.

An alternative to th, a probability defined such that quantile(y,qu) equals th.

phi

Formula for the log of the scale parameter. Defaults to phi = ~ 1 - i.e. no covariates.

Formula for the shape parameter. Defaults to xi = ~ 1 - i.e. no covariates.

penalty

How to penalize the likelhood. Currently, either ``none'', ``gaussian'' or ``lasso'' are the only allowed arguments. If penalty is "gaussian" or "lasso" then the parameters for the penalization are specified through the

prior

If method = "optimize", just an alternative way of specifying the pentalty, and only one or neither of penalty and prior should be given. If method = "simulate", prior must be ``gaussian''

method

Should be either ``optimize'' (the default) or ``simulate''. The first letter or various abbreviations will do. If ``optimize'' is used, the (penalized) likelihood is directly optimized using optim and point estimates (either M

cov

How to compute the covariance matrix of the parameters. Defaults to cov = "observed" in which case the observed information matrix is used, as given in Appendix A of Davison and Smith. The only other option is co

start

Starting values for the parameters, to be passed to optim. If not provided, an exponential distribution (shape = 0) is assumed as the starting point.

priorParameters

A list with two components. The first should be a vector of means, the second should be a covariance matrix if the penalty/prior is "gaussian" or "quadratic" and a diagonal precision matrix if the penalty

maxit

The number of iterations allowed in optim.

trace

Whether or not to print progress to screen. If method = "optimize", the argument is passed into optim -- see the help for that function. If method = "simulate", the argument determines at how man

iter

Number of simulations to generate under method = "simulate". Defaults to 40000.

burn

The number of initial steps to be discarded.

thin

The degree of thinning of the resulting Markov chains. Defaults to 4 (one in every 4 steps is retained).

proposal.dist

The proposal distribution to use, either multivariate gaussian or a multivariate Cauchy.

jump.cov

Covariance matrix for proposal distribution of Metropolis algorithm. This is scaled by jump.const.

jump.const

Control parameter for the Metropolis algorithm.

verbose

Whether or not to print progress to screen. Defaults to verbose=TRUE.

x, object

Object of class gpd, bgpd, summary.gpd or summary.bgpd returned by gpd or summary.gpd.

digits

Number of digits for printing.

main

In plot method for class gpd, titles for diagnostic plots. Should be a vector of length 4, with values corresponding to the character strings to appear on the titles of the pp- qq- return level and density estimate plots respecti

xlab

As for main but labels for x-axes rather than titles.

nsim

In plot and summary methods for class gpd. The number of replicates to be simulated to produce the simulated tolerance intervals. Defaults to nsim = 1000

alpha

In plot and summary methods for class gpd. A (1 - alpha)% simulation envelope is produced. Defaults to alpha = 0.05

Constant used in calculation of AIC=-2*loglik + k*p, defaults to k=2.

print.seed

Whether or not to print the seed used in the simulations, or to annotate the plots with it. Defaults to print.seed=FALSE.

which.plots

In plot method for class bgpd. Which plots to produce. Option 1 gives kernel density estimates, 2 gives traces of the Markov chains with superimposed cumulative means, 3 gives autocorrela

density.adjust

In plot method for class bgpd. Passed into density. Controls the amount of smoothing of the kernel density estimate. Defaults to density.adjust=2.

...

Further arguments to be passed to methods.

Value

If method = "optimize", an object of class gpd:
convergenceOutput from optim relating to whether or not the optimizer converged.
messageA message telling the user whether or not convergence was achieved.
thresholdThe threshold of the data above which the GPD model was fit.
penaltyThe type of penalty function used, if any.
coefficientsThe parameter estimates as computed under maximum likelihood or maximum penalized likelihood.
rateThe proportion of observations above the threshold.
callThe call to gpd that produced the object.
yThe response data above the threshold for fitting.
X.phiThe design matrix for the log of the scale parameter.
X.xiThe design matrix for the shape parameter.
priorParametersSee above.
dataThe original data (above and below the threshold for fitting).
residualsData above the threshold for fitting after transformation to standard exponential scale by using the fitted GPD.
loglikThe value of the optimized log-likelihood.
covThe estimated covariance of the parameters in the model.
seThe estimated standard errors of the parameters in the model.
If method = "simulate", an object of class bgpd:
callThe call to gpd that produced the object.
thresholdThe threshold above which the model was fit.
mapThe point estimates found by maximum penalized likelihood and which were used as the starting point for the Markov chain. This is of class gpd and methods for this class (such as resid and plot) may be useful.
burnThe number of steps of the Markov chain that are to be treated as the burn-in and not used in inferences.
thinThe degree of thinning used.
chainsThe entire Markov chain generated by the Metropolis algorithm.
yThe response data above the threshold for fitting.
dataThe original data (above and below the threshold for fitting).
X.phiThe design matrix for the log of the scale parameter.
X.xiThe design matrix for the log of the shape parameter.
acceptanceThe proportion of proposals that were accepted by the Metropolis algorithm.
seedThe seed used by the random number generator.
paramThe remainder of the chain after deleting the burn-in and applying any thinning.
There are summary, plot, print and coefficients methods available for these classes.

Details

We use the following parameterisation of the GPD: $$P(X \le x) = 1 - \left(1 + \frac{\xi x}{\sigma}\right)^{-1/\xi}$$ for $x \ge 0$ and $1 + \xi x / \sigma \ge 0.$ The scale parameter is sigma ($\sigma$) and the shape parameter is xi ($\xi$). Working with the log of the scale parameter improves the stability of computations, making a quadratic penalty more appropriate and enabling the inclusion of covariates in the model for the scale parameter, which must remain positive. We therefore work with $\phi$=log($\sigma$). All specification of priors or penalty functions refer to $\phi$ rather than $\sigma$. A quadratic penalty can be thought of as a Gaussian prior distribution, whence the terminology of the function. Parameters of the GPD fitted to excesses above threshold th are estimated by using penalized maximum likelihood (method="optimize"), or by simulating from the posterior distribution of the model parameters using a Metropolis algorithm (method="simulate"). In the latter case, start is used as a starting value for the Metropolis algorithm; in its absence, the maximum penalized likelhood point estimates are computed and used.

When a summary or plot is performed, a pointwise (1 - alpha)% tolerance envelope is simulated, based on quantiles of the fitted model. Since the ordered observations will be correlated, if any observation is outside the envelope, it is likely that a chain of observations will be outside the envelope. Therefore, if the number outside the envelope is a little more than alpha%, that does not immediately imply a serious shortcoming of the fitted model.

When method = "optimize", the plot function produces diagnostic plots for the fitted generalized Pareto model. These differ depending on whether or not there are covariates in the model. If there are no covariates then the diagnostic plots are PP- and QQ-plots, a return level plot (produced by plotrl.gpd) and a histogram of the data with superimposed generalized Pareto density estimate. These are all calculated using the data on the original scale. If there are covariates in the model then the diagnostics consist of PP- and QQ- plots calculated by using the model residuals (which will be standard exponential devaiates under the GPD model), and plots of residuals versus fitted model parameters. The PP- and QQ-plots show simulated pointwise tolerance intervals. The region is a (1 - alpha)% region based on nsim simulated samples.

When method = "simulate" the plot function produces diagnostic plots for the Markov chains used to simulate from the posterior distributions for the model parameters. If the chains have converged on the posterior distributions, the trace plots should look like "fat hairy caterpillars" and their cumulative means should converge rapidly. Moreover, the autocorrelation functions should converge quickly to zero.

When method = "simulate" the print and summary functions give posterior means and standard deviations. Posterior means are also returned by the coef method. Depending on what you want to do and what the posterior distributions look like (use plot method) you might want to work with quantiles of the posterior distributions instead of relying on standard errors.

References

A. C. Davison and R. L. Smith, Models for exceedances over high thresholds, Journal of the Royal Statistical Society B, 53, 393 -- 442, 1990

Examples

Run this code

x <- rnorm(1000)
  mod <- gpd(x, qu = 0.7)
  mod
  par(mfrow=c(2, 2))
  plot(mod)

  x <- runif(100,-0.2,0.2)
  data <- data.frame(x=x,y=rgpd(100,sigma=exp(3 + 2*x),xi=x))
  mod <- gpd(y, data, phi = ~x, xi = ~x, th = 0)
  plot(mod)

# Following lines commented out to keep CRAN happy
#  mod <- gpd(x, qu=.7, method="sim")
#  mod
#  par(mfrow=c(3, 2))
#  plot(mod)

Run the code above in your browser using DataLab