GpdFit: Modelling the Generalized Pareto Distribution

Description

A collection and description of functions to model the Generalized Pareto Distribution, GPD, based on R's 'evir' package. Two approaches for parameter estimation are provided: Maximum likelihood estimation and the probability weighted moment method. The functions are: ll{ gpdSim generates data from the GPD, gpdFit fits empirical or simulated data to the distribution, print print method for a fitted GPD object of class ..., plot plot method for a fitted GPD object, summary summary method for a fitted GPD object, gpdqPlot estimation of high quantiles, gpdquantPlot variation of high quantiles with threshold, gpdriskmeasures prescribed quantiles and expected shortfalls, gpdsfallPlot expected shortfall with confidence intervals, gpdshapePlot variation of shape with threshold, gpdtailPlot plot of the tail. }

Usage

gpdSim(model = list(shape = 0.25, location = 0, scale = 1), n = 1000)
gpdFit(x, threshold = NA, nextremes = NA, type = c("mle", "pwm"),
    information = c("observed", "expected"), ...)

## S3 method for class 'gpdFit':
print(x, \dots)
## S3 method for class 'gpdFit':
plot(x, which = "all", \dots)
## S3 method for class 'gpdFit':
summary(object, doplot = TRUE, which = "all", \dots)

gpdqPlot(x, pp = 0.99, ci.type = c("likelihood", "wald"), ci.p = 0.95, 
    like.num = 50)
gpdquantPlot(data, p = 0.99, models = 30, start = 15, end = 500, 
    reverse = TRUE, ci = 0.95, autoscale = TRUE, labels = TRUE, ...)
gpdriskmeasures(x, plevels = c(0.99, 0.995, 0.999, 0.9995, 0.9999))
gpdsfallPlot(x, pp = 0.99, ci.p = 0.95, like.num = 50)
gpdshapePlot(data, models = 30, start = 15, end = 500, reverse = TRUE,
    ci = 0.95, autoscale = TRUE, labels = TRUE, ...) 
gpdtailPlot(fit, optlog = NA, extend = 1.5, labels = TRUE, ...)

Arguments

autoscale

whether or not plot should be automatically scaled; if not, xlim and ylim graphical parameters may be entered.

the probability for asymptotic confidence band; for no confidence band set to zero.

ci.p

the probability for confidence interval (must be less than 0.999).

ci.type

the method for calculating a confidence interval: "likelihood" or "wald".

data

a numeric vector of data.

doplot

a logical. Should the results be plotted?

extend

optional argument for plots 1 and 2 expressing how far x-axis should extend as a multiple of the largest data value. This argument must take values greater than 1 and is useful for showing estimated quantiles beyond data.

fit

[print][plot][summary] - print method, a fitted object of class "gpd".

information

whether standard errors should be calculated with "observed" or "expected" information. This only applies to the maximum likelihood method; for the probability-weighted moments method "expected"

labels

optional argument for plots 1 and 2  specifying whether or not 
        axes should be labelled.

like.num

the number of times to evaluate profile likelihood.

model

[gpdsim] - 
a list with components shape, location and 
        scale giving the parameters of the GPD distribution.
        By default the shape parameter has the value 0.25, the
        location is zero and the

models

the number of consecutive gpd models to be fitted.

n

[gpdsim] - 
lnumber of generated data points, an integer value.

nextremes

[gpdFit] - 
the number of upper extremes to be used (either this or 
        threshold must be given but not both).

object

[summary] - 
a fitted object of class "gpdFit".

optlog

optional argument for plots 1 and 2 giving a particular choice 
        of logarithmic axes: "x" x-axis only; "y" y-axis 
        only; "xy" both axes; "" neither axis.

plevels, p, pp

a vector of probability levels, the desired probability for the 
        quantile estimate (e.g. 0.99 for the 99th percentile).

reverse

should plot be by increasing threshold (TRUE) or number 
        of extremes (FALSE).

start, end

the lowest and maximum number of exceedances to be considered.

threshold

a threshold value (either this or nextremes must be given 
        but not both).

type

a character string selecting the desired estimation mehtod, either
        "mle" for the maximum likelihood mehtod or "pwm" for 
        the probability weighted moment method. By default, the first will 
        be selected.

which

if which is set to "ask" the function will 
        interactively ask which plot should be displayed. By default
        this value is set to FALSE and then those plots will
        be displayed for which the elem

x

[gpdFit] - 
the data vector. Note, there are two different names
        for the first argument x and data depending 
        which function name is used, either gpdFit or the 
        EVIS synonyme gpd

...

control parameters and plot parameters optionally passed to the 
        optimization and/or plot function. Parameters for the optimization
        function are passed to components of the control argument of
        optim.

`Value`

gpdSim
    
returns a vector of datapoints from the simulated 
    series.
    
    gpdFit 
    
returns an object of class "gpd" describing the 
    fit including parameter estimates and standard errors. 
  
    gpdquantPlot
    
returns invisible a table of results.
  
    gpdshapePlot
    
returns invisible a table of results.
  
    gpdtailPlot
    
returns invisible a list object containing 
    details of the plot is returned invisibly. This object should be 
    used as the first argument of gpdqPlot or gpdsfallPlot 
    to add quantile estimates or expected shortfall estimates to the 
    plot.

`Details`

Simulation:
    

gpdSim simulates data from a Generalized Pareto 
    distribution.
    
Parameter Estimation:
    

gpdFit fits the model parameters either by the probability 
    weighted moment method or the maxim log likelihood method. 
    The function returns an object of class "gpd" 
    representing the fit of a generalized Pareto model to excesses over 
    a high threshold. The fitting functions use the probability weighted 
    moment method, if method method="pwm" was selected, and the 
    the general purpose optimization function optim when the 
    maximum likelihood estimation, method="mle" or method="ml" 
    is chosen.
    
Methods:
    

print.gpd, plot.gpd and summary.gpd are print, 
    plot, and summary methods for a fitted object of class gpdFit. 
    The plot method provides four different plots for assessing fitted 
    GPD model. 
    
gpd* Functions:
    

gpdqPlot calculates quantile estimates and confidence intervals 
    for high quantiles above the threshold in a GPD analysis, and adds a 
    graphical representation to an existing plot. The GPD approximation in 
    the tail is used to estimate quantile. The "wald" method uses 
    the observed Fisher information matrix to calculate confidence interval. 
    The "likelihood" method reparametrizes the likelihood in terms 
    of the unknown quantile and uses profile likelihood arguments to 
    construct a confidence interval. 
    
gpdquantPlot creates a plot showing how the estimate of a 
    high quantile in the tail of a dataset based on the GPD approximation 
    varies with threshold or number of extremes. For every model 
    gpdFit is called. Evaluation may be slow. Confidence intervals 
    by the Wald method may be fastest.
    
gpdriskmeasures makes a rapid calculation of point estimates 
    of prescribed quantiles and expected shortfalls using the output of the
    function gpdFit. This function simply calculates point estimates 
    and (at present) makes no attempt to calculate confidence intervals for 
    the risk measures. If confidence levels are required use gpdqPlot 
    and gpdsfallPlot which interact with graphs of the tail of a loss
    distribution and are much slower.  
    
gpdsfallPlot calculates expected shortfall estimates, in other
    words tail conditional expectation and confidence intervals for high  
    quantiles above the threshold in a GPD analysis. A graphical 
    representation to an existing plot is added. Expected shortfall is 
    the expected size of the loss, given that a particular quantile of the 
    loss distribution is exceeded. The GPD approximation in the tail is used 
    to estimate expected shortfall. The likelihood is reparametrised  in 
    terms of the unknown expected shortfall and profile likelihood arguments 
    are used to construct a confidence interval. 
    
gpdshapePlot creates a plot showing how the estimate of shape 
    varies with threshold or number of extremes. For every model 
    gpdFit is called. Evaluation may be slow.  
    
gpdtailPlot produces a plot of the tail of the underlying 
    distribution of the data.

`References`

Hosking J.R.M., Wallis J.R., (1987);
    Parameter and quantile estimation for the generalized
    	Pareto distribution,   
    Technometrics 29, 339--349.

`Examples`

Run this code## SOURCE("fExtremes.53B-GpdFit")

## Load Data:
   data(danish)
   
## gpdSim - 
   # Simulate GPD Data:
   xmpExtremes("Start: Simulate a GPD Distributed Sample > ")
   x = gpdSim(model = list(shape = 0.25, location = 0, scale = 1), n = 1000)
   
## gpdFit -   
   xmpExtremes("Next: Fit Simulated Data to GPD using PWM > ")
   fit = gpdFit(x, nextremes = length(x), type = "pwm") 
   print(fit)
   par(mfcol = c(4, 2), cex = 0.7)
   summary(fit)  
   
## gpdFit -
   xmpExtremes("Next: Fit Simulated Data to GPD using MLE > ")
   fit = gpdFit(x, nextremes = length(x), type = "mle") 
   print(fit)
   summary(fit) 

## gpdFit - 
   xmpExtremes("Next: Fit Danish Fire Data to Excess Losses over 10 > ")
   fit = gpdFit(danish, 10, type = "mle") 
   print(fit)
   par(mfrow = c(2, 2), cex = 0.7)
   summary(fit)  
     
## gpdqPlot - 
   xmpExtremes("Next: 99.5th Percentiles for Danish Fire Data > ")
   fit = gpdFit(danish, threshold = 10, type = "mle")
   par(mfrow = c(1, 1))
   tail = gpdtailPlot(fit)
   gpdqPlot(tail, 0.995)
   title(main = "Danish Data: 99.5th Percentile") 
	
## gpdquantPlot - 	
   xmpExtremes("Next: 99.9th Percentiles for Danish Fire Data > ")
   par(mfrow = c(1, 1))
   gpdquantPlot(danish, p = 0.999)
   title(sub = "Danish Fire: GPD High Quantile") 

## gpdsfallPlot - 
   xmpExtremes("Next: Expected Shortfall for Danish Fire Data > ")
   fit = gpdFit(danish, nextremes = floor(length(danish)/10), type = "mle")
   par(mfrow = c(1, 1))
   tp = gpdtailPlot(fit)
   gpdsfallPlot(tp, 0.999)   
   title(main = "Danish Fire: Expected Shortfall")   
   
## gpdriskmeasures -
   xmpExtremes("Next: Quantiles and Expected Shortfalls > ")
   # Give estimates of 0.999 and 0.9999 quantiles - Danish Fire Date:   
   fit = gpdFit(danish, threshold = 10, type = "mle") 
   par(mfrow = c(1, 1))
   gpdriskmeasures(fit, c(0.99, 0.995, 0.999, 0.9995, 0.9999))   
  
## gpdshapePlot - 
   xmpExtremes("Next: Shape Plot of Heavy-Tailed Simulated Data > ")
   set.seed(4711)
   par(mfrow = c(1, 1))
   gpdshapePlot(gpdSim(n = 1000))
   title(sub = "Simulated GPD", cex.sub = 0.7)   

## gpdshapePlot - 
   xmpExtremes("Next: Shape Plot of Heavy-Tailed Danish Fire Data > ")
   par(mfrow = c(1, 1))
   gpdshapePlot(danish)
   title(sub = "Danish Fire", cex.sub = 0.7)

## gpdtailPlot - 
   xmpExtremes("Next: Plot Tail Estimate of Danish Fire Data >")
   fit = gpdFit(danish, threshold = 10, type = "mle")
   par(mfrow = c(1, 1))
   gpdtailPlot(fit, main = "Danish Fire: GPD Tail Estimate", col = "steelblue4")
Run the code above in your browser using DataLab