fExtremes (version 220.10063)

ExtremesData: Explorative Data Analysis

Description

A collection and description of functions for explorative data analysis, including data preprocessing of extreme values. The tools include plot functions for empirical distributions, quantile plots, graphs exploring the properties of exceedances over a threshold, plots for the maximum/sum ratio and for the development of records. The data preprocessing includes tools to separate data beyond a threshold value, to compute blockwise data like block maxima, and to decluster point process data.

The plot functions are:

  emdPlot          Plot of empirical distribution function,
  qqPlot           Normal quantile-quantile plot,
  qqbayesPlot      Normal QQ-plot with 95 percent intervals,
  qPlot            Exponential/Pareto quantile plot,
  mePlot           Plot of mean excesses over a threshold,
  mrlPlot          another variant, mean residual life plot,
  mxfPlot          another variant, with confidence intervals,
  msratioPlot      Plot of the ratio of maximum and sum,
  recordsPlot      Record development compared with iid data,
  ssrecordsPlot    another variant, investigates subsamples,
  xacfPlot         ACF of exceedances over a threshold,
  interactivePlot  a framework for interactive plot displays,
  gridVector       creates from two vectors x and y all grid points.

The functions for data preprocessing are:

  findThreshold    Upper threshold for a given number of extremes,
  blocks           Create data blocks on vectors and time series,
  blockMaxima      Block maxima from a vector or a time series,
  deCluster        Declusters clustered point process data.

Usage

emdPlot(x, doplot = TRUE, plottype = c("", "x", "y", "xy"), labels = TRUE, ...)

qqPlot(x, doplot = TRUE, labels = TRUE, ...) 
qqbayesPlot(x, doplot = TRUE, labels = TRUE, ...)
qPlot(x, xi = 0, trim = NA, threshold = NA, doplot = TRUE, labels = TRUE, ...)

mePlot(x, doplot = TRUE, labels = TRUE, ...)
mrlPlot(x, conf = 0.95, umin = NA, umax = NA, nint = 100, doplot = TRUE, 
     plottype = c("autoscale", ""), labels = TRUE, ...)  
mxfPlot(x, tail = 0.05, doplot = TRUE, labels = TRUE, ...)  
   
msratioPlot(x, p = 1:4, doplot = TRUE, plottype = c("autoscale", ""), 
    labels = TRUE, ...) 
   
recordsPlot(x, conf = 0.95, doplot = TRUE, labels = TRUE, ...)
ssrecordsPlot(x, subsamples = 10, doplot = TRUE, plottype = c("lin", "log"),
    labels = TRUE, ...)

xacfPlot(x, threshold = 0.95, lag.max = 15, doplot = TRUE, ...)

interactivePlot(x, choices = paste("Plot", 1:9), 
    plotFUN = paste("plot.", 1:9, sep = ""), which = "all", ...)
gridVector(x, y)

findThreshold(x, n = NA)
blocks(x, block = "month", FUN = max)
blockMaxima(x, block = "month", details = FALSE, doplot = TRUE, ...)
deCluster(x, run = NA, doplot = TRUE)

Arguments

block
[blockMaxima] - the block size. A numeric value is interpreted as the number of data values in each successive block. All the data is used, so the last block may not contain block observations. If the data
choices
[interactivePlot] - a vector of character strings for the choice menu. By default "Plot 1" ... "Plot 9", allowing for at most 9 plots.
conf
[recordsPlot] - a confidence level. By default 0.95, i.e. 95%.
details
[blockMaxima] - a logical. Should details be printed?
doplot
a logical. Should the results be plotted? By default TRUE.
FUN
the function to be applied. Additional arguments are passed by the ... argument.
labels
a logical. Whether or not x- and y-axes should be automatically labelled and a default main title should be added to the plot. By default TRUE.
lag.max
[xacfPlot] - maximum number of lags at which to calculate the autocorrelation functions. The default value is 15.
nint
[mrlPlot] - the number of intervals, see umin and umax. The default value is 100.
n
[findThreshold] - a numeric value or vector giving the number of extremes above the threshold. If n is not specified, it is set to an integer representing 5% of the whole data set x.
p
[msratioPlot] - the power exponents, a numeric vector. By default a sequence from 1 to 4 in unit integer steps.
plotFUN
[interactivePlot] - a vector of character strings naming the plot functions. By default "plot.1" ... "plot.9", allowing for at most 9 plots.
plottype
[emdPlot] - which axes should be on a log scale: "x" x-axis only; "y" y-axis only; "xy" both axes; "" neither axis. [msratioPlot] - a logical, if set to "autoscale"<
run
[deCluster] - parameter to be used in the runs method; any two consecutive threshold exceedances separated by more than this number of observations/days are considered to belong to different clusters.
subsamples
[ssrecordsPlot] - the number of subsamples, by default 10, an integer value.
tail
[mxfPlot] - the threshold determined from the relative number of data points defining the tail, a numeric value; by default 0.05 which says that 5% of the data make the tail.
threshold, trim
[qPlot] - threshold is the numeric value at which data are to be left-truncated and trim the value at which data are to be right-truncated; both are NA by default. [xacfPlot] - threshold is the threshold value, by default 0.95, i.e. 95%.
umin, umax
[mrlPlot] - range of threshold values. If umin and/or umax are not available, then by default they are set to the following values: umin=mean(x) and umax=max(x).
which
plot selection, which graph should be displayed? If "which" is a character string named "ask" the user is interactively asked which to plot, if a logical vector of length N, those plots which are set
x, y
numeric data vectors or in the case of x an object to be plotted. [findThreshold][blocks][blockMaxima][deCluster] - a numeric data vector from which findThreshold and blockMaxima determine the threshold value
xi
the shape parameter of the generalized Pareto distribution.
...
additional arguments passed to the FUN or plot function.
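
The runs method described under run can be illustrated with a minimal base R sketch. The helper runs_decluster below is hypothetical and is not the code behind deCluster. It assumes that separation is measured as the difference between the positions of two consecutive exceedances, and that each cluster is represented by its largest value, which the text above does not state.

```r
# Hypothetical helper, not part of fExtremes: runs declustering sketch.
# Assumption: two consecutive exceedances belong to different clusters
# when their positions differ by more than `run`.
# Assumption: each cluster is summarised by its largest exceedance.
runs_decluster <- function(x, threshold, run) {
  idx <- which(x > threshold)              # positions of the exceedances
  if (length(idx) == 0) {
    return(data.frame(position = integer(0), peak = numeric(0)))
  }
  # A gap larger than `run` starts a new cluster
  cluster <- cumsum(c(TRUE, diff(idx) > run))
  # Position of the largest exceedance within each cluster
  peak_pos <- unname(vapply(
    split(idx, cluster),
    function(i) i[which.max(x[i])],
    integer(1)
  ))
  data.frame(position = peak_pos, peak = x[peak_pos])
}

x <- c(1, 5, 6, 1, 1, 1, 7, 1, 8, 1, 1, 1, 1, 9)
res <- runs_decluster(x, threshold = 4, run = 2)
print(res)
```

With run = 2 the exceedances at positions 2 and 3 form one cluster, those at 7 and 9 a second one, and the exceedance at 14 a third.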

Value

  • findThreshold returns a numeric vector of suitable thresholds.
  • blockMaxima returns a numeric vector of block maxima data.
  • deCluster returns an object for the declustered point process.
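
To make the first two return values concrete, here is a minimal base R sketch. The helpers threshold_for_n and block_apply are hypothetical and are not the package implementation. They assume that an extreme is a value strictly above the threshold and that block is a numeric block size.

```r
# Hypothetical helpers, not part of fExtremes.

# Threshold with at least n values strictly above it, one per element of n.
# Assumption: with ties, the next smaller distinct value is used, so that
# at least n observations exceed the returned threshold.
threshold_for_n <- function(x, n) {
  sorted <- sort(as.numeric(x), decreasing = TRUE)
  distinct <- unique(sorted)                  # distinct values, largest first
  pos <- match(sorted[n], distinct)           # rank of the n-th largest value
  distinct[pmin(pos + 1, length(distinct))]   # next smaller distinct value
}

# Apply FUN to successive blocks of `block` observations.
# All data are used, so the last block may be shorter than the others.
block_apply <- function(x, block, FUN = max) {
  group <- ceiling(seq_along(x) / block)
  as.vector(tapply(x, group, FUN))
}

x <- c(7, 10, 9, 8, 9)
thr <- threshold_for_n(x, n = c(1, 2, 4))
print(thr)
print(sapply(thr, function(u) sum(x > u)))    # number of values above each

y <- c(1, 5, 2, 8, 3, 4, 9)
print(block_apply(y, block = 3))              # block maxima
print(block_apply(y, block = 3, FUN = min))   # any other block statistic
```

Because the value 9 is tied in x, asking for n = 2 extremes yields a threshold with three values above it, in line with the "at least" rule for tied data.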

Details

Empirical Distribution Function: The function emdPlot is a simple exploratory function. A straight line on the double log scale indicates Pareto tail behaviour.

Quantile-Quantile Plot: The function qqPlot produces a normal QQ-plot. Note that qqPlot is not a synonym for the base R function qqplot, which produces a quantile-quantile plot of two datasets. To help assess how much sampling variability affects how close to the normal the data appear, qqbayesPlot adds approximate posterior 95 percent intervals at each point. qPlot creates a QQ-plot for threshold data. If xi is zero, the reference distribution is the exponential; if xi is non-zero, the reference distribution is the generalized Pareto with that value of xi. In the case of the exponential, the plot is interpreted as follows: concave departures from a straight line are a sign of heavy-tailed behaviour, convex departures show thin-tailed behaviour.

Mean Excess Function Plot: Three variants to plot the mean excess function are available: a sample mean excess plot over increasing thresholds (mePlot), and two mean excess function plots with confidence intervals for discrimination in the tails of a distribution (mrlPlot and mxfPlot). In general, an upward trend in a mean excess function plot shows heavy-tailed behaviour. In particular, a straight line with positive gradient above some threshold is a sign of Pareto behaviour in the tail. A downward trend shows thin-tailed behaviour, whereas a line with zero gradient shows an exponential tail. Here are some hints: because the upper plotting points are averages of only a handful of extreme excesses, they may be omitted for a cleaner plot. For mrlPlot and mxfPlot the upper tail is investigated; for the lower tail, reverse the sign of the data vector.

Plot of the Maximum/Sum Ratio: The ratio of maximum and sum is a simple tool for detecting heavy tails of a distribution and for giving a rough estimate of the order of its finite moments. Sharp increases in the curves of a msratioPlot are a sign of heavy-tailed behaviour.

Plot of the Development of Records: These functions investigate the development of records in a dataset and calculate the expected behaviour for iid data. recordsPlot counts records and reports the observations at which they occur. In addition, subsamples can be investigated with the function ssrecordsPlot.

ACF Plot of Exceedances over a Threshold: The function xacfPlot plots the autocorrelation functions of heights and distances of exceedances over a threshold.

Finding Thresholds: The function findThreshold finds a threshold so that a given number of extremes lie above it. When the data are tied, a threshold is found so that at least the specified number of extremes lie above it.

Computing Block Maxima: The function blockMaxima calculates block maxima from a vector or a time series, whereas the function blocks is more general and allows for the calculation of an arbitrary function FUN on blocks.

De-Clustering Point Processes: The function deCluster declusters clustered point process data so that the Poisson assumption is more tenable over a high threshold.
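
The two tail diagnostics described above, the sample mean excess function and the maximum/sum ratio, can be sketched in a few lines of base R. The helpers mean_excess and max_sum_ratio are hypothetical names used only for illustration; they are not how mePlot or msratioPlot are implemented, and they compute the plain statistics without plotting or confidence intervals.

```r
# Hypothetical helpers, not part of fExtremes.

# Sample mean excess over each threshold in u:
# the average of (x - u) taken over the observations with x > u.
# A threshold at or above max(x) has no excesses and gives NaN.
mean_excess <- function(x, u) {
  vapply(u, function(ui) mean(x[x > ui] - ui), numeric(1))
}

# Running ratio of maximum and sum for power exponent p:
# R_n(p) = max(|x_1|^p, ..., |x_n|^p) / (|x_1|^p + ... + |x_n|^p)
max_sum_ratio <- function(x, p = 1) {
  ax <- abs(x)^p
  cummax(ax) / cumsum(ax)
}

# Small deterministic checks
print(mean_excess(c(1, 2, 3, 4, 5), u = c(0, 2)))
print(max_sum_ratio(c(1, 3, 2)))

# Simulated comparison: exponential (thin tail) versus Cauchy (heavy tail)
set.seed(4711)
thin  <- rexp(5000, rate = 2)
heavy <- rcauchy(5000)
u <- quantile(thin, c(0.50, 0.75, 0.90))
print(mean_excess(thin, u))                # roughly flat for an exponential tail
print(tail(max_sum_ratio(thin,  p = 2), 1))
print(tail(max_sum_ratio(heavy, p = 2), 1))
```

For a distribution with a finite moment of order p the ratio tends to zero as n grows, so a curve that stays away from zero or jumps sharply points to a heavy tail.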

References

Coles, S. (2001); An Introduction to Statistical Modeling of Extreme Values, Springer.
Embrechts, P., Klueppelberg, C., Mikosch, T. (1997); Modelling Extremal Events, Springer.

Examples

## SOURCE("fExtremes.51A-ExtremesData")
   library(fExtremes)

## emdPlot -
   xmpExtremes("Start: Empirical Distribution Function >")
   # Danish fire insurance data show Pareto tail behaviour:
   par(mfrow = c(2, 2))
   data(danish)
   emdPlot(danish, plottype = "xy", labels = FALSE)
   title(xlab = "x", ylab = "1-F(x)", main = "Danish Fire")   
   # BMW Stocks:
   data(bmw)
   emdPlot(bmw, plottype = "xy", labels = FALSE)
   title(xlab = "x", ylab = "1-F(x)", main = "BMW Stocks")  
   # Simulated Student-t:
   emdPlot(rt(5000, 4), plottype = "xy") 
 
## qqPlot -
   xmpExtremes("Next: Quantile-Quantile Plot >")
   # QQ-Plot of Simulated Normal rvs:
   par(mfrow = c(2, 2))
   set.seed(4711)
   qqPlot(rnorm(5000))
   text(-3.5, 3, pos = 4, "Simulated Normal rvs")
   # QQ-Plot of simulated Student-t rvs:
   qqPlot(rt(5000, 4))
   text(-3.5, 11.0, pos = 4, "Simulated Student-t rvs")
   # QQ-Plot of BMW share residuals:
   data(bmw)
   qqPlot(bmw)
   text(-3.5, 0.09, pos = 4, "BMW log returns")     
   
## qPlot -
   xmpExtremes("Next: QQ-Plot of Heavy Tails >")
   # QQ-Plot of heavy-tailed Danish fire insurance data:
   data(danish)
   qPlot(danish) 
 
## mePlot -
   xmpExtremes("Next: Mean Excess Plot >")
   # Sample mean excess plot of heavy-tailed Danish fire 
   # insurance data 
   par(mfrow = c(3, 2))
   data(danish)
   mePlot(danish, labels = FALSE)
   title(xlab = "u", ylab = "e", main = "mePlot - Danish Fire Data")
   
## mrlPlot -
   xmpExtremes("Next: Mean Residual Life Plot >")
   # Sample mean residual life plot of heavy-tailed Danish fire 
   # insurance data 
   mrlPlot(danish, labels = FALSE)
   title(xlab = "u", ylab = "e", main = "mrlPlot - Danish Fire Data")
 
## mxfPlot -
   xmpExtremes("Next: Mean Excess Function Plot >")
   # Plot the mean excess functions for randomly distributed 
   # residuals  
   par(mfrow = c(2, 2))
   n = 10000    
   set.seed(4711)
   xlab = "Threshold: u"; ylab = "Mean Excess: e"
   mxfPlot(rnorm(n), tail = 0.5, labels = FALSE)
   title(xlab = xlab, ylab = ylab, main = "mxf Plot - Normal DF")
   set.seed(7138)
   mxfPlot(rexp(n, 2), tail = 0.5, labels = FALSE)
   title(xlab = xlab, ylab = ylab, main = "mxfPlot - Exponential DF")
   abline(1/2, 0)
   set.seed(6952)
   mxfPlot(rlnorm(n, 0, 2), tail = 0.5, xlim = c(0,90), 
     ylim = c(0, 120), labels = FALSE)
   title(xlab = xlab, ylab = ylab, main = "mxfPlot - Lognormal DF")
   set.seed(8835)
   mxfPlot(rgpd(n, 1/2), tail = 0.10, xlim = c(0,200), 
     ylim=c(0,200), labels = FALSE)
   title(xlab = xlab, ylab = ylab, main = "mxfPlot - Pareto")
   abline(0, 1)  
 
## msratioPlot -
   xmpExtremes("Next: Maximum/Sum Ratio Plot >")
   # Examples for Ratio of Maximum and Sum Plots:
   par(mfrow = c(3, 2))
   data(bmw)
   xlab = "n"; ylab = "R(n)"
   msratioPlot(rnorm(8000), labels = FALSE)
   title(xlab = xlab, ylab = ylab, main = "Standard Normal")
   msratioPlot(rexp(8000), labels = FALSE)
   title(xlab = xlab, ylab = ylab, main = "Exponential")
   msratioPlot(rt(8000, 4), labels = FALSE)
   title(xlab = xlab, ylab = ylab, main = "Student-t")
   msratioPlot(rcauchy(8000), labels = FALSE)
   title(xlab = xlab, ylab = ylab, main = "Cauchy")
   msratioPlot(bmw, labels = FALSE)
   title(xlab = xlab, ylab = ylab, main = "BMW Returns")
  
## recordsPlot -
   xmpExtremes("Next: Records Plot >")
   # Record fire insurance losses in Denmark
   par(mfrow = c(2, 2))
   data(danish)
   recordsPlot(danish)
   text(1, 7.9, pos = 4, "Danish Fire")
   # BMW Stocks
   data(bmw)
   recordsPlot(bmw)
   text(1, 12.8, pos = 4, "BMW Shares")
      
## ssrecordsPlot -
   xmpExtremes("Next: Subsample Record Plot >")
   # Record fire insurance losses in Denmark
   ssrecordsPlot(danish)
   text(1, 9.2, pos = 4, "Danish Fire")
   # BMW Stocks
   ssrecordsPlot(bmw)
   text(1, 10.5, pos = 4, "BMW Shares")  
 
## xacfPlot -
   xmpExtremes("Next: ACF Plot of Exceedances >")
   # Plot ACF of heights/distances of exceedances over threshold:
   par(mfrow = c(2, 2))
   data(bmw)
   xacfPlot(bmw)
   
## findThreshold -
   xmpExtremes("Start: Find Threshold >")
   # Find thresholds giving (at least) 10, 50 and 100 exceedances 
   # for Danish Fire data
   data(danish)
   findThreshold(danish, n = c(10, 50, 100))    
   
## blockMaxima -
   xmpExtremes("Next: Compute Block Maxima >")
   # Block Maxima (Minima) for the right and left tails 
   # of the BMW log returns:
   data(bmw)
   par(mfrow = c(2, 1))
   blockMaxima( bmw, block = 100)
   blockMaxima(-bmw, block = 100)     
 
## deCluster -
   xmpExtremes("Next: De-Cluster Exceedences >")
   # Decluster the 200 exceedances of a particular  
   # threshold in the negative BMW log-return data
   par(mfrow = c(2, 2))
   fit = potFit(-bmw, nextremes = 200) 
   deCluster(fit$fit$data, 30)
