
ExtremalDep (version 0.0.4-5)

fExtDep.np: Non-parametric extremal dependence estimation

Description

This function estimates the bivariate extremal dependence structure using a non-parametric approach based on Bernstein Polynomials.

Usage

fExtDep.np(x, method, cov1=NULL, cov2=NULL, u, mar.fit=TRUE, 
           mar.prelim=TRUE, par10, par20, sig10, sig20, param0=NULL, 
           k0=NULL, pm0=NULL, prior.k="nbinom", prior.pm="unif", 
           nk=70, lik=TRUE, 
           hyperparam = list(mu.nbinom=3.2, var.nbinom=4.48), 
           nsim, warn=FALSE, type="rawdata")
           
# S3 method for ExtDep_npBayes
plot(x, type, summary.mcmc, burn, y, probs, 
     A_true, h_true, est.out, mar1, mar2, dep, 
     QatCov1=NULL, QatCov2=QatCov1, P, 
     CEX=1.5, col.data, col.Qfull, col.Qfade, data=NULL, ...)

# S3 method for ExtDep_npFreq
plot(x, type, est.out, mar1, mar2, dep, P, 
     CEX=1.5, col.data, col.Qfull, data=NULL, ...)

# S3 method for ExtDep_npEmp
plot(x, type, est.out, mar1, mar2, dep, P, 
     CEX=1.5, col.data, col.Qfull, data=NULL, ...)

# S3 method for ExtDep_npBayes
summary(object, w, burn, cred=0.95, plot=FALSE, ...)

Value

Regarding the fExtDep.np function:

Returns lists of class ExtDep_npBayes, ExtDep_npFreq or ExtDep_npEmp depending on the value of the method argument. Each list includes:

method:

The argument.

type:

Whether the data are "maxima" or "rawdata" (the latter in the broader sense that a threshold-exceedance model was used).

If method="Bayesian" the list also includes:

mar.fit:

The argument.

pm:

The posterior sample of probability masses.

eta:

The posterior sample for the coefficients of the Bernstein polynomial.

k:

The posterior sample for the Bernstein polynomial order.

accepted:

A binary vector indicating if the proposal was accepted.

acc.vec:

A vector containing the acceptance probabilities for the dependence parameters at each iteration.

prior:

A list containing hyperparam, prior.pm and prior.k.

nsim:

The argument.

data:

The argument.

In addition if the marginal parameters are estimated (mar.fit=TRUE):

cov1, cov2:

The arguments.

accepted.mar, accepted.mar2:

Binary vectors indicating if the marginal proposals were accepted.

straight.reject1, straight.reject2:

Binary vectors indicating if the marginal proposals were rejected outright for not respecting existence conditions (the proposal distribution is multivariate normal).

acc.vec1, acc.vec2:

Vectors containing the acceptance probabilities for the marginal parameters at each iteration.

sig1.vec, sig2.vec:

Vectors containing the values of the scaling parameter in the marginal proposal distributions.

Finally, if the argument u is provided, the list also contains:

threshold:

A bivariate vector indicating the threshold level for both margins.

kn:

The empirical estimate of the probability of being greater than the threshold.

When method="Frequentist", the list includes:

hhat:

An estimate of the angular density.

Hhat:

An estimate of the angular measure.

p0, p1:

The estimates of the probability mass at 0 and 1.

Ahat:

An estimate of the Pickands dependence function.

w:

A sequence of values on the bivariate unit simplex.

q:

A real in \([0,1]\) indicating the quantile associated with the threshold u. Takes value NULL if type="maxima".

data:

The data on the unit Frechet scale (empirical transformation) if type="rawdata" and mar.fit=TRUE. Data on the original scale otherwise.

When method="Empirical", the list includes:

fi:

An estimate of the angular measure.

h_hat:

An estimate of the angular density.

theta_seq:

A sequence of angles from \(0\) to \(\pi/2\).

data:

The argument.

Regarding the code summary method function:

The function returns a list with the following objects:

k.median, k.up, k.low:

Posterior median, upper and lower bounds of the CI for the estimated Bernstein polynomial degree \(\kappa\);

h.mean, h.up, h.low:

Posterior mean, upper and lower bounds of the CI for the estimated angular density \(h\);

A.mean, A.up, A.low:

Posterior mean, upper and lower bounds of the CI for the estimated Pickands dependence function \(A\);

p0.mean, p0.up, p0.low:

Posterior mean, upper and lower bounds of the CI for the estimated point mass \(p_0\);

p1.mean, p1.up, p1.low:

Posterior mean, upper and lower bounds of the CI for the estimated point mass \(p_1\);

A_post:

Posterior sample for Pickands dependence function;

h_post:

Posterior sample for angular density;

eta.diff_post:

Posterior sample for the Bernstein polynomial coefficients (\(\eta\) parametrisation);

beta_post:

Posterior sample for the Bernstein polynomial coefficients (\(\beta\) parametrisation);

p0_post, p1_post:

Posterior sample for point masses \(p_0\) and \(p_1\);

w:

A vector of values on the bivariate simplex where the angular density and Pickands dependence function were evaluated;

burn:

The argument provided;

If the margins were also fitted, the list given as object would contain mar1 and mar2 and the function would also output:

mar1.mean, mar1.up, mar1.low:

Posterior mean, upper and lower bounds of the CI for the estimated marginal parameter on the first component;

mar2.mean, mar2.up, mar2.low:

Posterior mean, upper and lower bounds of the CI for the estimated marginal parameter on the second component;

mar1_post:

Posterior sample for the estimated marginal parameter on the first component;

mar2_post:

Posterior sample for the estimated marginal parameter on the second component;

Arguments

x

fExtDep.np: A matrix containing the data.

plot method functions: any object returned by fExtDep.np.

object

A list object of class ExtDep_npBayes.

method

A character string indicating the estimation method; one of "Bayesian", "Frequentist" or "Empirical".

cov1, cov2

A covariate vector/matrix for a linear model on the location parameter of the marginal distributions. length(cov1)/nrow(cov1) needs to match nrow(data). If NULL the location parameter is assumed constant. Required when method="Bayesian".

u

When method="Bayesian": a bivariate indicating the threshold for the exceedance approach. If logical TRUE the threshold are calculated by default as the 90% quantiles. When missing, a block maxima approach is considered. When method="Frequentist": the associated quantile is used to select observations with the largest radial components. If missing, the 90% quantile is used.

mar.fit

A logical value indicating whether the marginal distributions should be fitted. When method="Frequentist", data are empirically transformed to the unit Frechet scale if mar.fit=TRUE. Not required when method="Empirical".

rawdata

A character string specifying whether the data are "rawdata" or "maxima". Required when method="Frequentist".

mar.prelim

A logical value indicating whether a preliminary fit of the marginal distributions should be performed before jointly estimating the margins and the dependence. Required when method="Bayesian".

par10, par20

Vectors of starting values for the marginal parameter estimation. Required when method="Bayesian" and mar.fit=TRUE.

sig10, sig20

Positive reals representing the initial value for the scaling parameter of the multivariate normal proposal distribution for both margins. Required when method="Bayesian" and mar.fit=TRUE.

param0

A list of initial values for the Bernstein polynomial coefficients, with elements $eta and $beta. If missing, it can be generated by the internal function rcoef. Required when method="Bayesian".

k0

An integer indicating the initial value of the polynomial order. Required when method="Bayesian".

pm0

A list of initial values for the probability masses at the boundaries of the simplex. It should be a list with two elements $p0 and $p1. Randomly generated if missing. Required when method="Bayesian".

prior.k

A character string indicating the prior distribution on the polynomial order. By default prior.k="nbinom" (negative binomial) but can also be "pois" (Poisson). Required when method="Bayesian".

prior.pm

A character string indicating the prior on the probability masses at the endpoints of the simplex. By default prior.pm="unif" (uniform) but can also be "beta" (beta). Required when method="Bayesian".

nk

An integer indicating the maximum polynomial order. Required when method="Bayesian".

lik

A logical value (TRUE by default); if FALSE, an approximation of the likelihood, by means of the angular measure in Bernstein form, is used. Required when method="Bayesian".

hyperparam

A list of the hyper-parameters depending on the choice of prior.k and prior.pm. See details. Required when method="Bayesian".

nsim

An integer indicating the number of iterations in the Metropolis-Hastings algorithm. Required when method="Bayesian".

warn

A logical value. If TRUE warnings are printed when some specifics (e.g., param0, k0, pm0 and hyperparam) are not provided and set by default. Required when method="Bayesian".

type

fExtDep.np: A character string indicating whether the data are "rawdata" or "maxima". Required when method="Bayesian".

plot method functions: A character string indicating the type of graphical summary to be plotted. Can take value "summary" or "Qsets" for any class of x, "A" when x is of class ExtDep_npBayes or ExtDep_npFreq, "returns", "h", "pm" and "k" when x is of class ExtDep_npBayes, or "psi" when x is of class ExtDep_npEmp.

summary.mcmc

The output of the summary.ExtDep_npBayes function.

burn

The burn-in period.

y

A 2-column matrix of unobserved thresholds at which the returns are calculated. Required when type="returns".

probs

The probability of joint exceedances, the output of the return function.

A_true

A vector representing the true Pickands dependence function evaluated at the grid points on the simplex given in the summary.mcmc object.

h_true

A vector representing the true angular density function evaluated at the grid points on the simplex given in the summary.mcmc object.

est.out

A list containing:

ghat:

a 3-row matrix giving the posterior summary for the estimate of the bound;

Shat and Shat_post:

a matrix of posterior samples and a 3-row matrix giving the posterior summary for the estimate of the basic set \(S\);

nuShat and nuShat_post:

a matrix of posterior samples and a 3-row matrix giving the posterior summary for the estimate of the measure \(\nu(S)\);

Note that a posterior summary consists of the posterior mean and the \(90\%\) credibility interval.

Only required when x is of class ExtDep_npBayes and type="Qsets".

mar1, mar2

Vectors of marginal GEV parameters. Required when type="Qsets", except when x is of class ExtDep_npBayes and the marginal parameters were fixed.

dep

A logical value; if TRUE the estimate of the dependence is plotted when computing extreme quantile regions (type="Qsets").

QatCov1, QatCov2

Matrices representing the value of the covariates at which extreme quantile regions should be computed. Required when type="Qsets". See arguments cov1 and cov2 from fExtDep.np.

P

A vector indicating the probabilities associated with the quantiles to be computed. Required when type="Qsets".

CEX

Label and axis sizes.

col.data, col.Qfull, col.Qfade

Colors for data, estimate of extreme quantile regions and its credible interval (when applicable). Required when type="Qsets".

data

A 2-column matrix providing the original data to be plotted when type="Qsets".

w

A matrix or vector of values on the unit simplex. If a vector, values need to be between 0 and 1; if a matrix, each row needs to sum to one. See the sketch following this argument list.

cred

A probability for the credible intervals.

plot

A logical value indicating whether plot.ExtDep_npBayes should be called.

...

Additional graphical parameters.
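As a sketch of valid inputs for the w argument of the summary method (the numerical values below are purely illustrative):

# Valid forms of 'w' (values chosen for illustration only):
w_vec <- seq(0.01, 0.99, length.out = 100)   # a vector with values in (0,1)
w_mat <- cbind(w_vec, 1 - w_vec)             # a matrix whose rows sum to one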

Details

Regarding the fExtDep.np function:

When method="Bayesian", the vector of hyper-parameters is provided in the argument hyperparam. It should include:

If prior.pm="unif"

requires hyperparam$a.unif and hyperparam$b.unif.

If prior.pm="beta"

requires hyperparam$a.beta and hyperparam$b.beta.

If prior.k="pois"

requires hyperparam$mu.pois.

If prior.k="nbinom"

requires hyperparam$mu.nbinom and hyperparam$var.nbinom or hyperparam$pnb and hyperparam$rnb. The relationship is pnb = mu.nbinom/var.nbinom and rnb = mu.nbinom^2 / (var.nbinom-mu.nbinom).
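For example, a hyper-parameter list compatible with the default priors (prior.pm="unif" and prior.k="nbinom") could be set up as follows; the values of a.unif and b.unif are illustrative, while mu.nbinom and var.nbinom are the defaults shown in the Usage section.

# Illustrative hyper-parameter specification
hyperparam <- list(a.unif = 0, b.unif = 0.5,
                   mu.nbinom = 3.2, var.nbinom = 4.48)
# Equivalent (pnb, rnb) parametrisation of the negative binomial prior:
pnb <- with(hyperparam, mu.nbinom / var.nbinom)
rnb <- with(hyperparam, mu.nbinom^2 / (var.nbinom - mu.nbinom))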

When u is specified, Algorithm 1 of Beranger et al. (2021) is applied, whereas when it is not specified, Algorithm 3.5 of Marcon et al. (2016) is considered.

When method="Frequentist", if type="rawdata" then pseudo-polar coordinates are extracted and only observations with a radial component above some high threshold (the quantile equivalent of u for the raw data) are retained. The inferential approach proposed in Marcon et al. (2017) based on the approximate likelihood is applied.

When method="Empirical", the empirical estimation procedure presented in Einmahl et al. (2013) is applied.

Regarding the plot method function:

type="returns":

produces a (contour) plot of the probabilities of exceedances for some threshold. This corresponds to the output of the returns function.

type="A":

produces a plot of the estimated Pickands dependence function. If A_true is specified the plot includes the true Pickands dependence function and a functional boxplot for the estimated function.

type="h":

produces a plot of the estimated angular density function. If h_true is specified the plot includes the true angular density and a functional boxplot for the estimated function.

type="pm":

produces a plot of the prior against the posterior for the mass at \(\{0\}\).

type="k":

produces a plot of the prior against the posterior for the polynomial degree \(k\).

type="summary":

when the estimation was performed in a Bayesian framework then a 2 by 2 plot with types "A", "h", "pm" and "k" is produced. Otherwise a 1 by 2 plot with types "A" and "h" is produced.

type="Qsets":

displays extreme quantile regions according to the methodology developed in Beranger et al. (2021).

Regarding the code summary method function:

Note that the value of burn cannot exceed the number of iterations of the MCMC algorithm, which is stored as object$nsim.
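For instance, reusing the Bayesian fit pollut1 from Example 2 below, a call could cap burn using the stored number of iterations:

# burn must not exceed the number of MCMC iterations, stored in pollut1$nsim
burn <- min(3e4, pollut1$nsim - 1)
pollut1_sum <- summary(object = pollut1, burn = burn, plot = FALSE)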

References

Beranger, B., Padoan, S. A. and Sisson, S. A. (2021). Estimation and uncertainty quantification for extreme quantile regions. Extremes, 24, 349-375.

Einmahl, J. H. J., de Haan, L. and Krajina, A. (2013). Estimating extreme bivariate quantile regions. Extremes, 16, 121-145.

Marcon, G., Padoan, S. A. and Antoniano-Villalobos, I. (2016). Bayesian inference for the extremal dependence. Electronic Journal of Statistics, 10, 3310-3337.

Marcon, G., Padoan, S.A., Naveau, P., Muliere, P. and Segers, J. (2017) Multivariate Nonparametric Estimation of the Pickands Dependence Function using Bernstein Polynomials. Journal of Statistical Planning and Inference, 183, 1-17.

See Also

dExtDep, pExtDep, rExtDep, fExtDep

Examples

###########################################################
### Example 1 - Wind Speed and Differential of pressure ###
###########################################################

data(WindSpeedGust)

years <- format(ParcayMeslay$time, format="%Y")
attach(ParcayMeslay[which(years %in% c(2004:2013)),])

# Marginal quantiles
WS_th <- quantile(WS,.9)
DP_th <- quantile(DP,.9)

# Standardisation to unit Frechet (requires evd package)
pars.WS <- evd::fpot(WS, WS_th, model="pp")$estimate
pars.DP <- evd::fpot(DP, DP_th, model="pp")$estimate
  
# transform the marginal distribution to common unit Frechet:
data_uf <- trans2UFrechet(cbind(WS,DP), type="Empirical")

# compute exceedances 
rdata <- rowSums(data_uf)
r0 <- quantile(rdata, probs=.90)
extdata_WSDP <- data_uf[rdata>=r0,]

# Fit
SP_mle <- fExtDep.np(x=extdata_WSDP, method="Frequentist", k0=10, type="maxima")

# Plot
plot(x=SP_mle, type="summary")

####################################################
### Example 2 - Pollution levels in Milan, Italy ###
####################################################
	
if (FALSE) {

### Here we will only model the dependence structure	
data(MilanPollution)

data <- Milan.winter[,c("NO2","SO2")] 
data <- as.matrix(data[complete.cases(data),])

# Threshold
u <- apply(data, 2, function(x) quantile(x, prob=0.9, type=3))

# Hyperparameters
hyperparam <- list(mu.nbinom = 6, var.nbinom = 8, a.unif=0, b.unif=0.2)

### Standardise data to univariate Frechet margins

f1 <- fGEV(data=data[,1], method="Bayesian", sig0 = 0.0001, nsim = 5e+4)
diagnostics(f1)
burn1 <- 1:30000
gev.pars1 <- apply(f1$param_post[-burn1,],2,mean)
sdata1 <- trans2UFrechet(data=data[,1], pars=gev.pars1, type="GEV")

f2 <- fGEV(data=data[,2], method="Bayesian", sig0 = 0.0001, nsim = 5e+4)
diagnostics(f2)
burn2 <- 1:30000
gev.pars2 <- apply(f2$param_post[-burn2,],2,mean)
sdata2 <- trans2UFrechet(data=data[,2], pars=gev.pars2, type="GEV")

sdata <- cbind(sdata1,sdata2)

### Bayesian estimation using Bernstein polynomials

pollut1 <- fExtDep.np(x=sdata, method="Bayesian", u=TRUE,
                      mar.fit=FALSE, k0=5, hyperparam = hyperparam, nsim=5e+4)

diagnostics(pollut1)
pollut1_sum <- summary(object=pollut1, burn=3e+4, plot=TRUE)
pl1 <- plot(x=pollut1, type="Qsets", summary.mcmc=pollut1_sum, 
            mar1=gev.pars1, mar2=gev.pars2, P = 1/c(600, 1200, 2400),
            dep=TRUE, data=data, xlim=c(0,400), ylim=c(0,400))

pl1b <- plot(x=pollut1, type="Qsets", summary.mcmc=pollut1_sum, est.out=pl1$est.out, 
             mar1=gev.pars1, mar2=gev.pars2, P = 1/c(1200),
             dep=FALSE, data=data, xlim=c(0,400), ylim=c(0,400))

### Frequentist estimation using Bernstein polynomials

pollut2 <- fExtDep.np(x=sdata, method="Frequentist", mar.fit=FALSE, type="rawdata", k0=8)
plot(x=pollut2, type = "summary", CEX=1.5)

pl2 <- plot(x=pollut2, type="Qsets", mar1=gev.pars1, mar2=gev.pars2, 
            P = 1/c(600, 1200, 2400),
            dep=TRUE, data=data, xlim=c(0,400), ylim=c(0,400),
            xlab=expression(NO[2]), ylab=expression(SO[2]),
            col.Qfull = c("red", "green", "blue"))
                      
### Frequentist estimation using EKdH estimator

pollut3 <- fExtDep.np(x=data, method="Empirical")
plot(x=pollut3, type = "summary", CEX=1.5)

pl3 <- plot(x=pollut3, type="Qsets", mar1=gev.pars1, mar2=gev.pars2,
            P = 1/c(600, 1200, 2400),
            dep=TRUE, data=data, xlim=c(0,400), ylim=c(0,400),
            xlab=expression(NO[2]), ylab=expression(SO[2]),
            col.Qfull = c("red", "green", "blue"))

}
