Computes extreme quantile regions of a bivariate random variable corresponding to some exceedance probabilities.
ExtQset(data, P=NULL, method="bayesian", U=NULL,
cov1=as.matrix(rep(1,nrow(data))), cov2=as.matrix(rep(1,nrow(data))),
QatCov1=NULL, QatCov2=NULL, mar=TRUE, par10=c(1,2,1), par20=c(1,2,1),
sig10=1, sig20=1, param0=NULL, k0=NULL, pm0=NULL, prior.k="nbinom",
prior.pm="unif", hyperparam = list(mu.nbinom = 3.2, var.nbinom = 4.48),
nsim=NULL, lo=NULL, up=NULL, d=5)
If method=="bayesian"
, a list with elements:
Qset_P1, ...
: length(P)
lists of P
, using
Qset_P1_post, ...
: length(P)
lists of npost
of the extreme quantile regions associated with probability P
;
ghat
: a
Shat
: a
nuShat
: the
burn
: the burn-in period, specified by the user after observing the diagnostic plots;
If method=="EDhK"
, a list with elements:
xn_hat1, ...
: length(P)
vectors of lentgh P
.
yn_hat1, ...
: length(P)
vectors of lentgh P
.
If method=="frequentist"
, a list with elements:
hhat
: a vector of length
ghat
: a vector of length
Shat
: a
nuShat
: a real giving the estimate of the basic set size
Qhat
: a length(P)
P
;
gamhat
: a bivariate vector of the estimated marginal tail indices;
uhat
: a bivariate vector of the estimated location parameters.
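A hedged sketch of how the returned objects could be inspected, assuming the objects Q (Bayesian fit) and Q.EDhK produced by the example further down; the element names and the lower/estimate/upper array layout are taken from that example rather than from a documented API:

# Assumes Q and Q.EDhK have been created as in the Examples section below
names(Q)                            # e.g. Qset_P1_CovNum_1, ..., ghat, Shat, nuShat, burn
dim(Q$Qset_P1_CovNum_1)             # grid x 2 x 3 array
lower <- Q$Qset_P1_CovNum_1[, , 1]  # lower bound of the first quantile region
est   <- Q$Qset_P1_CovNum_1[, , 2]  # point estimate
upper <- Q$Qset_P1_CovNum_1[, , 3]  # upper bound
names(Q.EDhK)                       # xn_hat1, yn_hat1, ...: coordinates of the EDhK regions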
data: A matrix of bivariate observations.

P: The vector of probabilities associated with the quantiles.

method: The estimation method; one of "bayesian", "EDhK" or "frequentist".

U: The bivariate threshold value below which the observations are marginally censored.
cov1, cov2: Covariate matrices for the first and second margin, respectively; the default is a constant (intercept-only) design.

QatCov1, QatCov2: Matrices of covariate values at which the extreme quantile regions are computed, for the first and second margin, respectively.
Only required when method="bayesian"
. If mar=TRUE
then a first estimation of the margins is done.
Only required when method="bayesian"
. The vector of initial value for the parameters.
Only required when method="bayesian"
. Initial value for the standard deviations of the multivariate normal proposal distribution for both margins.
Only required when method="bayesian"
. The vector of initial value for the Bernstein polynomial coefficients. It should be a list with elements $eta
and $beta
.
Only required when method="bayesian"
. The initial value of the polynomial order.
Only required when method="bayesian"
. The list of initial values of the probability masses at the boundaries of the simplex. It should be a list with two elements $p0
and $p1
, see bbeed.
Only required when method="bayesian"
. The prior on the polynomial order, see bbeed.
Only required when method="bayesian"
. The prior on the probability masses at the endpoints of the simplex, see bbeed.
Only required when method="bayesian"
. A list of the hyper-parameters, see bbeed.
Only required when method="bayesian"
. Number of iterations in the Metropolis-Hastings algorithm.
Only required when method="EdHK", "frequentist"
. Lower value of k in Hill estimator for shape parameter.
Only required when method="EdHK", "frequentist"
. Upper value of k in Hill estimator for shape parameter.
d: A positive integer indicating the order of the Bernstein polynomials.
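As an illustration of the role of d, a minimal sketch (not package code) of the Bernstein basis of order d on [0,1], whose linear combinations are used to model the dependence structure:

# Bernstein basis polynomials of order d evaluated on a grid of [0,1]
bernstein_basis <- function(w, d) sapply(0:d, function(j) choose(d, j) * w^j * (1 - w)^(d - j))
w <- seq(0, 1, length.out = 101)
B <- bernstein_basis(w, d = 5)                      # 101 x 6 matrix (d = 5, as in the default)
matplot(w, B, type = "l", ylab = "Bernstein basis") # plot the d + 1 basis functions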
Simone Padoan, simone.padoan@unibocconi.it, https://mypage.unibocconi.it/simonepadoan/; Boris Beranger, borisberanger@gmail.com, https://www.borisberanger.com/; Andrea Krajina, akrajina@gmail.com.
For a dataset given by data, the extreme quantile regions corresponding to the exceedance probabilities given in P are computed. Observations below the threshold U are considered censored.
If method="bayesian"
, the methodologies given by bbeed and UniExtQ are combined. The algorithm is a Trans-dimensional MCMC scheme as described in Algorithm 1 of Beranger et al. (2019). If mar=TRUE
then the function UniExtQ is preliminarily applied to the margins to select some starting values updating par10, par20, sig10
and sig20
. After running for nsim
iteration the algorithm is paused and some diagnostics plots (on the margins and polynomial degree) are printed. The user then needs to enter the value of the burn-in period. See bbeed for details about the prior and hyperparameters for the dependence structure.
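As a purely illustrative sketch (not the package's internal code) of what the burn-in choice amounts to: the first burn iterations of the chain are discarded before posterior summaries are computed.

# Toy illustration of a burn-in on a generic MCMC trace (not package code)
apply_burn <- function(chain, burn) chain[-seq_len(burn)]
set.seed(1)
toy.chain <- cumsum(rnorm(5000)) / sqrt(1:5000)    # stand-in for a sampled trace
plot(toy.chain, type = "l", xlab = "iteration")    # inspect the trace, choose a burn-in
summary(apply_burn(toy.chain, burn = 1000))        # summaries based on the retained iterations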
If method="EdHK"
, then the methodology developped by Einmahl et al. (2013) is applied. This is a non-parametric approach where the marginal parameters are estimated first.The Hill estimator is used to estimate the marginal tail indexes.
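A hedged illustration (not the package's internal code) of the Hill estimator mentioned above, computed from the k largest order statistics of one margin; how the values of k between lo and up are combined is an implementation detail not documented here.

# Hill estimator of the tail index based on the k largest observations
hill <- function(x, k) {
  xs <- sort(x)                              # order statistics x_(1) <= ... <= x_(n)
  n <- length(xs)
  mean(log(xs[(n - k + 1):n])) - log(xs[n - k])
}
# e.g. gamma1.k <- sapply(50:300, function(k) hill(data[,1], k))  # over k = lo, ..., up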
If method="frequentist"
, then similar to the "EdHK"
estimator, the marginal parameters are estimated first (in the same way) and then the beed function is used to estimate the dependence structure.
Beranger, B., Padoan, S. A. and Sisson, S. A. (2019). Estimation and uncertainty quantification for extreme quantile regions. arXiv e-prints, arXiv:1904.08251.
Einmahl, J. H. J., de Haan, L. and Krajina, A. (2013). Estimating extreme bivariate quantile regions. Extremes, 16, 121-145.
UniExtQ, bbeed, beed
if (interactive()){
library(ExtremalDep)
library(mvtnorm)   # for rmvt(), used to simulate the bivariate Cauchy data
distribution <- "Cauchy"
par10 <- par20 <- c(1,2,1) # Initial marginal parameter values
sig10 <- sig20 <- 1 # Initial scale values in MVN proposal
prior.k <- "nbinom" # Prior distribution for polynomial degree
k0 <- 5 # Degree of the polynomial
prior.pm <- "unif" # Prior distribution for the point masses
pm0 <- list(p0=0, p1=0)
# Vector of hyperparameters for prior distribution:
hyperparam <- list(mu.nbinom = 3.2, var.nbinom = 4.48, a.unif = 0, b.unif = 0.1)
###
### Data simulation
###
n <- 1500 # Sample size
P <- c(1/750, 1/1500, 1/3000) # Vector of probabilities for extreme quantiles
prob <- c(0.9, 0.9) # Probabilities used to compute the marginal thresholds
# Dependence structure;
rho <- 0
sigma <- matrix(c(1,rho,rho,1), ncol=2)
df <- 1
# Compute quantiles for the Cauchy:
ell1 <- ellipse(prob=1-P[1], pos=TRUE)
ell2 <- ellipse(prob=1-P[2], pos=TRUE)
ell3 <- ellipse(prob=1-P[3], pos=TRUE)
realx1 <- ell1[,1]; realy1 <- ell1[,2]
realx2 <- ell2[,1]; realy2 <- ell2[,2]
realx3 <- ell3[,1]; realy3 <- ell3[,2]
# Data simulation (Cauchy)
set.seed(999)
data <- rmvt(5*n, sigma=sigma, df=df)
data <- data[data[,1]>0 & data[,2]>0, ]
data <- data[1:n, ]
# Threshold
U <- c(quantile(data[,1], probs = prob[1], type=3),
       quantile(data[,2], probs = prob[2], type=3))
###
### Estimation
###
Q <- ExtQset(data=data, P=P, U=U, par10=par10, par20=par20, sig10=sig10, sig20=sig20, pm0=pm0,
k0=k0, prior.k=prior.k, prior.pm=prior.pm, hyperparam=hyperparam, nsim=50000)
Q.EDhK <- ExtQset(data=data, P=P, method="EDhK", lo=50, up=300)
w <- seq(0.00001, .99999, length=100) # define grid
gfun <- ((w^2+(1-w)^2)^(-3/2))^(1/3) # Compute the true g function
xT <- gfun*w # x-axis of Basic set
yT <- gfun*(1-w) # y-axis of Basic set
###
### Graphical representation
###
op <- par(mfrow=c(2,3), mar=c(3, 3, 0.5, 0.2), mgp=c(2,.8,0))
# Plot 1: Density of Exponent measure
ylim.pl1 <- c(0,1.7)
plot(w, gfun, type="l", xlab="w", ylab=expression(1/q[symbol("\052")](w)), ylim=ylim.pl1)
polygon(c(w, rev(w)), c(Q$ghat[3,], rev(Q$ghat[1,])), col="gray")
lines(w, Q$ghat[2,],col="gray0", lwd=2, lty=3)
lines(w, gfun, lwd=2)
# Plot 2: Basic-set S
xlim.pl2 <-c(0,1.5); ylim.pl2 <- c(0,1.5)
plot(xT,yT, pch=19, col=1, type="l", xlim=xlim.pl2, ylim=ylim.pl2,
xlab=expression(x[1]), ylab=expression(x[2]))
polygon(c(Q$Shat[,1,3], rev(Q$Shat[,1,1])), c(Q$Shat[,2,3], rev(Q$Shat[,2,1])), col="gray")
points(Q$Shat[,,2], type="l", col="gray0", lwd=2, lty=3)
lines(xT,yT,lwd=2)
# Plot 3: Data + quantile regions
xlim.pl3 <- c(0, 3500); ylim.pl3 <- c(0, 3500)
plot(data, xlim=xlim.pl3, ylim=ylim.pl3, pch=19, xlab=expression(x[1]), ylab=expression(x[2]))
points(realx1,realy1, type="l", lwd=2, lty=1)
points(realx2,realy2, type="l", lwd=2, lty=1)
points(realx3,realy3, type="l", lwd=2, lty=1)
lines(Q$Qset_P1_CovNum_1[,,2], lty=3, col="gray0", lwd=2)
lines(Q$Qset_P2_CovNum_1[,,2], lty=3, col="gray0", lwd=2)
lines(Q$Qset_P3_CovNum_1[,,2], lty=3, col="gray0", lwd=2)
# Plot 4,5,6: Quantile region with probability 1/750, 1/1500, 1/3000
xlim.pl46 <- c(0,7400); ylim.pl46 <- c(0,7400)
for(j in 1:3){
tmp.name <- paste("Qset_P",j,"_CovNum_1",sep="")
tmp.quant <- Q[[tmp.name]]
plot(data, xlim=xlim.pl46, ylim=ylim.pl46, type="n", pch=19,
xlab=expression(x[1]), ylab=expression(x[2]))
polygon(c(tmp.quant[,1,3], rev(tmp.quant[,1,1])),
c(tmp.quant[,2,3], rev(tmp.quant[,2,1])), col="gray")
points(get(paste("realx",j,sep="")), get(paste("realy",j,sep="")), type="l", lty=1, lwd=2)
lines(tmp.quant[,,2], lty=3, col="gray0", lwd=2)
lines(Q.EDhK[[paste("xn_hat",j,sep="")]], Q.EDhK[[paste("yn_hat",j,sep="")]], lty=2, lwd=2)
}
par(op)
}