predMultiExpectiles: Multidimensional Extreme Expectile Estimation

Description

Computes point estimates and \((1-\alpha)100\%\) confidence regions for the d-dimensional expectile at the extreme level (Expectile Prediction).

Usage

predMultiExpectiles(data, tau, tau1, method="LAWS", tailest="Hill", var=FALSE,
                    varType="asym-Ind-Adj-Log", bias=FALSE, k=NULL, alpha=0.05,
                    plot=FALSE)

Value

A list with elements:

ExpctHat: an estimate of the \(\tau'_n\)-th d-dimensional expectile;
biasTerm: an estimate of the bias term of the \(\tau'_n\)-th d-dimensional expectile;
VarCovEHat: an estimate of the asymptotic variance-covariance matrix of the d-dimensional expectile estimator;
EstConReg: an estimate of the approximate \((1-\alpha)100\%\) confidence region for the \(\tau'_n\)-th d-dimensional expectile.

Arguments

data: A matrix of \((n \times d)\) observations.
tau: A real in \((0,1)\) specifying the intermediate level \(\tau_n\). See Details.
tau1: A real in \((0,1)\) specifying the extreme level \(\tau'_n\). See Details.
method: A string specifying the method used to estimate the expectile. By default method="LAWS" specifies the use of the LAWS based estimator. See Details.
tailest: A string specifying the tail index estimator. By default tailest="Hill" specifies the use of Hill estimator. See Details.
var: A logical value. If var=TRUE then an estimate of the asymptotic variance-covariance matrix of the expectile estimator is computed. See Details.
varType: A string specifying the type of asymptotic variance-covariance matrix to compute. By default varType="asym-Ind-Adj-Log" specifies that the variance-covariance matrix is computed assuming dependent variables and exploiting the log scale and a suitable adjustment. See Details.
bias: A logical value. By default bias=FALSE specifies that no bias correction is computed. See Details.
k: An integer specifying the value of the intermediate sequence \(k_n\). See Details.
alpha: A real in \((0,1)\) specifying the confidence level \((1-\alpha)100\%\) of the approximate confidence region for the d-dimensional expectile at the extreme level.
plot: A logical value. By default plot=FALSE specifies that no graphical representation of the estimates is provided. See Details.

Author

Simone Padoan, simone.padoan@unibocconi.it, https://faculty.unibocconi.it/simonepadoan/; Gilles Stupfler, gilles.stupfler@univ-angers.fr, https://math.univ-angers.fr/~stupfler/

Details

For a dataset data of d-dimensional observations and sample size \(n\), an estimate of the \(\tau'_n\)-th d-dimensional expectile is computed. The estimation of the d-dimensional expectile at the extreme level tau1 (\(\tau'_n\)) is meant to be a prediction beyond the observed sample. Two estimators are available: the so-called Least Asymmetrically Weighted Squares (LAWS) based estimator and the Quantile-Based (QB) estimator. The definition of both estimators depends on the estimation of the d-dimensional tail index \(\gamma\). Here, \(\gamma\) is estimated using the Hill estimator (see MultiHTailIndex for details). The data are regarded as d-dimensional temporally independent observations coming from dependent variables. See Padoan and Stupfler (2020) for details.

The so-called intermediate level tau or \(\tau_n\) is a sequence of positive reals such that \(\tau_n \to 1\) as \(n \to \infty\). In practice, for each marginal distribution, \(\tau_n \in (0,1)\) is the ratio N/D, where N (numerator) is the empirical mean distance of the \(\tau_n\)-th expectile from the observations smaller than it, and D (denominator) is the empirical mean distance of the \(\tau_n\)-th expectile from all the observations.
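The N/D characterisation of the intermediate level can be checked numerically for a single margin. The following is a minimal base R sketch, not the package's implementation: expectile_laws() is a hypothetical helper that computes a sample expectile by iteratively reweighted means, and the simulated exponential sample is only an illustrative assumption.

```r
# Hypothetical helper (not part of the package): sample expectile at level tau,
# obtained as the fixed point of an asymmetrically weighted mean.
expectile_laws <- function(x, tau, tol = 1e-10, maxit = 1000) {
  e <- mean(x)
  for (i in seq_len(maxit)) {
    w <- ifelse(x > e, tau, 1 - tau)   # weight tau above e, 1 - tau below
    e_new <- sum(w * x) / sum(w)       # asymmetrically weighted mean
    if (abs(e_new - e) < tol) break
    e <- e_new
  }
  e_new
}

set.seed(1)
x <- rexp(500)                         # illustrative sample for one margin
tau <- 0.95
e_tau <- expectile_laws(x, tau)

# N: mean distance from e_tau, counting only observations not exceeding it
N <- mean(abs(x - e_tau) * (x <= e_tau))
# D: mean distance from e_tau over all observations
D <- mean(abs(x - e_tau))
ratio <- N / D                         # numerically recovers tau
```

The ratio matches tau up to the tolerance of the iteration, because the first-order condition defining the sample expectile balances the weighted distances below and above it.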
The so-called extreme level tau1 or \(\tau'_n\) is a sequence of positive reals such that \(\tau'_n \to 1\) as \(n \to \infty\). For each marginal distribution, the value \((1-\tau'_n) \in (0,1)\) is meant to be a small tail probability such that \((1-\tau'_n)=1/n\) or \((1-\tau'_n) < 1/n\). It is also assumed that \(n(1-\tau'_n) \to C\) as \(n \to \infty\), where \(C\) is a positive finite constant. Typically, \(C \in (0,1)\), so it is expected that there are no observations in a data sample that are greater than the expectile at the extreme level \(\tau'_n\).
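As a quick arithmetic check of the condition on the extreme level, the sketch below uses the sample size and extreme level from the Examples section (n = 1000, tau1 = 0.9995); the variable name C_n is chosen here for illustration only.

```r
n    <- 1000             # sample size used in the Examples section
tau1 <- 0.9995           # extreme level used in the Examples section

C_n <- n * (1 - tau1)    # finite-sample value of n * (1 - tau'_n), about 0.5
below_one_over_n <- (1 - tau1) < 1 / n   # TRUE: tail probability smaller than 1/n
```

Here the tail probability 1 - tau1 is below 1/n, which is the regime described above where the target lies beyond the range of the observed sample.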
When method='LAWS', the \(\tau'_n\)-th d-dimensional expectile is estimated using the LAWS based estimator. When method='QB', the expectile is instead estimated using the QB estimator. The definition of both estimators depends on the estimation of the d-dimensional tail index \(\gamma\). The d-dimensional tail index \(\gamma\) is estimated using the d-dimensional Hill estimator (tailest='Hill', see MultiHTailIndex). This is the only available option so far (soon more results will be available). See Section 2.2 in Padoan and Stupfler (2020) for details.
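For orientation, the two extrapolation schemes are usually written as follows for a single margin in the extreme expectile literature. This is a sketch under the assumption that the package applies these standard forms margin by margin; the page itself does not state them. Here \(\tilde{\xi}_{\tau_n}\) denotes the LAWS (sample) expectile at the intermediate level, \(\hat{q}_{\tau_n}\) the empirical quantile at the intermediate level, and \(\hat{\gamma}\) the tail index estimate, assumed to satisfy \(\hat{\gamma} < 1\).

```latex
% LAWS based estimator: extrapolate the intermediate sample expectile
\tilde{\xi}^{\star}_{\tau'_n}
  = \left(\frac{1-\tau'_n}{1-\tau_n}\right)^{-\hat{\gamma}} \tilde{\xi}_{\tau_n}

% QB estimator: Weissman-type extreme quantile, then the expectile/quantile factor
\hat{q}^{\star}_{\tau'_n}
  = \left(\frac{1-\tau'_n}{1-\tau_n}\right)^{-\hat{\gamma}} \hat{q}_{\tau_n},
\qquad
\hat{\xi}^{\star}_{\tau'_n}
  = \left(\hat{\gamma}^{-1}-1\right)^{-\hat{\gamma}} \hat{q}^{\star}_{\tau'_n}
```

Both forms rely on \(\hat{\gamma}\), which is why the choice of tail index estimator affects either method.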
If var=TRUE then an estimate of the asymptotic variance-covariance matrix of the \(\tau'_n\)-th d-dimensional expectile is computed. Notice that the estimation of the asymptotic variance-covariance matrix is only available when \(\gamma\) is estimated using the Hill estimator (see MultiHTailIndex). By default, the data are regarded as temporally independent observations coming from dependent variables, and the asymptotic variance-covariance matrix is estimated exploiting the formulas in Section 3.2 of Padoan and Stupfler (2020). With varType="asym-Ind-Adj-Log", the variance-covariance matrix is computed exploiting the asymptotic behaviour of the normalized expectile estimator expressed in logarithmic scale, together with a suitable adjustment. The data can also be regarded as d-dimensional temporally independent observations coming from independent variables. In this case the asymptotic variance-covariance matrix is diagonal and is also computed exploiting the formulas in Section 3.2 of Padoan and Stupfler (2020). This is achieved through varType="asym-Ind-Log". If varType="asym-Ind-Adj", then the variance-covariance matrix is computed exploiting the asymptotic behaviour of the relative expectile estimator appropriately normalized, together with a suitable adjustment. This concerns the case of dependent variables. The case of independent variables is achieved through varType="asym-Ind".
If bias=TRUE then the d-dimensional \(\gamma\) is estimated using formula (4.2) of de Haan et al. (2016). This is used by the LAWS and QB estimators. Furthermore, the \(\tau'_n\)-th quantile is estimated using the formula on page 330 of de Haan et al. (2016). This provides a bias-corrected version of the Weissman estimator, which is used by the QB estimator. However, in this case the asymptotic variance is not estimated using the formula in Theorem 4.2 of de Haan et al. (2016). Instead, for simplicity the asymptotic variance-covariance matrix is estimated by the formulas in Section 3.2 of Padoan and Stupfler (2020).
k or \(k_n\) is the value of the so-called intermediate sequence \(k_n\), \(n=1,2,\ldots\). It represents a sequence of positive integers such that \(k_n \to \infty\) and \(k_n/n \to 0\) as \(n \to \infty\). In practice, for each marginal distribution, when tau=NULL and method='LAWS' or method='QB', \(\tau_n=1-k_n/n\) is the intermediate level of the expectile to be estimated. When tailest='Hill', for each marginal distribution, \(k_n\) specifies that the \(k_n+1\) largest order statistics are used in the definition of the Hill estimator.
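The role of k can be illustrated with a univariate Hill estimator written in base R. This is a minimal sketch, not the package's MultiHTailIndex: hill_est() is a hypothetical helper, and the Frechet sample with shape 3 (true tail index 1/3) is simulated by inversion purely for illustration.

```r
# Hypothetical helper (not the package's MultiHTailIndex): Hill estimator
# based on the k + 1 largest order statistics of a positive sample.
hill_est <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)       # xs[1] is the sample maximum
  mean(log(xs[1:k])) - log(xs[k + 1])    # mean log-excess over the (k+1)-th largest
}

set.seed(2)
n <- 1000
k <- 50
# Frechet sample with shape 3 via inversion of F(x) = exp(-x^(-3))
x <- (-log(runif(n)))^(-1 / 3)

tau <- 1 - k / n                # intermediate level implied by k, here 0.95
gamma_hat <- hill_est(x, k)     # estimate of the tail index (true value 1/3)
```

Larger k lowers the variance of the estimate but uses observations further from the tail, so the choice of k trades variance against bias.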
Given a small value \(\alpha\in (0,1)\), an estimate of an asymptotic confidence region for the \(\tau'_n\)-th d-dimensional expectile, with approximate nominal confidence level \((1-\alpha)100\%\), is computed. The confidence regions are computed exploiting the formulas in Section 3.2 of Padoan and Stupfler (2020). If varType="asym-Ind-Adj-Log", then an "asymmetric" confidence region is computed exploiting the asymptotic behaviour of the normalized expectile estimator in logarithmic scale and using a suitable adjustment. This choice is recommended. If varType="asym-Ind-Adj", then a "symmetric" confidence region is computed exploiting the asymptotic behaviour of the relative expectile estimator appropriately normalized.
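The difference between the "asymmetric" (log scale) and "symmetric" (relative scale) constructions can be seen in one dimension. The sketch below is an illustration only: it builds intervals for a single margin, ignores the adjustment and the cross-margin dependence handled by the package, and uses the usual first-order approximation in which the log-ratio of the extrapolated estimator to its target has standard deviation gamma * log(d_n) / sqrt(k), with d_n = k / (n * (1 - tau1)). The values of est and gamma are hypothetical inputs, not package output.

```r
# Hypothetical inputs for one margin (illustrative values, not package output)
est   <- 12.4     # extreme expectile estimate
gamma <- 0.33     # tail index estimate
n     <- 1000
k     <- 50
tau1  <- 0.9995
alpha <- 0.05

d_n    <- k / (n * (1 - tau1))           # extrapolation factor (1 - tau) / (1 - tau1)
se_log <- gamma * log(d_n) / sqrt(k)     # approximate standard error on the log scale
z      <- qnorm(1 - alpha / 2)           # standard normal quantile

# Log scale: bounds are multiplicative, hence asymmetric around est
ci_log <- est * exp(c(-1, 1) * z * se_log)
# Relative scale: bounds are additive in relative error, hence symmetric around est
ci_rel <- est * (1 + c(-1, 1) * z * se_log)
```

The log-scale interval always has a positive lower bound and a longer upper arm, which suits a right-skewed estimator of a positive quantity.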
If plot=TRUE then a graphical representation of the estimates is provided.

References

Simone A. Padoan and Gilles Stupfler (2022). Joint inference on extreme expectiles for multivariate heavy-tailed distributions, Bernoulli 28(2), 1021-1048.

Examples

# Extreme expectile estimation at the extreme level tau1 obtained with
# d-dimensional observations simulated from a joint distribution with
# a Gumbel copula and equal Frechet marginal distributions.
library(ExtremeRisks)
library(plot3D)
library(copula)
library(evd)

# distributional setting
copula <- "Gumbel"
dist <- "Frechet"

# parameter setting
dep <- 3
dim <- 3
scale <- rep(1, dim)
shape <- rep(3, dim)
par <- list(dep=dep, scale=scale, shape=shape, dim=dim)

# Intermediate level (or sample tail probability 1-tau)
tau <- 0.95
# Extreme level (or tail probability 1-tau1 of unobserved expectile)
tau1 <- 0.9995

# sample size
ndata <- 1000

# Simulates a sample from a multivariate distribution with equal Frechet
# marginal distributions and a Gumbel copula
data <- rmdata(ndata, dist, copula, par)
scatter3D(data[,1], data[,2], data[,3])

# Extreme d-dimensional expectile estimation (prediction at the extreme level tau1)
expectHat <- predMultiExpectiles(data, tau, tau1, var=TRUE)

expectHat$ExpctHat
expectHat$VarCovEHat
# run the following command to see the graphical representation
expectHat <- predMultiExpectiles(data, tau, tau1, var=TRUE, plot=TRUE)
