copBasic (version 2.1.4)

GHcop: The Gumbel--Hougaard Extreme Value Copula

Description

SYMMETRIC GUMBEL--HOUGAARD---The Gumbel--Hougaard copula (Nelsen, 2006, pp. 118 and 164) is $$\mathbf{C}_{\Theta}(u,v) = \mathbf{GH}(u,v) = \mathrm{exp}\{-[(-\log u)^\Theta+(-\log v)^\Theta]^{1/\Theta}\}\mbox{,}$$ where \(\Theta \in [1 , \infty)\). The copula here is a bivariate extreme value copula (\(BEV\)). The parameter \(\Theta\) is readily estimated using a Kendall Tau (say a sample version \(\hat\tau\)), where the \(\tau\) of the copula (\(\tau_\mathbf{C}\)) is defined as $$\tau_\mathbf{C} = \frac{\Theta - 1}{\Theta} \rightarrow \Theta = \frac{1}{1-\tau}\mbox{.}$$ The copula is readily extended into \(d\) dimensions by $$\mathbf{C}_{\Theta}(u_1,\dots,u_d) = \mathrm{exp}\{-[(-\log u_1)^\Theta+\cdots+(-\log u_d)^\Theta]^{1/\Theta}\}\mbox{.}$$ However, such an implementation is not available in the copBasic package.
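The \(\tau\)--\(\Theta\) relation can be spot-checked numerically. This is a minimal sketch, not part of the original examples, assuming the copBasic function tauCOP (numerical Kendall Tau of a copula) is available:

tau <- 0.45                      # an arbitrary Kendall Tau
Theta <- 1/(1 - tau)             # Theta from the relation above
tauCOP(cop=GHcop, para=Theta)    # numerical Kendall Tau, expected near 0.45
GHcop(0.5, 0.5, para=Theta)      # a single evaluation of GH(0.5, 0.5)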

Every Gumbel--Hougaard copula is a multivariate extreme value (\(MEV\)) copula, and hence useful in analysis of extreme value distributions. The Gumbel--Hougaard copula is the only Archimedean \(MEV\) copula (Salvadori et al., 2007, p. 192). The Gumbel--Hougaard copula has lower- and upper-tail dependency parameters of \(\lambda^L = 0\) and \(\lambda^U = 2 - 2^{1/\Theta}\), respectively. Nelsen (2006, p. 96) shows that \(\mathbf{C}^r_\Theta(u^{1/r}, v^{1/r}) = \mathbf{C}_\Theta(u,v)\), so every Gumbel--Hougaard copula has the property known as max-stability. A dependence measure uniquely defined for \(BEV\) copulas is shown under rhobevCOP.

A comparison through simulation between Gumbel--Hougaard implementations by the R packages acopula, copBasic, copula, and gumbel is shown in the Examples section. At least three divergent techniques for random variate generation are used amongst those packages. The simulations also use copBasic-style random variate generation (conditional simulation) using an analytical-numerical hybrid solution to the conditional inverse described in the Note section.

TWO-PARAMETER GUMBEL--HOUGAARD---A permutation-symmetric (isCOP.permsym) but almost certainly radially asymmetric (isCOP.radsym) version of the copula is readily constructed (Brahimi et al., 2015) into a two-parameter version: $$\mathbf{C}(u,v; \beta_1, \beta_2) = \biggl[ \biggl(\bigl(u^{-\beta_2} -1\bigr)^{\beta_1} + \bigl(v^{-\beta_2} -1\bigr)^{\beta_1} \biggr)^{1/\beta_1} + 1 \biggr]^{-1/\beta_2}\mbox{,}$$ where \(\beta_1 \ge 1\) and \(\beta_2 > 0\). Both parameters control the general level of association, whereas parameter \(\beta_2\) can be thought of as controlling left-tail dependency (taildepCOP, \(\lambda^{[U\mid L]}_{(\beta_1, \beta_2)}\)); e.g. \(\lambda^U_{(1.5; \beta_2)} = 0.413\) for all \(\beta_2\), but \(\lambda^L_{(1.5; 0.2)} = 0.811\) and \(\lambda^L_{(1.5; 2.2)} = 0.099\). Brahimi et al. (2015) report a Spearman Rho (rhoCOP) of 0.5 for \(\mathbf{GH}_{(1.5, 0.2)}(u,v)\), which is readily confirmed in copBasic by the function call rhoCOP(cop=GHcop, para=c(1.5,0.2)). The two-parameter \(\mathbf{GH}\) is triggered if the length of the para argument is exactly 2.
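The quantities quoted above can be spot-checked. This is a minimal sketch, not part of the original examples; it assumes taildepCOP() returns lambdaL and lambdaU elements (lambdaU is accessed this way in the Examples below):

rhoCOP(cop=GHcop, para=c(1.5, 0.2))      # Brahimi et al. (2015) report about 0.5
taildepCOP(cop=GHcop, para=c(1.5, 0.2))  # compare lambdaU, lambdaL with the values above
taildepCOP(cop=GHcop, para=c(1.5, 2.2))  # lambdaU stays near 0.413; lambdaL changes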

ASYMMETRIC GUMBEL--HOUGAARD---An asymmetric version of the copula is readily constructed (Joe, 2014, p. 185--186) into a three-parameter version with Marshall--Olkin copulas on the boundaries: $$\mathbf{C}(u,v; \Theta, \pi_2, \pi_3) = \mathrm{exp}[-\mathcal{A}(-\log u, -\log v; \Theta, \pi_2, \pi_3)]\mbox{,}$$ where \(\Theta \ge 1\) as before, \(0 \le \pi_2, \pi_3 \le 1\), and

$$\mathcal{A}(x, y; \Theta, \pi_2, \pi_3) = [(\pi_2 x)^\Theta + (\pi_3 y)^\Theta]^{1/\Theta} + (1-\pi_2)x + (1-\pi_3)y\mbox{.}$$

The asymmetric \(\mathbf{GH}\) is triggered if the length of the para argument is exactly 3. The GHcop function provides no mechanism for estimation of the parameters for the asymmetric version. A review of simulations suggests that the bounds on the \(\pi\) parameters given in Joe (2014, p. 185), “[\(0 \le \pi_2 < \pi_3 \le 1\)],” might be incorrect---Joe's own back reference to eq. 4.35 (Joe, 2014, p. 183) supports the \(\pi\)-limits as stated here for copBasic. An algorithm for parameter estimation for the asymmetric \(\mathbf{GH}\) using two different measures of bivariate skewness as well as an arbitrary measure of association is shown in section Details in joeskewCOP.
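An evaluation of the asymmetric version can illustrate its permutation asymmetry. This is a minimal sketch, not part of the original examples, assuming the order of para follows the displayed form \(\mathbf{C}(u,v; \Theta, \pi_2, \pi_3)\):

para <- c(2.5, 0.9, 0.3)      # Theta = 2.5, pi2 = 0.9, pi3 = 0.3 (arbitrary values)
GHcop(0.4, 0.8, para=para)    # C(0.4, 0.8)
GHcop(0.8, 0.4, para=para)    # C(0.8, 0.4) differs because pi2 != pi3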

Usage

GHcop(u, v, para=NULL, tau=NULL, tau.big=0.985, cor=NULL, ...)

Arguments

u

Nonexceedance probability \(u\) in the \(X\) direction;

v

Nonexceedance probability \(v\) in the \(Y\) direction;

para

A vector (single element, pair, or triplet) of parameters---the \(\Theta\) of the symmetric copula, or the parameters of the two-parameter (\(\beta_1, \beta_2\)) or asymmetric (\(\Theta, \pi_2, \pi_3\)) versions described above;

tau

Kendall Tau \(\tau\) from which to estimate the parameter \(\Theta\);

tau.big

The largest value for \(\tau_\mathbf{C}\) prior to switching to the \(\mathbf{M}\) copula, applicable to the symmetric version of this copula;

cor

A copBasic syntax for “the correlation coefficient” suitable for the copula---a synonym for tau; and

...

Additional arguments to pass.

Value

Value(s) for the copula are returned using the \(\Theta\) as set by argument para. Alternative returned values are possible: (1) If para=NULL and tau is set, then \(\tau_\mathbf{C} \rightarrow \Theta\) and an R list is returned. (2) If para=NULL and tau=NULL, then an attempt to estimate \(\Theta\) from u and v is made by \(\mathrm{cor}(u,v)_\tau \rightarrow \tau_\mathbf{C} \rightarrow \Theta\) using cor(u,v, method="kendall") in R, and an R list is returned. The possibly returned list has the following elements (a short sketch of these calling modes follows the list):

para

The computed \(\Theta\) from the given tau or from the bivariate data in u and v; and

tau

The sample estimate of \(\tau\).
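A minimal sketch (not part of the original examples) of the three calling modes described above; simCOP() is used for sample generation as in the Examples section:

GHcop(0.4, 0.6, para=3)               # plain copula evaluation for Theta = 3
GHcop(u=NULL, v=NULL, tau=0.5)$para   # Theta = 1/(1 - 0.5) = 2 from a given tau
UV <- simCOP(n=1000, cop=GHcop, para=2, graphics=FALSE)
GHcop(UV$U, UV$V)$para                # Theta back-estimated via cor(U, V, method="kendall")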

Details

Numerical experiments seem to indicate that for \(\tau_\mathbf{C} > 0.985\) failures in the numerical partial derivatives in derCOP and derCOP2 result---a \(\tau_\mathbf{C}\) this large is indeed large. As \(\Theta \rightarrow \infty\), the Gumbel--Hougaard copula becomes the Fréchet--Hoeffding upper-bound copula \(\mathbf{M}\) (see M). A \(\tau_\mathbf{C} \approx 0.985\) yields \(\Theta \approx 66 + 2/3\), so for \(\Theta > 1/(1-\tau_\mathbf{C})\) the function flips over to the \(\mathbf{M}\) copula and a warning is issued.
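The switch to \(\mathbf{M}\) can be seen directly. The following sketch is not part of the original examples; it simply picks a \(\Theta\) whose implied \(\tau_\mathbf{C}\) exceeds the default tau.big and compares against the \(\mathbf{M}\) copula of copBasic:

Theta.big <- 1/(1 - 0.99)        # tau_C = 0.99 exceeds the default tau.big = 0.985
GHcop(0.3, 0.8, para=Theta.big)  # per the Details, returns the M value (with a warning)
M(0.3, 0.8)                      # Frechet--Hoeffding upper bound: min(0.3, 0.8) = 0.3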

References

Asquith, W.H., 2011, Distributional analysis with L-moment statistics using the R environment for statistical computing: Createspace Independent Publishing Platform, ISBN 978--146350841--8.

Brahimi, B., Chebana, F., and Necir, A., 2015, Copula representation of bivariate L-moments---A new estimation method for multiparameter two-dimensional copula models: Statistics, v. 49, no. 3, pp. 497--521.

Joe, H., 2014, Dependence modeling with copulas: Boca Raton, CRC Press, 462 p.

Nelsen, R.B., 2006, An introduction to copulas: New York, Springer, 269 p.

Salvadori, G., De Michele, C., Kottegoda, N.T., and Rosso, R., 2007, Extremes in Nature---An approach using copulas: Springer, 289 p.

Zhang, L., and Singh, V.P., 2007, Gumbel--Hougaard copula for trivariate rainfall frequency analysis: Journal of Hydrologic Engineering, v. 12, Special issue---Copulas in Hydrology, pp. 409--419.

See Also

M, GLcop, HRcop, tEVcop, rhobevCOP

Examples

# NOT RUN {
Theta <- 2.2 # Let's see if the numerical and analytical tail deps are the same.
del.lamU <- abs(taildepCOP(cop=GHcop, para=Theta)$lambdaU - (2-2^(1/Theta)))
as.logical(del.lamU < 1E-6) # TRUE
# }
# NOT RUN {
# The simulations match Joe (2014, p. 72) for Gumbel-Hougaard
n <- 600; nsim <- 1000; set.seed(946) # seed for reproducibility
SM <- sapply(1:nsim, function(i) { rs <- semicorCOP(cop=GHcop, para=1.35, n=n)
                                   c(rs$minus.semicor, rs$plus.semicor) })
RhoM     <- round(mean(SM[1,]),        digits=3)
RhoP     <- round(mean(SM[2,]),        digits=3)
SE.RhoM  <- round(sd(SM[1,]),          digits=3)
SE.RhoP  <- round(sd(SM[2,]),          digits=3)
SE.RhoMP <- round(sd(SM[2,] - SM[1,]), digits=3)
# Semi-correlations (sRho) and standard errors (SEs)
message("# sRho[-]=", RhoM, " (SE[-]=", SE.RhoM, ") Joe's=0.132 (SE[-]=0.08)")
message("# sRho[+]=", RhoP, " (SE[+]=", SE.RhoP, ") Joe's=0.415 (SE[+]=0.07)")
message("# SE(sRho[-] - sRho[+])=", SE.RhoMP, " Joe's SE=0.10")
# sRho[-]=0.134 (SE[-]=0.076) Joe's=0.132 (SE[-]=0.08)
# sRho[+]=0.407 (SE[+]=0.074) Joe's=0.415 (SE[+]=0.07)
# SE(sRho[-] - sRho[+])=0.107 Joe's SE=0.10
# Joe (2014, p. 72) reports the values 0.132, 0.415, 0.08, 0.07, 0.10, respectively.
# }
# NOT RUN {
file <- "Lcomom_study_of_GHcopPLACKETTcop.txt"
x <- data.frame(tau=NA, trho=NA, srho=NA, PLtheta=NA, PLT2=NA, PLT3=NA, PLT4=NA,
                                          GHtheta=NA, GHT2=NA, GHT3=NA, GHT4=NA )
write.table(x, file=file, row.names=FALSE, quote=FALSE)
n <- 250 # make this much larger for a (very long) CPU run; results appear stable
for(tau in seq(0,0.98, by=0.005)) {
   thetag <- GHcop(u=NULL, v=NULL, tau=tau)$para
   trho <- rhoCOP(cop=GHcop, para=thetag)
   GH <- simCOP(n=n, cop=GHcop, para=thetag, points=FALSE, ploton=FALSE)
   srho <- cor(GH$U, GH$V, method="spearman")
   thetap <- PLACKETTpar(rho=trho)
   PL <- simCOP(n=n, cop=PLACKETTcop, para=thetap, points=FALSE, ploton=FALSE)
   GHl <- lmomco::lcomoms2(GH, nmom=4); PLl <- lmomco::lcomoms2(PL, nmom=4)
   x <- data.frame(tau=tau, trho=trho, srho=srho,
                   GHtheta=thetag, PLtheta=thetap,
                   GHT2=mean(c(GHl$T2[1,2], GHl$T2[2,1])),
                   GHT3=mean(c(GHl$T3[1,2], GHl$T3[2,1])),
                   GHT4=mean(c(GHl$T4[1,2], GHl$T4[2,1])),
                   PLT2=mean(c(PLl$T2[1,2], PLl$T2[2,1])),
                   PLT3=mean(c(PLl$T3[1,2], PLl$T3[2,1])),
                   PLT4=mean(c(PLl$T4[1,2], PLl$T4[2,1])) )
   write.table(x, file=file, row.names=FALSE, col.names=FALSE, append=TRUE)
}

# After a processing run with a very large "n", meaningful results exist.
D <- read.table(file, header=TRUE); D <- D[complete.cases(D),]
plot(D$tau, D$GHT3, ylim=c(-0.08,0.08), type="n",
     xlab="KENDALL TAU", ylab="L-COSKEW OR NEGATED L-COKURTOSIS")
points(D$tau,  D$GHT3, col=2);             points(D$tau,  D$PLT3, col=1)
points(D$tau, -D$GHT4, col=4, pch=2);      points(D$tau, -D$PLT4, col=1, pch=2)
LM3 <- lm(D$GHT3~I(D$tau^1)+I(D$tau^2)+I(D$tau^4)-1)
LM4 <- lm(D$GHT4~I(D$tau^1)+I(D$tau^2)+I(D$tau^4)-1)
LM3c <- LM3$coe; LM4c <- LM4$coe
Tau <- seq(0,1, by=.01); abline(0,0, lty=2, col=3)
lines(Tau,   0 + LM3c[1]*Tau^1 + LM3c[2]*Tau^2 + LM3c[3]*Tau^4,  col=4, lwd=3)
lines(Tau, -(0 + LM4c[1]*Tau^1 + LM4c[2]*Tau^2 + LM4c[3]*Tau^4), col=2, lwd=3) #
# }
# NOT RUN {
# Let us compare the conditional simulation method of copBasic by numerics and by the
# above analytical solution for the Gumbel-Hougaard copula to two methods implemented
# by package gumbel, a presumed Archimedean technique by package acopula, and an
# Archimedean technique by package copula. Setting seeds for each "method" below does
# not appear diagnostic because of differences in how the simulations are made.
nsim <- 10000; kn <- "kendall" #  The theoretical KENDALL TAU is (1.5-1)/1.5 = 1/3
# Simulate by conditional simulation using numerical derivative and then inversion
A <- cor(copBasic::simCOP(nsim, cop=GHcop, para=1.5, graphics=FALSE), method=kn)[1,2]
U <- runif(nsim) # GHcop.derCOPinv() comes from earlier in this documentation.
V <- sapply(1:nsim, function(i) { GHcop.derCOPinv(U[i], runif(1), para=1.5) })
# Simulate by conditional simulation using exact analytical solution
B <- cor(U, y=V, method=kn);  rm(U, V)
# Simulate by the "common frailty" technique
C <- cor(gumbel::rgumbel(nsim, 1.5, dim=2, method=1), method=kn)[1,2]
# Simulate by "K function" (Is the K function method, Archimedean?)
D <- cor(gumbel::rgumbel(nsim, 1.5, dim=2, method=2), method=kn)[1,2]
# Simulate by an Archimedean implementation (presumably)
E <- cor(acopula::rCopula(nsim, pars=1.5), method=kn)[1,2]
# Simulate by an Archimedean implementation
G <- cor(copula::rCopula(nsim, copula::gumbelCopula(1.5)), method=kn)[1,2]
K <- round(c(A, B, C, D, E, G), digits=5); rm(A, B, C, D, E, G, kn); tx <- ", "
message("Kendall's Tau: ", K[1], tx, K[2], tx, K[3], tx, K[4], tx, K[5], tx, K[6])
# Kendall's Tau: 0.32909, 0.32474, 0.33060, 0.32805, 0.32874, 0.33986 -- run 1
# Kendall's Tau: 0.33357, 0.32748, 0.33563, 0.32913, 0.32732, 0.32416 -- run 2
# Kendall's Tau: 0.34311, 0.33415, 0.33815, 0.33224, 0.32961, 0.33008 -- run 3
# Kendall's Tau: 0.32830, 0.33573, 0.32756, 0.33401, 0.33567, 0.33182 -- nsim=50000!
# All solutions are near 1/3 and it is unknown without further study which of the
# six methods would result in the least bias and (or) sampling variability.
# }
