GHcop: The Gumbel-Hougaard Extreme Copula

Description

SYMMETRIC GUMBEL-HOUGAARD---The Gumbel-Hougaard copula (Nelsen, 2006, pp. 118 and 164) is $$\mathbf{C}_{\Theta}(u,v) = \mathbf{GH}(u,v) = \mathrm{exp}{-[(-\log u)^\Theta+(-\log v)^\Theta]^{1/\Theta}}\mbox{,}$$ where $\Theta \in [1 , \infty)$. The copula here is a bivariate extreme value copula ($BEV$). The parameter $\Theta$ is readily estimated using Kendall's Tau (say a sample version $\hat\tau$) where the $\tau$ of the copula ($\tau_\mathbf{C}$) is defined as $$\tau_\mathbf{C} = \frac{\Theta - 1}{\Theta} \rightarrow \Theta = \frac{1}{1-\tau}\mbox{.}$$ The copula is readily extended into $d$ dimensions by $$\mathbf{C}_{\Theta}(u,v) = \mathrm{exp}{-[(-\log u_1)^\Theta+\cdots+(-\log u_d)^\Theta]^{1/\Theta}}\mbox{.}$$ However, such an implementation is not available in the copBasic package.

Every Gumbel-Hougaard copula is a multivariate extreme value ($MEV$) copula, and hence useful in analysis of extreme value distributions. The Gumbel-Hougaard copula is the only Archimedean $MEV$ (Salvadori et al., 2007, p. 192). The Gumbel-Hougaard copula has respective lower- and upper-tail dependency parameters of $\lambda^L = 0$ and $\lambda^U = 2 - 2^{1/\Theta}$, respectively. Nelsen (2006, p. 96) shows that $\mathbf{C}^r_\theta(u^{1/r}, v^{1/r}) = \mathbf{C}_\theta(u,v)$ so that every Gumbel-Hougaard copula is max-stable. A dependence measure uniquely defined for $BEV$ copulas is shown under rhobevCOP.

A comparison through simulation between Gumbel-Hougaard implementations by the Rpackages acopula, copBasic, copula, and Gumbel is shown in the Examples section. At least three divergent techniques for random variate generation are used amongst those packages. The simulations also use copBasic-style random variate generation (conditional simulation) using an analytical-numerical hybrid solution to conditional inverse described in the Note section.

ASYMMETRIC GUMBEL-HOUGAARD---An asymmetric version of the copula is readily constructed (Joe, 2014, p. 185--186) into a three-parameter version with Marshall-Olkin copulas on the boundaries: $$\mathbf{C}(u,v; \Theta, \pi_2, \pi_3) = \mathrm{exp}[-\mathcal{A}(-\log u, -\log v; \Theta, \pi_2, \pi_3)]\mbox{,}$$ where $\Theta \ge 1$ as before, $0 \le \pi_2, \pi_3 \le 1$, and

$$\mathcal{A}(x, y; \Theta, \pi_2, \pi_3) = [(\pi_2 x)^\Theta + (\pi_3 y)^\Theta]^{1/\Theta} + (1-\pi_2)x + (1-\pi_3)y\mbox{.}$$

The asymmetric $\mathbf{GH}$ is triggered if the length of the para argument is exactly 3. The GHcop function provides no mechanism for estimation of the parameters for the asymmetric version. Reviewing simulations, the bounds on the $\pi$s in Joe (2014, p. 185) [$0 \le \pi_2 < \pi_3 \le 1$] might be incorrect---by Joe's own back reference to Joe (2014, eq. 4.35, p. 183) the $\pi$-limits as stated for copBasic are shown. An algorithm for parameter estimation for the asymmetric $\mathbf{GH}$ using two different measures of bivariate skewness as well as an arbitrary measure of association is shown in section Details in joeskewCOP.

Usage

GHcop(u, v, para=NULL, tau=NULL, tau.big=0.985, cor=NULL, ...)

Arguments

Nonexceedance probability $u$ in the $X$ direction;

Nonexceedance probability $v$ in the $Y$ direction;

para

A vector (single element or triplet) of parameters---the $\Theta$ parameter of the copula;

tau

Kendall's Tau $\tau$ from which to estimate the parameter $\Theta$;

tau.big

The largest value for $\tau_\mathbf{C}$ prior to switching to the $\mathbf{M}$ copula applicable to the the symmetric version of this copula;

cor

A copBasic syntax for the correlation coefficient suitable for the copula---a synonym for tau; and

...

Additional arguments to pass.

Value

Value(s) for the copula are returned using the $\Theta$ as set by argument para. Alternative returned values are possible: (1) If para=NULL and tau is set, then $\tau_\mathbf{C} \rightarrow \Theta$ and an Rlist is returned. (2) If para=NULL and tau=NULL, then an attempt to estimate $\Theta$ from the u and v is made by $\mathrm{cor}(u,v)_\tau \rightarrow \tau_\mathbf{C} \rightarrow \Theta$ by either trigger using cor(u,v, method="kendall") in Rand an Rlist is returned. The possibly returned list has the following elements:
paraThe computed $\Theta$ from the given bivariate data in para and
tauThe sample estimate of $\tau$.

encoding

utf8

concept

symmetric Gumbel-Hougaard copula
symmetric Gumbel copula
asymmetric Gumbel copula
asymmetric logistic copula
Gumbel-Hougaard extreme value copula

Details

Numerical experiments seem to indicate that for $\tau_\mathbf{C} > 0.985$ that failures in the numerical partial derivatives in derCOP and derCOP2 result---a $\tau_\mathbf{C}$ this large is indeed large. As $\Theta \rightarrow \infty$ the Gumbel-Hougaard copula becomes the Fréchet{Frechet}-Hoeffding upper bound copula $\mathbf{M}$ (see M). A $\tau_\mathbf{C} \approx 0.985$ yields $\Theta \approx 66 + 2/3$, then for $\Theta > 1/(1-\tau_\mathbf{C})$ flips over to the $\mathbf{M}$ copula with a warning being issued.

References

Asquith, W.H., 2011, Distributional analysis with L-moment statistics using the R environment for statistical computing: Createspace Independent Publishing Platform, ISBN 978--146350841--8.

Joe, H., 2014, Dependence modeling with copulas: Boca Raton, CRC Press, 462 p.

Nelsen, R.B., 2006, An introduction to copulas: New York, Springer, 269 p.

Salvadori, G., De Michele, C., Kottegoda, N.T., and Rosso, R., 2007, Extremes in Nature---An approach using copulas: Springer, 289 p.

Zhang, L., and Singh, V.P., 2007, Gumbel-Hougaard copula for trivariate rainfall frequency analysis: Journal Hydrologic Engineering, v. 12, Special issue---Copulas in Hydrology, pp. 409--419.

Examples

Run this code

Theta <- 2.2 # Lets see if the numerical and analytical tail deps are the same.
del.lamU <- abs(taildepCOP(cop=GHcop, para=Theta)$lambdaU - (2-2^(1/Theta)))
as.logical(del.lamU < 1E-6) # TRUE
# The simulations match Joe (2014, p. 72) for Gumbel-Hougaard
n <- 600; nsim <- 1000; set.seed(946) # see for reproducibility
SM <- sapply(1:nsim, function(i) { rs <- semicorCOP(cop=GHcop, para=1.35, n=n)
                                   c(rs$minus.semicor, rs$plus.semicor) })
RhoM     <- round(mean(SM[1,]),        digits=3)
RhoP     <- round(mean(SM[2,]),        digits=3)
SE.RhoM  <- round(sd(SM[1,]),          digits=3)
SE.RhoP  <- round(sd(SM[2,]),          digits=3)
SE.RhoMP <- round(sd(SM[2,] - SM[1,]), digits=3)
# Semi-correlations (sRho) and standard errors (SEs)
message("# sRho[-]=", RhoM, " (SE[-]=", SE.RhoM, ") Joe's=0.132 (SE[-]=0.08)")
message("# sRho[+]=", RhoP, " (SE[+]=", SE.RhoP, ") Joe's=0.415 (SE[+]=0.07)")
message("# SE(sRho[-] - sRho[+])=", SE.RhoMP, " Joe's SE=0.10")
# sRho[-]=0.134 (SE[-]=0.076) Joe's=0.132 (SE[-]=0.08)
# sRho[+]=0.407 (SE[+]=0.074) Joe's=0.415 (SE[+]=0.07)
# SE(sRho[-] - sRho[+])=0.107 Joe's SE=0.10
# Joe (2014, p. 72) reports the values 0.132, 0.415, 0.08, 0.07, 0.10, respectively.

file <- "Lcomom_study_of_GHcopPLACKETTcop.txt"
x <- data.frame(tau=NA, trho=NA, srho=NA, GHtheta=NA, PLtheta=NA,
                GHT2=NA, GHT3=NA, GHT4=NA,
                PLT2=NA, PLT3=NA, PLT4=NA)
write.table(x, file=file, row.names=FALSE, quote=FALSE)
n <- 25000 # Make a large number, long CPU run but seems stable
for(tau in seq(0,0.98, by=0.005)) {
   thetag <- GHcop(u=NULL, v=NULL, tau=tau)$para
   trho <- rhoCOP(cop=GHcop, para=thetag)
   GH <- simCOP(n=n, cop=GHcop, para=thetag, points=FALSE, ploton=FALSE)
   srho <- cor(GH$U, GH$V, method="spearman")
   thetap <- PLACKETTpar(rho=trho)
   PL <- simCOP(n=n, cop=PLACKETTcop, para=thetap, points=FALSE, ploton=FALSE)
   GHl <- lmomco::lcomoms2(GH, nmom=4); PLl <- lmomco::lcomoms2(PL, nmom=4)
   x <- data.frame(tau=tau, trho=trho, srho=srho,
                   GHtheta=thetag, PLtheta=thetap,
                   GHT2=mean(c(GHl$T2[1,2], GHl$T2[2,1])),
                   GHT3=mean(c(GHl$T3[1,2], GHl$T3[2,1])),
                   GHT4=mean(c(GHl$T4[1,2], GHl$T4[2,1])),
                   PLT2=mean(c(PLl$T2[1,2], PLl$T2[2,1])),
                   PLT3=mean(c(PLl$T3[1,2], PLl$T3[2,1])),
                   PLT4=mean(c(PLl$T4[1,2], PLl$T4[2,1])) )
   write.table(x, file=file, row.names=FALSE, col.names=FALSE, append=TRUE)
}

D <- read.table(file, header=TRUE); D <- D[complete.cases(D),]
plot(D$tau, D$GHT3, ylim=c(-0.08,0.08), type="n",
     xlab="KENDALL TAU", ylab="L-COSKEW OR NEGATED L-COKURTOSIS")
points(D$tau,  D$GHT3, col=2);             points(D$tau,  D$PLT3, col=1)
points(D$tau, -D$GHT4, col=4, pch=2);      points(D$tau, -D$PLT4, col=1, pch=2)
LM3 <- lm(D$GHT3~I(D$tau^1)+I(D$tau^2)+I(D$tau^4)-1)
LM4 <- lm(D$GHT4~I(D$tau^1)+I(D$tau^2)+I(D$tau^4)-1)
LM3c <- LM3$coe; LM4c <- LM4$coe
Tau <- seq(0,1, by=.01)
abline(0,0, lty=2, col=3)
lines(Tau,   0 + LM3c[1]*Tau^1 + LM3c[2]*Tau^2 + LM3c[3]*Tau^4,  col=4, lwd=3)
lines(Tau, -(0 + LM4c[1]*Tau^1 + LM4c[2]*Tau^2 + LM4c[3]*Tau^4), col=2, lwd=3)

# Let us compare the conditional simulation method of copBasic by numerics and by the
# above analytical solution for the Gumbel-Hougaard copula to two methods implemented
# by package gumbel, a presumed Archimedean technique by package acopula, and an
# Archimedean technique by package copula. Setting seeds by each "method" below does
# not appear diagnostic because of the differences in which the simulations are made.
nsim <- 10000; kn <- "kendall" #  The theoretical KENDALL TAU is (1.5-1)/1.5 = 1/3
# Simulate by conditional simulation using numerical derivative and then inversion
A <- cor(copBasic::simCOP(nsim, cop=GHcop, para=1.5, graphics=FALSE), method=kn)[1,2]
U <- runif(nsim)
V <- sapply(1:nsim, function(i) { GHcop.derCOPinv(U[i], runif(1), para=1.5) })
# Simulate by conditional simulation using exact analytical solution
B <- cor(U, y=V, method=kn);  rm(U, V)
# Simulate by the "common frailty" technique
C <- cor(gumbel::rgumbel(nsim, 1.5, dim=2, method=1), method=kn)[1,2]
# Simulate by "K function" (Is the K function method, Archimedean?)
D <- cor(gumbel::rgumbel(nsim, 1.5, dim=2, method=2), method=kn)[1,2]
# Simulate by an Archimedean implementation (presumably)
E <- cor(acopula::rCopula(nsim, pars=1.5), method=kn)[1,2]
# Simulate by an Archimedean implementation
G <- cor(copula::rCopula(nsim, copula::gumbelCopula(1.5)), method=kn)[1,2]
K <- round(c(A, B, C, D, E, G), digits=5); rm(A, B, C, D, E, G, kn); tx <- ", "
message("Kendall's Tau: ", K[1], tx, K[2], tx, K[3], tx, K[4], tx, K[5], tx, K[6])
# Kendall's Tau: 0.32909, 0.32474, 0.33060, 0.32805, 0.32874, 0.33986
# Kendall's Tau: 0.33357, 0.32748, 0.33563, 0.32913, 0.32732, 0.32416
# Kendall's Tau: 0.34311, 0.33415, 0.33815, 0.33224, 0.32961, 0.33008
# Kendall's Tau: 0.32830, 0.33573, 0.32756, 0.33401, 0.33567, 0.33182 -- nsim=50000!
# All solutions are near 1/3 and it is unknown without further study which of the
# six methods would result in the least bias and (or) sampling variability.

Run the code above in your browser using DataLab