EMPIRcop: The Bivariate Empirical Copula

Description

The bivariate empirical copula (Nelsen, 2006, p. 219) for a bivariate sample of length $n$ is defined for random variables $X$ and $Y$ as

$$\mathbf{C}_n\biggl(\frac{i}{n}, \frac{j}{n}\biggr) = \frac{\mathrm{number\ of\ pairs\ (}x,y\mathrm{)\ with\ }x \le x_{(i)}\mathrm{\ and\ }y \le y_{(j)}}{n}\mbox{,}$$

where $x_{(i)}$ and $y_{(i)}$, $1 \le i,j \le n$ or expressed as $$\mathbf{C}_n\biggl(\frac{i}{n}, \frac{j}{n}\biggr) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\biggl(\frac{R_i}{n} \le u_i, \frac{S_i}{n} \le v_i \biggr)\mbox{,}$$ where $R_i$ and $S_i$ are ranks of the data for $U$ and $V$, and $\mathbf{1}(.)$ is an indicator function scoring 1 if condition is true otherwise zero. Using more generic notation, the empirical copula can be defined by $$\mathbf{C}_{n}(u,v) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\bigl(u^\mathrm{obs}_{i} \le u_i, v^\mathrm{obs}_{i} \le v_i \bigr)\mbox{,}$$ where $u^\mathrm{obs}$ and $v^\mathrm{obs}$ are thus sometype of nonparametric nonexceedance probabilities based on counts of the underlying data expressed in probabilities.

Hazen Empirical Copula---The Hazen form of the empirical copula is $$\mathbf{C}^\mathcal{H}_{n}(u,v) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\biggl(\frac{R_i - 0.5}{n} \le u_i, \frac{S_i - 0.5}{n} \le v_i \biggr)\mbox{,}$$ which can be triggered by ctype="hazen". This form is named for this package because of directly similarity of the Hazen plotting position to the definition. Joe (2014, pp. 247--248) uses this form. Joe continues by saying [the] adjustment of the uniform score [$(R - 0.5/n]$] could be done in an alternative form, but there is [asymptotic] equivalence[, and that] $\mathbf{C}^\mathcal{H}_{n}$ puts mass of $n^{-1}$ at the tuples $([r_{i1} - 0.5/n, \ldots, [r_{id} - 0.5]/n)$ for $i = 1, \ldots, n$. A Joe footnote says that the conversion [$R/(n+1)$] is commonly used for the empirical copula. This later form is the Weibull form described next. Joe's preference for the Hazen form is so that the sum of squared normal scores is closer to unity for large $n$ than attained by the Weibull form.

Weibull Empirical Copula---The Weibull form of the empirical copula is $$\mathbf{C}^\mathcal{W}_{n}(u,v) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\biggl(\frac{R_i}{n+1} \le u_i, \frac{S_i}{n+1} \le v_i \biggr)\mbox{,}$$ which can be triggered by ctype="weibull". This form is named for this package because of directly similarity of the Hazen plotting position to the definition, and this form is the default (see argument description).

Bernstein Empirical Copula---The empirical copula can be extended nonparametrically as the Bernstein empirical copula (Hernández-Maldonado{Hernandez-Maldonado}, Díaz-Viera{Diaz-Viera}, and Erdely, 2012) and is formulated as

$$\mathbf{C}^\mathcal{B}_n(u,v; \eta) = \sum_{i=1}^n\sum_{j=1}^n \mathbf{C}_{n}\biggl(\frac{i}{n},\frac{j}{n}\biggr) \times \eta(i,j; u,v)\mbox{,}$$ where the individual Bernstein weights $\eta(i,j)$ for the $k$th paired value of the $u$ and $v$ vectors are

$$\eta(i,j; u,v) = {n \choose i} u^i (1-u)^{n-i} \times {n \choose j} u^j (1-u)^{n-j}\mbox{.}$$

The Bernstein extension, albeit conceptually pure in its shuffling by binomial coefficients and left- and right-tail weightings, is CPU intensive as inspection of the equations above seemingly indicates a nest of four for() loops in R. (The native Rcode of copBasic uses the sapply() function in Rliberally for substantial but not blazing fast speed increase.) The Bernstein extension results in a somewhat smoother surface of the empirical copula, and lastly which can be triggered by ctype="bernstein".

The empirical copula frequency can be defined (Nelson, 2006, p. 219) as $$\mathbf{c}_n(u, v) = \mathbf{C}_n\biggl(\frac{i}{n}, \frac{j}{n}\biggr) - \mathbf{C}_n\biggl(\frac{i-1}{n}, \frac{j}{n}\biggr) - \mathbf{C}_n\biggl(\frac{i}{n}, \frac{j-1}{n}\biggr) + \mathbf{C}_n\biggl(\frac{i-i}{n}, \frac{j-1}{n}\biggr)\mbox{.}$$

Usage

EMPIRcop(u, v, para=NULL, ctype=c("weibull", "hazen", "1/n", "bernstein"),
                          bernprogress=FALSE, ...)

Arguments

Nonexceedance probability $u$ in the $X$ direction;

Nonexceedance probability $v$ in the $Y$ direction;

para

A vector (single element) of parameters---the U-statistics of the data (see Examples). Alternatively, para can be a list holding a para as would be done if it were a vector, but arguments bernstein

ctype

An alternative means for trigging the definition of $\mathbf{C}^\mathcal{H}_n$ (default), $\mathbf{C}^\mathcal{W}_n$, $\mathbf{C}_n$, or $\mathbf{C}^\mathcal{B}_n$;

bernprogress

The Bernstein copula extension is CPU intensive(!), so a splash counter is pushed to the console via message() function in Rso as to not discourage the user; and

...

Additional arguments to pass.

Value

Value(s) for the copula are returned.

encoding

utf8

concept

Bernstein copula
Bernstein empirical copula
Hazen copula
Hazen empirical copula
Weibull copula
Weibull empirical copula

References

Hernández-Maldonado{Hernandez-Maldonado}, V., Díaz-Viera{Diaz-Viera}, M., and Erdely, A., 2012, A joint stochastic simulation method using the Bernstein copula as a flexible tool for modeling nonlinear dependence structures between petrophysical properties: Journal of Petroleum Science and Engineering, v. 90--91, pp. 112--123.

Nelsen, R.B., 2006, An introduction to copulas: New York, Springer, 269 p.

Salvadori, G., De Michele, C., Kottegoda, N.T., and Rosso, R., 2007, Extremes in Nature---An approach using copulas: Springer, 289 p.

Examples

Run this code

EMPIRcop(0.321,0.78, para=simCOP(n=90, cop=N4212cop, para=2.32, graphics=FALSE))
N4212cop(0.321,0.78, para=2.32)#

set.seed(62) # See note below about another seed to try.
psp <- simCOP(n=34, cop=PSP, ploton=FALSE, points=FALSE) * 150
# Pretend psp is real data, the * 150 is to clearly get into an arbitrary unit system.

# The sort=FALSE is critical in the following two calls. Although the Weibull
# plotting positions are chosen, internally EMPIRcop uses ranks, but the model
# here is to imagine one having a sample in native units of the random variables
# and then casting them into probabilities for other purposes.
fakeU <- lmomco::pp(psp[,1], sort=FALSE) # Weibull plotting position i/(n+1)
fakeV <- lmomco::pp(psp[,2], sort=FALSE) # Weibull plotting position i/(n+1)
uv <- data.frame(U=fakeU, V=fakeV); # our U-statistics

# The next four values should be very close if n above were say 1000, but the
# ctype="bernstein"" should not be used if n >> 34 because of inherently long runtime.
PSP(0.4,0.6)              # 0.3157895 (compare to listed values below)

# Two seeds are shown so that the user can see that depending on the distribution
# of the values given by para that different coincidences of which method is
# equivalent to another exist.
# For set.seed(62) --- "hazen" == "weibull" by coincidence
#    "hazen"     --> 0.3529412
#    "weibull"   --> 0.3529412
#    "1/n"       --> 0.3235294
#    "bernstein" --> 0.3228916
# For set.seed(85) --- "1/n" == "hazen" by coincidence
#    "hazen"     --> 0.3529412
#    "weibull"   --> 0.3823529
#    "1/n"       --> 0.3529412
#    "bernstein" --> 0.3440387

# For set.seed(62) --- not all measures of association can be used for all
# empirical copulas because of 'divergent' integral errors, but this is an example
# for Hoeffding's Phi.
hoefCOP(as.sample=TRUE, para=uv) #  (sample estimator is fast)  # 0.4987755
hoefCOP(cop=EMPIRcop,   para=uv, ctype="hazen")                 # 0.5035348
hoefCOP(cop=EMPIRcop,   para=uv, ctype="weibull")               # 0.4977145
hoefCOP(cop=EMPIRcop,   para=uv, ctype="1/n")                   # 0.4003646
hoefCOP(cop=EMPIRcop,   para=uv, ctype="bernstein")             # 0.4563724

# All other example suites shown below are dependent on the pseudo-data in the
# variable uv. It is suggested to not run with a sample size much larger than the
# above n=34 if the Bernstein comparison is intended (wanted) simply because of
# lengthy(!) run times, but the n=34 does provide a solid demonstration how the
# level curves for berstein weights are quite smooth.

# Now let us construct as many as three sets of level curves
level.curvesCOP(cop=PSP); # TRUE, parametric, fast, BLACK CURVES

# Empirical copulas can consume lots of CPU.
# RED CURVES, if n is too small, uniroot() errors might be triggered and likely
# will be using the sample size of 34 shown above.
level.curvesCOP(cop=EMPIRcop, para=uv, delu=0.04, col=2, ploton=FALSE)

# GREEN CURVES (large CPU committment)
# Bernsteinprogress is uninformative because level.curves has taken over control.
bpara <- list(para=uv, ctype="bernstein", bernprogress=FALSE)
level.curvesCOP(cop=EMPIRcop, para=bpara, delu=0.05, col=3, ploton=FALSE)
# The delu is increased for faster execution but more important,
# notice the greater smoothness of the Bernstein refinement.

# Experimental from R Graphics by Murrell (2005, p.112)
"trans3d" <-
function(x,y,z, pmat) tmat <- cbind(x,y,z,1) %*% pmat; return(tmat[,1:2] / tmat[,4])

the.grid <- EMPIRgrid(para=uv)
the.diag <- diagCOP(cop=EMPIRcop, para=uv, ploton=FALSE, lines=FALSE)

the.persp <- persp(the.grid$empcop, theta=-25, phi=20,
                   xlab="U VARIABLE", ylab="V VARIABLE", zlab="COPULA C(u,v)")
the.trace <- trans3d(the.diag$t, the.diag$t, the.diag$diagcop, the.persp)
lines(the.trace, lwd=2, col=2) # The diagonal of the copula

# The following could have been used as an alternative to call persp()
the.persp <- persp(x=the.grid$u, y=the.grid$v, z=the.grid$empcop, theta=-25, phi=20,
                   xlab="U VARIABLE", ylab="V VARIABLE", zlab="COPULA C(u,v)")#

Run the code above in your browser using DataLab