copBasic-package: Basic Theoretical Copula, Empirical Copula, and Various Utility Functions

Description

The copBasic package is heavily oriented around copula theory and mathematical operations of copulas and closely follows the standard texts in the field by Nelsen (2006) and, from 2015 onward, increasingly Joe (2014) as well. Another good text is by Salvadori et al. (2007) and is cited herein, but about half of that excellent book is on univariate applications. The primary objective of copBasic is to provide a basic application programming interface (API) into numerous results shown by authoritative texts on copulas. It is hoped in part that the package will help new inductees to copulas in their self study, potential course work, and applied circumstances. The language and vocabulary of copulas are quite formidable.

The author has often emphasized vocabulary words in italics. Italic typeface is used extensively and usually near the opening of function-by-function documentation to identify vocabulary words, such as survival copula (see surCOP). This syntax tries to mimic and accentuate the word usage in Nelsen (2006) and Joe (2014). The copBasic package was started well before Joe (2014), so many more citations to Nelsen (2006) are made to connect this package into copula theory.

The italics then are used to highlight this vocabulary in order to draw connections between concepts. In conjunction with the function summary in copBasic-package, the extensive function cross referencing throughout and the indexing of this documentation should also help. The author had no experience with copulas, and really with multivariate probability theory (although applied problems were beginning to emerge), prior to happening upon Nelsen (2006). The copBasic package is a personal tour de force in self study. Hopefully, this package and user's manual will help others.

A few comments on notation are needed. A bold math typeface is used to represent a copula such as $\mathbf{\Pi}$ (see P) for the independence copula. The syntax $\mathcal{R}\times\mathcal{R} \equiv \mathcal{R}^2$ denotes the orthogonal domain of two real numbers, and $[0,1]\times [0,1]$ $\equiv$ $\mathcal{I}\times\mathcal{I} \equiv \mathcal{I}^2$ denotes the orthogonal domain on the unit square of probabilities. Limits of integration $[0,1]$ or $[0,1]^2$ involving copulas thus are shown as $\mathcal{I}$ and $\mathcal{I}^2$, respectively.

The random variables $X$ and $Y$ respectively denote the horizontal and vertical directions in $\mathcal{R}^2$. Their probabilistic counterparts are the uniformly distributed random variables on $[0,1]$, respectively denoted as $U$ and $V$, which necessarily also are the directions in $\mathcal{I}^2$. Realizations of these random variables often are respectively $x$ and $y$ for $X$ and $Y$ and $u$ and $v$ for $U$ and $V$.

There is an obvious difference between nonexceedance probability $F$ and exceedance probability $1-F$. Both $u$ and $v$ are measures in nonexceedance (cumulative probability). The arguments u $= u$ and v $= v$ to many functions herein are almost exclusively nonexceedances, but there are instances for which u $= 1 - u = u'$ and v $= 1 - v = v'$.

Helpful Navigation of the copBasic Package

Some helpful guideposts into the package are listed in the following table:

Name | Symbol | Function | Concept
--- | --- | --- | ---
Copula | $\mathbf{C}(u,v)$ | COP | copula theory
Survival copula | $\hat{\mathbf{C}}(u',v')$ | surCOP | copula theory
Joint survival function | $\overline{\mathbf{C}}(u,v)$ | surfuncCOP | copula theory
Co-copula | $\mathbf{C}^\star(u',v')$ | coCOP | copula theory
Dual of a copula | $\tilde{\mathbf{C}}(u,v)$ | duCOP | copula theory
Primary copula diagonal | $\delta(t)$ | diagCOP | copula theory
Secondary copula diagonal | $\delta^\star(t)$ | diagCOP | copula theory
Inverse copula diagonal | $\delta^{(-1)}(f)$ | diagCOPatf | copula theory
Joint probability | -- | jointCOP | copula theory
Blomqvist's Beta | $\beta_\mathbf{C}$ | blomCOP | bivariate association
Gini's Gamma | $\gamma_\mathbf{C}$ | giniCOP | bivariate association
Hoeffding's Phi | $\Phi_\mathbf{C}$ | hoefCOP | bivariate association
Joe's Nu-Skew | $\nu_\mathbf{C}$ | joeskewCOP | bivariate association
Joe's Nu-Skew-Star | $\nu^\star_\mathbf{C}$ | joeskewCOP | bivariate association
Lp distance | $\Phi_\mathbf{C} \rightarrow L_p$ | LpCOP | bivariate association
Kendall's Tau | $\tau_\mathbf{C}$ | tauCOP | bivariate association
Kendall's Measure | $K_\mathbf{C}(z)$ | kmeasCOP | copula theory
Kendall's Function | $F_K(z)$ | kfuncCOP | copula theory
Inverse Kendall's Function | $F_K^{(-1)}(z)$ | kfuncCOPinv | copula theory
An L-moment of $F_K(z)$ | $\lambda_r(F_K)$ | kfuncCOPlmom | L-moment theory
L-moments of $F_K(z)$ | $\lambda_r(F_K)$ | kfuncCOPlmoms | L-moment theory
Semi-correlations | $\rho_N^{-}(a)$ | semicorCOP | bivariate tail association
Semi-correlations | $\rho_N^{+}(a)$ | semicorCOP | bivariate tail association
Spearman's Footrule | $\psi_\mathbf{C}$ | footCOP | bivariate association
Spearman's Rho | $\rho_\mathbf{C}$ | rhoCOP | bivariate association
Schweizer and Wolff's Sigma | $\sigma_\mathbf{C}$ | wolfCOP | bivariate association
Lower-bounds copula | $\mathbf{W}(u,v)$ | W | copula
Independence copula | $\mathbf{\Pi}(u,v)$ | P | copula
Upper-bounds copula | $\mathbf{M}(u,v)$ | M | copula
Fréchet Family copula | $\mathbf{FF}(u,v)$ | FRECHETcop | copula
Galambos copula | $\mathbf{GL}(u,v)$ | GLcop | copula
Gumbel-Hougaard copula | $\mathbf{GH}(u,v)$ | GHcop | copula
Hüsler-Reiss copula | $\mathbf{HR}(u,v)$ | HRcop | copula
Plackett copula | $\mathbf{PL}(u,v)$ | PLACKETTcop | copula
PSP copula | $\mathbf{PSP}(u,v)$ | PSP | copula
Density | $c(u,v)$ | densityCOP | copula density
Density visualization | -- | densityCOPplot | copula density
Empirical copula | $\mathbf{C}_n(u,v)$ | EMPIRcop | copula
Empirical simulation | -- | EMPIRsim | copula simulation
Empirical simulation | -- | EMPIRsimv | copula simulation
Empirical copulatic surface | -- | EMPIRgrid | copulatic surface
Parametric copulatic surface | -- | gridCOP | copulatic surface
Parametric simulation | -- | simCOP | copula simulation
Parametric simulation | -- | simCOPmicro | copula simulation

Several of the functions listed above are measures of bivariate association. Two of the measures (Kendall's Tau, tauCOP; Spearman's Rho, rhoCOP) are familiar, and R of course provides native support for their sample estimation, but each function can be used to call R's cor() function for parallelism with the other measures. The other measures (Blomqvist's Beta, Gini's Gamma, Hoeffding's Phi, Schweizer and Wolff's Sigma, Spearman's Footrule) support sample estimation by specially formed calls to their respective functions: blomCOP, giniCOP, hoefCOP, wolfCOP, and footCOP. The documentation for Gini's Gamma (giniCOP) shows extensive use of theoretical and sample computations for all of these functions. Lastly and relatedly, bivariate skewness measures are supported in joeskewCOP (nuskewCOP and nustarCOP) and uvlmoms (uvskew). Extensive discussion and example computations of bivariate skewness are provided in the joeskewCOP documentation.
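As a minimal base-R sketch (not the implementation inside blomCOP, tauCOP, or rhoCOP), the theoretical Blomqvist's Beta follows directly from $\beta_\mathbf{C} = 4\mathbf{C}(1/2,1/2) - 1$, and sample Kendall's Tau and Spearman's Rho come straight from R's cor(). The helpers w_cop, p_cop, m_cop, gh_cop, and blomqvist_beta are hypothetical hand-written stand-ins for W, P, M, and GHcop; the Gumbel-Hougaard parameter 3.055 is borrowed from Salvadori et al. (2007, p. 151).

```r
# Hand-written copulas (hypothetical helpers, not the copBasic functions W, P, M, GHcop)
w_cop  <- function(u, v) pmax(u + v - 1, 0)   # lower bound W
p_cop  <- function(u, v) u * v                # independence Pi
m_cop  <- function(u, v) pmin(u, v)           # upper bound M
gh_cop <- function(u, v, theta = 3.055) {     # Gumbel-Hougaard
  exp(-((-log(u))^theta + (-log(v))^theta)^(1 / theta))
}

# Theoretical Blomqvist's Beta: 4 * C(1/2, 1/2) - 1
blomqvist_beta <- function(cop, ...) 4 * cop(0.5, 0.5, ...) - 1

betas <- c(W  = blomqvist_beta(w_cop), P = blomqvist_beta(p_cop),
           M  = blomqvist_beta(m_cop), GH = blomqvist_beta(gh_cop))
print(betas)

# Sample Kendall's Tau and Spearman's Rho through base R's cor(); a strictly
# increasing transform preserves the ranks, so both estimates equal 1 here.
set.seed(1)
x <- runif(200)
y <- x^3
tau_n <- cor(x, y, method = "kendall")
rho_n <- cor(x, y, method = "spearman")
print(c(tau = tau_n, rho = rho_n))
```

Because the measures are rank based, any strictly increasing transformation of the margins leaves them unchanged, which is why the copula alone determines them.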

Bivariate random simulation by several functions is identified in the previous table. The copBasic package explicitly uses only conditional simulation, also known as the conditional distribution method for random variate generation, following the guidance of Nelsen (2006, pp. 40--41) (see also simCOPmicro, simCOP, derCOPinv, derCOPinv2). In that method, $u$ is drawn from a uniform distribution, and $v$ is found by inverting the conditional distribution $\partial \mathbf{C}(u,v)/\partial u$ at a second, independent uniform draw. There are many other methods in the literature and available in R. A comparison of methods is made in the Examples section of the Gumbel-Hougaard copula (GHcop).
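The conditional distribution method can be sketched in base R under some stated assumptions: the Gumbel-Hougaard form $\mathbf{C}(u,v) = \exp\{-[(-\log u)^\Theta + (-\log v)^\Theta]^{1/\Theta}\}$ is written out by hand, its partial derivative $\partial \mathbf{C}/\partial u$ is coded analytically, and uniroot() serves as a numerical stand-in for derCOPinv. The names gh_cop, gh_dcdu, and sim_gh are hypothetical and are not copBasic functions. The sample Kendall's Tau should land near the analytical $1 - 1/\Theta$.

```r
# Gumbel-Hougaard copula and its partial derivative dC/du (hypothetical helpers)
gh_cop <- function(u, v, theta) {
  exp(-((-log(u))^theta + (-log(v))^theta)^(1 / theta))
}
gh_dcdu <- function(u, v, theta) {   # Pr[V <= v | U = u]
  a <- (-log(u))^theta + (-log(v))^theta
  gh_cop(u, v, theta) * a^(1 / theta - 1) * (-log(u))^(theta - 1) / u
}

# Conditional distribution method: draw u and p from U(0,1), then solve
# dC/du(u, v) = p for v by root finding (numerical stand-in for derCOPinv)
sim_gh <- function(n, theta) {
  u <- runif(n)
  p <- runif(n)
  eps <- .Machine$double.eps
  v <- vapply(seq_len(n), function(i) {
    uniroot(function(vv) gh_dcdu(u[i], vv, theta) - p[i],
            lower = eps, upper = 1 - eps, tol = 1e-10)$root
  }, numeric(1))
  cbind(U = u, V = v)
}

set.seed(62)
theta <- 3.055
uv <- sim_gh(1000, theta)
tau_hat <- cor(uv[, "U"], uv[, "V"], method = "kendall")
print(c(sample = tau_hat, theory = 1 - 1 / theta))  # GH: tau = 1 - 1/theta
```

Root finding is used only because this copula lacks a closed-form inverse of $\partial \mathbf{C}/\partial u$; where such an inverse exists, it can replace uniroot() directly.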

Several functions in copBasic make the distinction between $V$ with respect to (wrt) $U$ and $U$ wrt $V$, and a guide for the nomenclature involving wrt distinctions is listed in the following table:

Name | Symbol | Function | Concept
--- | --- | --- | ---
Copula inversion | $V$ wrt $U$ | COPinv | copula operator
Copula inversion | $U$ wrt $V$ | COPinv2 | copula operator
Copula derivative | $\partial \mathbf{C}/\partial u$ | derCOP | copula operator
Copula derivative | $\partial \mathbf{C}/\partial v$ | derCOP2 | copula operator
Copula derivative inversion | $V$ wrt $U$ | derCOPinv | copula operator
Copula derivative inversion | $U$ wrt $V$ | derCOPinv2 | copula operator
Level curves | $t \mapsto \mathbf{C}(u=U, v)$ | joint.curvesCOP | copula theory
Level curves | $t \mapsto \mathbf{C}(u=U, v)$ | level.curvesCOP | copula theory
Level curves | $t \mapsto \mathbf{C}(u, v=V)$ | level.curvesCOP2 | copula theory
Level set | $V$ wrt $U$ | level.setCOP | copula theory
Level set | $U$ wrt $V$ | level.setCOP2 | copula theory
Median regression | $V$ wrt $U$ | med.regressCOP | copula theory
Median regression | $U$ wrt $V$ | med.regressCOP2 | copula theory
Quantile regression | $V$ wrt $U$ | qua.regressCOP | copula theory
Quantile regression | $U$ wrt $V$ | qua.regressCOP2 | copula theory
Copula section | $t \mapsto \mathbf{C}(t,a)$ | sectionCOP | copula theory
Copula section | $t \mapsto \mathbf{C}(a,t)$ | sectionCOP | copula theory
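For the Gumbel-Hougaard copula, the level set of $V$ wrt $U$ (the role played by level.setCOP) has a closed form: solving $\mathbf{C}(u,v) = t$ gives $v = \exp\{-[(-\log t)^\Theta - (-\log u)^\Theta]^{1/\Theta}\}$ for $t \le u \le 1$. The base-R sketch below, with hypothetical helpers gh_cop and gh_level_v (not copBasic functions), traces that curve and substitutes it back into the copula as a check.

```r
# Gumbel-Hougaard copula (hypothetical helper)
gh_cop <- function(u, v, theta) {
  exp(-((-log(u))^theta + (-log(v))^theta)^(1 / theta))
}
# Level set V wrt U: the v solving C(u, v) = t, defined for t <= u <= 1
gh_level_v <- function(u, t, theta) {
  stopifnot(all(u >= t), all(u <= 1))
  exp(-((-log(t))^theta - (-log(u))^theta)^(1 / theta))
}

theta   <- 3.055
t_level <- 0.4
u <- c(0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
v <- gh_level_v(u, t_level, theta)
print(cbind(u, v, C = gh_cop(u, v, theta)))  # the C column should equal t_level
```

The curve starts at $(t, 1)$ and ends at $(1, t)$, reflecting the boundary conditions $\mathbf{C}(t,1) = \mathbf{C}(1,t) = t$.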

The two tables do not include all of the myriad special functions that support similar operations on empirical copulas. All empirical copula operators and utilities are prefixed with EMPIR in the function name. An additional note concerning package nomenclature is that an appended 2 to a function name indicates $U$ wrt $V$ (e.g. EMPIRgridderinv2 for an inversion of the partial derivatives $\partial \mathbf{C}/\partial v$ across the grid of the empirical copula).
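To show what an empirical copula evaluates, here is a minimal base-R sketch. It assumes pseudo-observations scaled as rank$/(n+1)$, which is one common convention and not necessarily the scaling used by EMPIRcop; the helper emp_cop is hypothetical. A comonotone sample should behave like $\mathbf{M}$ and a countermonotone sample like $\mathbf{W}$.

```r
# Empirical copula C_n(u, v): share of pseudo-observations jointly <= (u, v).
# Pseudo-observations use rank/(n + 1), an assumed (common) convention.
emp_cop <- function(u, v, x, y) {
  n  <- length(x)
  pu <- rank(x) / (n + 1)
  pv <- rank(y) / (n + 1)
  vapply(seq_along(u), function(i) mean(pu <= u[i] & pv <= v[i]), numeric(1))
}

x <- 1:100
c_co  <- emp_cop(c(0.5, 1), c(0.5, 1), x, x^2)  # comonotone sample, like M
c_cnt <- emp_cop(c(0.5, 1), c(0.5, 1), x, -x)   # countermonotone sample, like W
print(rbind(comonotone = c_co, countermonotone = c_cnt))
```

At $(1/2, 1/2)$ the two samples reproduce $\mathbf{M}(1/2,1/2) = 1/2$ and $\mathbf{W}(1/2,1/2) = 0$.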

Some additional functions to compute often-salient features or characteristics of copulas, including functions for bivariate inference or goodness-of-fit, are listed in the following table:

Name | Symbol | Function | Concept
--- | --- | --- | ---
Left-tail decreasing | $V$ wrt $U$ | isCOP.LTD | bivariate association
Left-tail decreasing | $U$ wrt $V$ | isCOP.LTD | bivariate association
Right-tail increasing | $V$ wrt $U$ | isCOP.RTI | bivariate association
Right-tail increasing | $U$ wrt $V$ | isCOP.RTI | bivariate association
Tail concentration function | $q_\mathbf{C}(t)$ | tailconCOP | bivariate tail association
Tail (lower) dependency | $\lambda^L_\mathbf{C}$ | taildepCOP | bivariate tail association
Tail (upper) dependency | $\lambda^U_\mathbf{C}$ | taildepCOP | bivariate tail association
Tail (lower) order | $\kappa^L_\mathbf{C}$ | tailordCOP | bivariate tail association
Tail (upper) order | $\kappa^U_\mathbf{C}$ | tailordCOP | bivariate tail association
Negative quadrant dependency | NQD | isCOP.PQD | bivariate association
Positive quadrant dependency | PQD | isCOP.PQD | bivariate association
Permutation symmetry | $\mathrm{permsym}$ | isCOP.permsym | copula symmetry
Radial symmetry | $\mathrm{radsym}$ | isCOP.radsym | copula symmetry
Skewness (Joe, 2014) | $\eta(p; \psi)$ | uvskew | bivariate skewness
Kullback-Leibler divergence | $\mathrm{KL}(f|g)$ | kullCOP | bivariate inference
K-L sample size | $n_{f\!g}$ | kullCOP | bivariate inference
General goodness-of-fit | $T_n$ | statTn | bivariate inference
Vuong's Procedure | -- | vuongCOP | bivariate inference
L-comoments (samp. distr.) | -- | lcomCOPpv | experimental bivariate inference
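Two of these properties can be checked numerically in base R for the Gumbel-Hougaard copula. The sketch assumes hand-written helpers (gh_cop, upper_tail; hypothetical, not taildepCOP or isCOP.PQD). Upper-tail dependency is approximated by $\mathrm{Pr}[\,V > t \mid U > t\,] = (1 - 2t + \mathbf{C}(t,t))/(1-t)$ near $t = 1$ and compared with the closed form $\lambda^U_\mathbf{C} = 2 - 2^{1/\Theta}$. PQD, $\mathbf{C}(u,v) \ge uv$, is only checked on a finite grid, which is necessary evidence rather than a proof.

```r
gh_cop <- function(u, v, theta) {
  exp(-((-log(u))^theta + (-log(v))^theta)^(1 / theta))
}
theta <- 3.055

# Upper-tail dependency as the limit of Pr[V > t | U > t] as t -> 1
upper_tail <- function(t) (1 - 2 * t + gh_cop(t, t, theta)) / (1 - t)
lambda_num <- upper_tail(1 - 1e-6)
lambda_ref <- 2 - 2^(1 / theta)   # closed form for Gumbel-Hougaard
print(c(numerical = lambda_num, closed_form = lambda_ref))

# Positive quadrant dependency: C(u, v) - u v >= 0, checked on a grid only
g <- seq(0.01, 0.99, by = 0.01)
pqd_gap <- outer(g, g, function(u, v) gh_cop(u, v, theta) - u * v)
print(min(pqd_gap))  # non-negative for a PQD copula
```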

The following Table of Probabilities lists some important relations between various joint probability concepts, the copula, nonexceedance probabilities $u$ and $v$, and exceedance probabilities $u' = 1-u$ and $v' = 1-v$. A compact summary of these probability relations has obvious usefulness. The notation $[\cdots, \cdots]$ is the same as $[\cdots \mathrm{\ and\ } \cdots]$.

Probability and Symbol | Convention
--- | ---
$\mathrm{Pr}[\,U \le u, V \le v\,]$ | $\mathbf{C}(u,v)$
$\mathrm{Pr}[\,U > u, V > v\,]$ | $\hat{\mathbf{C}}(u',v')$
$\mathrm{Pr}[\,U \le u, V > v\,]$ | $u - \mathbf{C}(u,v)$
$\mathrm{Pr}[\,U > u, V \le v\,]$ | $v - \mathbf{C}(u,v)$
$\mathrm{Pr}[\,U \le u \mid V \le v\,]$ | $\mathbf{C}(u,v)/v$
$\mathrm{Pr}[\,V \le v \mid U \le u\,]$ | $\mathbf{C}(u,v)/u$
$\mathrm{Pr}[\,U \le u \mid V > v\,]$ | $(u - \mathbf{C}(u,v))/(1 - v)$
$\mathrm{Pr}[\,V \le v \mid U > u\,]$ | $(v - \mathbf{C}(u,v))/(1 - u)$
$\mathrm{Pr}[\,U > u \mid V > v\,]$ | $\hat{\mathbf{C}}(u',v')/v' = \overline{\mathbf{C}}(u,v)/(1-v)$
$\mathrm{Pr}[\,V > v \mid U > u\,]$ | $\hat{\mathbf{C}}(u',v')/u' = \overline{\mathbf{C}}(u,v)/(1-u)$
$\mathrm{Pr}[\,V \le v \mid U = u\,]$ | $\partial \mathbf{C}(u,v)/\partial u$
$\mathrm{Pr}[\,U \le u \mid V = v\,]$ | $\partial \mathbf{C}(u,v)/\partial v$
$\mathrm{Pr}[\,U > u \mathrm{\ or\ } V > v\,]$ | $\mathbf{C}^\star(u',v') = 1 - \mathbf{C}(u,v)$
$\mathrm{Pr}[\,U \le u \mathrm{\ or\ } V \le v\,]$ | $\tilde{\mathbf{C}}(u,v) = u + v - \mathbf{C}(u,v)$

The Note section of jointCOP provides considerable demonstration of the joint "and" and "or" relations, supported by simulation and counting scenarios. Also, the Note section of duCOP provides considerable demonstration of the application of the concepts of joint "and" conditions, joint "or" conditions, and, importantly, joint mutually exclusive "or" conditions.
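Several rows of the table are algebraic identities among the survival copula, joint survival function, co-copula, and dual of a copula. The base-R sketch below writes those four operators from their definitions in Nelsen (2006), namely $\hat{\mathbf{C}}(a,b) = a + b - 1 + \mathbf{C}(1-a,1-b)$, $\overline{\mathbf{C}}(u,v) = 1 - u - v + \mathbf{C}(u,v)$, $\mathbf{C}^\star(a,b) = 1 - \mathbf{C}(1-a,1-b)$, and $\tilde{\mathbf{C}}(u,v) = u + v - \mathbf{C}(u,v)$, and confirms the identities at one point. The helper names are hypothetical analogues of surCOP, surfuncCOP, coCOP, and duCOP, not those functions themselves.

```r
gh_cop <- function(u, v, theta = 3.055) {   # Gumbel-Hougaard (hypothetical helper)
  exp(-((-log(u))^theta + (-log(v))^theta)^(1 / theta))
}
sur_cop     <- function(a, b, cop) a + b - 1 + cop(1 - a, 1 - b)  # survival copula
surfunc_cop <- function(u, v, cop) 1 - u - v + cop(u, v)          # joint survival function
co_cop      <- function(a, b, cop) 1 - cop(1 - a, 1 - b)          # co-copula
du_cop      <- function(u, v, cop) u + v - cop(u, v)              # dual of a copula

u <- 0.3; v <- 0.6
checks <- c(
  and_exceed = sur_cop(1 - u, 1 - v, gh_cop) - surfunc_cop(u, v, gh_cop),  # Pr[U > u, V > v]
  or_exceed  = co_cop(1 - u, 1 - v, gh_cop) - (1 - gh_cop(u, v)),          # Pr[U > u or V > v]
  or_nonexc  = du_cop(u, v, gh_cop) - (1 - surfunc_cop(u, v, gh_cop))      # Pr[U <= u or V <= v]
)
print(checks)  # each difference should be zero up to rounding

# Conditional exceedance Pr[U > u | V > v], dividing by Pr[V > v] = 1 - v
print(surfunc_cop(u, v, gh_cop) / (1 - v))
```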

One or two copulas can be composited, combined, or multiplied in interesting ways to create unique joint probability relations and complex dependence structures. copBasic provides three functions for copula composition: composite1COP composites a single copula with two compositing parameters, composite2COP composites two copulas with two compositing parameters, and composite3COP composites two copulas with four compositing parameters. Two copulas can be combined through a weighted convex combination using convex2COP with a single weighting parameter. Lastly, copula multiplication of two copulas to form a third is supported by prod2COP.

No. of copulas | Combining parameters | Function | Concept
--- | --- | --- | ---
1 | $\alpha, \beta$ | composite1COP | copula combination
2 | $\alpha, \beta$ | composite2COP | copula combination
2 | $\alpha, \beta, \kappa, \gamma$ | composite3COP | copula combination
2 | $\alpha$ | convex2COP | copula combination
2 | $\bigl(\mathbf{C}_1 \ast \mathbf{C}_2 \bigr)$ | prod2COP | copula multiplication

All five functions for compositing, combining, or multiplying copulas are compatible with joint probability simulation (simCOP), measures of association (e.g. $\rho_\mathbf{C}$), and presumably all other copula operations using copBasic features.
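A convex combination of copulas is again a copula, and because Spearman's Rho $\rho_\mathbf{C} = 12\iint_{\mathcal{I}^2} \mathbf{C}(u,v)\,du\,dv - 3$ is linear in $\mathbf{C}$, the Rho of the mixture is the same weighted mix of the component Rhos. The base-R sketch below assumes the weighting $(1-w)\mathbf{C}_1 + w\mathbf{C}_2$ (convex2COP may attach its parameter differently); convex_cop and rho_grid are hypothetical helpers, and the double integral is approximated with a midpoint rule rather than copBasic's rhoCOP.

```r
m_cop <- function(u, v) pmin(u, v)           # upper bound M
w_cop <- function(u, v) pmax(u + v - 1, 0)   # lower bound W

# Hypothetical helper: weighted convex combination (1 - w) C1 + w C2
convex_cop <- function(cop1, cop2, w) {
  function(u, v) (1 - w) * cop1(u, v) + w * cop2(u, v)
}

# Spearman's Rho, 12 * integral of C over the unit square - 3, by midpoint rule
rho_grid <- function(cop, m = 200) {
  g <- (seq_len(m) - 0.5) / m
  12 * mean(outer(g, g, cop)) - 3
}

mix  <- convex_cop(m_cop, w_cop, w = 0.5)
rhos <- c(M = rho_grid(m_cop), W = rho_grid(w_cop), mix = rho_grid(mix))
print(rhos)

# The mixture should remain 2-increasing: every grid rectangle has mass >= 0
g0   <- seq(0, 1, length.out = 51)
G    <- outer(g0, g0, mix)
mass <- diff(t(diff(G)))
print(min(mass))
```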

A Review of Return Periods using Copulas

Risk analyses of natural hazards are commonly expressed as annual return periods $T$ in years, which are defined for a nonexceedance probability $q$ as $T = 1/(1-q)$. In bivariate analysis, there immediately emerge two types of return periods representing $T_{q;\,\mathrm{coop}}$ and $T_{q;\,\mathrm{dual}}$ conditions between nonexceedances of the two hazard sources (random variables) $U$ and $V$. It is usual in many applications for $T$ to be expressed equivalently as a probability $q$ in common for both variables.

Incidentally, the $\mathrm{Pr}[\,U > u \mid V > v\,]$ and $\mathrm{Pr}[\,V > v \mid U > u\,]$ probabilities also are useful for conditional return period computations following Salvadori et al. (2007, pp. 159--160) but are not further considered here. Also, $F_K(w)$ (Kendall's Function or Kendall's Measure of a copula) is the core tool for secondary return period computations (see kfuncCOP).
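For an Archimedean copula, Kendall's Function is $K_\mathbf{C}(t) = t - \varphi(t)/\varphi'(t)$; with the Gumbel-Hougaard generator $\varphi(t) = (-\log t)^\Theta$ this becomes $K_\mathbf{C}(t) = t - t\log(t)/\Theta$. The base-R sketch below checks that formula through the identity $\tau_\mathbf{C} = 3 - 4\int_0^1 K_\mathbf{C}(t)\,dt$ and then forms a secondary (Kendall) return period $1/(1 - K_\mathbf{C}(t))$ at the critical level $t = \mathbf{C}(q,q)$, assuming a mean interarrival time of one year. The helper k_gh is hypothetical, not kfuncCOP, and $\Theta = 3.055$ is the value used by Salvadori et al. (2007, p. 151).

```r
theta <- 3.055
# Kendall's Function of the Gumbel-Hougaard copula (hypothetical helper)
k_gh <- function(t) ifelse(t > 0, t - t * log(t) / theta, 0)

# Check via tau = 3 - 4 * integral of K over [0, 1]
int_k <- integrate(k_gh, lower = 0, upper = 1, rel.tol = 1e-8)$value
tau_k <- 3 - 4 * int_k
print(c(from_K = tau_k, closed_form = 1 - 1 / theta))

# Secondary return period at the critical level t = C(q, q), assuming a mean
# interarrival time of one year
q <- 0.999
t_crit <- q^(2^(1 / theta))   # C(q, q) for Gumbel-Hougaard
t_kendall <- 1 / (1 - k_gh(t_crit))
print(t_kendall)
```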

Let the copula $\mathbf{C}(u,v; \Theta)$ for nonexceedances $u$ and $v$ be set for some copula family (formula) by a parameter vector $\Theta$. The copula family and parameters define the joint coupling (loosely meaning dependency or correlation) between hazards $U$ and $V$. If failure occurs when either or both hazards $U$ and $V$ exceed a probability threshold $q$ ($u = v = 1 - 1/T = q$) for the $T$-year return period, then the real return period of failure, defined using either the copula $\mathbf{C}(q,q; \Theta)$ or the co-copula $\mathbf{C}^\star(q',q'; \Theta)$ for exceedance probability $q' = 1 - q$, is

$$T_{q;\,\mathrm{coop}} = \frac{1}{1 - \mathbf{C}(q, q; \Theta)} = \frac{1}{\mathbf{C}^\star(1-q, 1-q; \Theta)}\mbox{\ and}$$ $$T_{q;\,\mathrm{coop}} \equiv \frac{1}{\mathrm{cooperative\ risk}}\mbox{.}$$

However, if failure occurs if and only if both hazards $U$ and $V$ occur simultaneously (that is, they dually work together), then the real return period is defined using either the dual of a copula (function) $\tilde{\mathbf{C}}(q,q; \Theta)$, the joint survival function $\overline{\mathbf{C}}(q,q;\Theta)$, or the survival copula $\hat{\mathbf{C}}(q',q'; \Theta)$ as

$$T_{q;\,\mathrm{dual}} = \frac{1}{1 - \tilde{\mathbf{C}}(q,q; \Theta)} = \frac{1}{\overline{\mathbf{C}}(q,q;\Theta)} = \frac{1}{\hat{\mathbf{C}}(q',q';\Theta)} \mbox{\ and}$$ $$T_{q;\,\mathrm{dual}} \equiv \frac{1}{\mathrm{complement\ of\ dual\ protection}}\mbox{.}$$

A numerical demonstration is informative. Salvadori et al. (2007, p. 151) show for a Gumbel-Hougaard copula (GHcop) having $\Theta = 3.055$ and $T = 1000$ years ($q = 0.999$) that $T_{q;\,\mathrm{coop}} = 797.1$ years and $T_{q;\,\mathrm{dual}} = 1341.4$ years, which means that the average return periods between failures satisfy $$T_{q;\,\mathrm{coop}} \le T \le T_{q;\,\mathrm{dual}}\mbox{.}$$ These values are readily computed and verified using the T2prob() and prob2T() functions from the lmomco package along with the copBasic functions COP (generic functional interface to a copula) and duCOP (dual of a copula):

  q <- lmomco::T2prob(1000)
  lmomco::prob2T( COP(q,q, cop=GHcop, para=3.055)) # 797.110
  lmomco::prob2T(duCOP(q,q, cop=GHcop, para=3.055)) # 1341.438

An early source (in 2005) by some of those authors, cited on p. 151 of Salvadori et al. (2007; their citation [67]), shows $T_{q;\,\mathrm{coop}} = 798$ years; a rounding error seems to have been committed. Finally, just for reference, a Gumbel-Hougaard copula having $\Theta = 3.055$ corresponds to an analytical Kendall's Tau (see GHcop) of $\tau = 1 - 1/\Theta \approx 0.673$, which can be verified through the numerical integration available from tauCOP:

  tauCOP(cop=GHcop, para=3.055, brute=TRUE) # 0.6726542

Thus, "a better understanding of the statistical characteristics of [multiple hazard sources] requires the study of their joint distribution" (Salvadori et al., 2007, p. 150).
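The two return periods can also be cross-checked in base R alone, without copBasic or lmomco. In this sketch gh_cop and prob_to_T are hypothetical stand-ins for COP(..., cop=GHcop) and lmomco::prob2T, $q = 1 - 1/T$ mirrors lmomco::T2prob, and the dual is formed directly as $\tilde{\mathbf{C}}(q,q) = 2q - \mathbf{C}(q,q)$.

```r
# Hand-written Gumbel-Hougaard copula (stand-in for COP(..., cop = GHcop))
gh_cop <- function(u, v, theta) {
  exp(-((-log(u))^theta + (-log(v))^theta)^(1 / theta))
}
prob_to_T <- function(f) 1 / (1 - f)   # plays the role of lmomco::prob2T

theta   <- 3.055
T_years <- 1000
q       <- 1 - 1 / T_years             # plays the role of lmomco::T2prob

C_qq   <- gh_cop(q, q, theta)
T_coop <- prob_to_T(C_qq)              # 1 / (1 - C(q, q))
T_dual <- prob_to_T(2 * q - C_qq)      # 1 / (1 - dual), dual = q + q - C(q, q)
print(c(T_coop = T_coop, T = T_years, T_dual = T_dual))
# the text reports 797.110 and 1341.438
```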

Useful Copula Relations by Visualization

There is a myriad of relations between variables computable through copulas, and these are listed in the Table of Probabilities earlier in this documentation. A script titled CopulaRelations_BaseFigure_inR.txt is located in the inst/doc directory of the copBasic sources. This script demonstrates, using the PSP copula, relations between the copula (COP), survival copula (surCOP), joint survival function of a copula (surfuncCOP), co-copula (coCOP), and dual of a copula function (duCOP). The script performs a simulation and manually counts the observations meeting select criteria in order to compute their empirical probabilities. The script produces a base figure that, after extension in a vector-graphics editing application, is suitable for educational description and is shown below.

[Figure: Copula relations (CopulaRelationsFigure4pkg)]
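The simulate-and-count approach of that script can be sketched in base R. This is not the script itself: it assumes the PSP copula has the form $\mathbf{C}(u,v) = uv/(u + v - uv)$ (which coincides with the Clayton copula at parameter 1), so that the conditional distribution method has the closed-form inverse $v = 1/(1 + (p^{-1/2} - 1)/u)$ for a uniform draw $p$. The helpers psp_cop and sim_psp are hypothetical. Empirical counts are compared with rows of the Table of Probabilities.

```r
# PSP copula, assumed here to be C(u, v) = uv / (u + v - uv)
psp_cop <- function(u, v) u * v / (u + v - u * v)

# Conditional distribution method with a closed-form inverse of dC/du
sim_psp <- function(n) {
  u <- runif(n)
  p <- runif(n)
  cbind(U = u, V = 1 / (1 + (1 / sqrt(p) - 1) / u))
}

set.seed(2007)
uv <- sim_psp(100000)
U <- uv[, "U"]; V <- uv[, "V"]
u <- 0.3; v <- 0.6
C <- psp_cop(u, v)

empirical <- c(
  and_nonexc  = mean(U <= u & V <= v),  # C(u, v)
  u_le_v_gt   = mean(U <= u & V > v),   # u - C(u, v)
  or_exceed   = mean(U > u | V > v),    # 1 - C(u, v)
  cond_exceed = mean(U[V > v] > u)      # joint survival / (1 - v)
)
theory <- c(C, u - C, 1 - C, (1 - u - v + C) / (1 - v))
print(rbind(empirical, theory))
```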


References

Cherubini, U., Luciano, E., and Vecchiato, W., 2004, Copula methods in finance: Hoboken, NJ, Wiley, 293 p.

Hernández-Maldonado, V., Díaz-Viera, M., and Erdely, A., 2012, A joint stochastic simulation method using the Bernstein copula as a flexible tool for modeling nonlinear dependence structures between petrophysical properties: Journal of Petroleum Science and Engineering, v. 90--91, pp. 112--123.

Joe, H., 2014, Dependence modeling with copulas: Boca Raton, CRC Press, 462 p.

Nelsen, R.B., 2006, An introduction to copulas: New York, Springer, 269 p.

Salvadori, G., De Michele, C., Kottegoda, N.T., and Rosso, R., 2007, Extremes in nature---An approach using copulas: Dordrecht, Netherlands, Springer, Water Science and Technology Library 56, 292 p.

Examples


# Nelsen (2006, p. 75, exer. 3.15b) provides for a nice test of copBasic features.
"mcdurv" <- function(u,v, theta) {
   # Vectorized: ifelse() selects elementwise, so no return() calls inside it
   inside <- u > theta & u < 1-theta & v > theta & v < 1-theta
   ifelse(inside, M(u,v) - theta, # Upper bounds copula with a shift
                  W(u,v))         # Lower bounds copula
}
"MCDURV" <- function(u,v, para=NULL) {
   if(is.null(para))          stop("need theta")
   if(para < 0 || para > 0.5) stop("theta must be in [0,1/2]")
   return(asCOP(u, v, f=mcdurv, para))
}
"afunc" <- function(t) { # a sample size = 1,000 hard wired
   return(cov(simCOP(n=1000, cop=MCDURV, para=t, ploton=FALSE, points=FALSE))[1,2])
}
set.seed(6234)
print(uniroot(afunc, c(0,0.5))) # result by simulation = 0.1023742
# Nelsen reports that if theta approx. 0.103 then covariance of U and V is zero.
# So one will have mutually completely dependent uncorrelated uniform variables!
rhoCOP(cop=MCDURV,  para=0.1023742) # Spearman Rho = 0.005854481 (near zero)
tauCOP(cop=MCDURV,  para=0.1023742) # Kendall Tau  = 0.2648521
wolfCOP(cop=MCDURV, para=0.1023742) # S & W Sigma  = 0.4690174
D <- simCOP(n=1000, cop=MCDURV, para=0.1023742) # Plot mimics Nelsen (2006, fig. 3.11)
# Lastly, open research problem. L-comoments (matrices) measure high dimension of
# variable comovements (see lmomco package)---"method of L-comoments" for estimation?
lmomco::lcomoms2(simCOP(n=1000, cop=MCDURV, para=0),   nmom=5) # Perfect pos. corr. (M)
lmomco::lcomoms2(simCOP(n=1000, cop=MCDURV, para=0.1023742), nmom=5)
lmomco::lcomoms2(simCOP(n=1000, cop=MCDURV, para=0.5), nmom=5) # Perfect neg. corr. (W)
# T2 (L-correlation), T3 (L-coskew), T4 (L-cokurtosis), and T5 matrices result. For
# Theta = 0 or 0.5, see the matrix symmetry with a sign change for L-coskew and T5 on
# the off-diagonals (offdiags). See unities for T2. See the near-zero offdiag terms in
# T2 for Theta = 0.1024. But then see that the T4 offdiags for Theta = 0.1024 are quite
# different from those for Theta = 0 or 0.5. Thus, T4 has captured a unique property of U vs V.
