empCopula: The Empirical Copula

Description

Computes the empirical copula (according to a provided method) and auxiliary tools.

Usage

empCopula(X, smoothing = c("none", "beta", "checkerboard",
                           "schaake.shuffle"), offset = 0,
          ties.method = c("max", "average", "first", "last", "random", "min"))
C.n(u, X, smoothing = c("none", "beta", "checkerboard"), offset = 0,
    ties.method = c("max", "average", "first", "last", "random", "min"))
dCn(u, U, j.ind = 1:d, b = 1/sqrt(nrow(U)), ...)
F.n(x, X, offset = 0, smoothing = c("none", "beta", "checkerboard"))
Cn(x, w) ## <-- deprecated!  use  C.n(w, x) instead!
toEmpMargins(U, x, ...)

Value

empCopula() is the constructor for objects of class

empCopula.

C.n() returns the empirical copula of the pseudo-observations

X evaluated at u (or a smoothed version of it).

dCn() returns a vector (length(j.ind) is 1) or a matrix (with number of columns equal to length(j.ind)), containing the approximated first-order partial derivatives of the unknown copula at u with respect to the arguments in j.ind.

F.n() returns the empirical distribution function of X

evaluated at x if smoothing = "none", the empirical beta copula evaluated at x if smoothing = "beta" and the empirical checkerboard copula evaluated at x if smoothing = "checkerboard".

toEmpMargins() transforms the copula sample U to the empirical margins based on the sample x.

Arguments

X

an $(n, d)$-matrix of pseudo-observations with $d$ columns (as x or u). Recall that a multivariate random sample can be transformed to pseudo-observations via pobs(). For F.n() and if smoothing != "none", X can also be a general, multivariate sample, in which case the empirical distribution function is computed.

u,w

an $(m, d)$-matrix with elements in $[0,1]$ whose rows contain the evaluation points of the empirical copula.

U

an $(n,d)$-matrix of pseudo- (or copula-)observations (elements in $[0,1]$, same number $d$ of columns as u (for dCn())) or x (for toEmpMargins()).

x

an $(m, d)$-matrix whose rows contain the evaluation points of the empirical distribution< function (if smoothing = "none") or copula (if smoothing != "none").

smoothing

character string specifying the type of smoothing of the empirical distribution function (for F.n()) or the empirical copula (for C.n()). Available are:

"none": the original empirical distribution function or empirical copula.

"beta"

the empirical beta smoothed distribution function or empirical beta copula.

"checkerboard"

empirical checkerboard construction.

"schaake.shuffle"

in each dimension, n (so nrow(X)-many) sorted standard uniforms are used to construct a smooth sample, from which one draws with replacement as many observations as required; only available for the empirical copula and only for sampling.

ties.method

character string specifying how ranks should be computed if there are ties in any of the coordinate samples of x; passed to pobs.

j.ind

integer vector of indices $j$ between 1 and $d$ indicating the dimensions with respect to which first-order partial derivatives are approximated.

numeric giving the bandwidth for approximating first-order partial derivatives.

offset

used in scaling the result which is of the form sum(....)/(n+offset); defaults to zero.

...

additional arguments passed to dCn() or sort() underlying toEmpMargins().

Details

Given pseudo-observations from a distribution with continuous margins and copula C, the empirical copula is the (default) empirical distribution function of these pseudo-observations. It is thus a natural nonparametric estimator of C. The function C.n() computes the empirical copula or two alternative smoothed versions of it: the empirical beta copula or the empirical checkerboard copula; see Eqs. (2.1) and (4.1) in Segers, Sibuya and Tsukahara (2017), and the references therein. empCopula() is the constructor of an object of class empCopula.

The function dCn() approximates first-order partial derivatives of the unknown copula using the empirical copula.

The function F.n() computes the empirical distribution function of a multivariate sample. Note that C.n(u, X, smoothing="none", *) simply calls F.n(u, pobs(X), *) after checking u.

There are several asymptotically equivalent definitions of the empirical copula. C.n(, smoothing = "none") is simply defined as the empirical distribution function computed from the pseudo-observations, that is, $$C_n(\bm{u})=\frac{1}{n}\sum_{i=1}^n\mathbf{1}_{\{\hat{\bm{U}}_i\le\bm{u}\}},$$ where $\hat{\bm{U}}_i$, $i\in\{1,\dots,n\}$, denote the pseudo-observations and $n$ the sample size. Internally, C.n(,smoothing = "none") is just a wrapper for F.n() and is expected to be fed with the pseudo-observations.

The approximation for the $j$th partial derivative of the unknown copula $C$ is implemented as, for example, in Rémillard and Scaillet (2009), and given by $$\hat{\dot{C}}_{jn}(\bm{u})=\frac{C_n(u_1,..,u_{j-1},min(u_j+b,1),u_{j+1},..,u_d)-C_n(u_1,..,u_{j-1},max(u_j-b,0),u_{j+1},..,u_d)}{2b},$$ where $b$ denotes the bandwidth and $C_n$ the empirical copula.

References

Rüschendorf, L. (1976). Asymptotic distributions of multivariate rank order statistics, Annals of Statistics 4, 912--923.

Deheuvels, P. (1979). La fonction de dépendance empirique et ses propriétés: un test non paramétrique d'indépendance, Acad. Roy. Belg. Bull. Cl. Sci., 5th Ser. 65, 274--292.

Deheuvels, P. (1981). A non parametric test for independence, Publ. Inst. Statist. Univ. Paris 26, 29--50.

Clark, M., Gangopadhyay, S., Hay, L., Rajagopalan, B. and Wilby, R. (2004). The Schaake Shuffle: A Method for Reconstructing Space-Time Variability in Forecasted Precipitation and Temperature Fields. Journal of Hydrometeorology, pages 243-262.

Rémillard, B. and Scaillet, O. (2009). Testing for equality between two copulas. Journal of Multivariate Analysis, 100(3), pages 377-386.

Segers, J., Sibuya, M. and Tsukahara, H. (2017). The Empirical Beta Copula. Journal of Multivariate Analysis, 155, pages 35--51, https://arxiv.org/abs/1607.04430.

Kiriliouk, A., Segers, J. and Tsukahara, H. (2020). Resampling Procedures with Empirical Beta Copulas. https://arxiv.org/abs/1905.12466.

Examples

Run this code

## Generate data X (from a meta-Gumbel model with N(0,1) margins)
n <- 100
d <- 3
family <- "Gumbel"
theta <- 2
cop <- onacopulaL(family, list(theta=theta, 1:d))
set.seed(1)
X <- qnorm(rCopula(n, cop)) # meta-Gumbel data with N(0,1) margins

## Evaluate empirical copula
u <- matrix(runif(n*d), n, d) # random points were to evaluate the empirical copula
ec <- C.n(u, X = X)

## Compare the empirical copula with the true copula
pc <- pCopula(u, copula = cop)
mean(abs(pc - ec)) # ~= 0.012 -- increase n to decrease this error

## The same for the two smoothed versions
beta <- C.n(u, X, smoothing = "beta")
mean(abs(pc - beta))
check <- C.n(u, X, smoothing = "checkerboard")
mean(abs(pc - check))

## Compare the empirical copula with F.n(pobs())
U <- pobs(X) # pseudo-observations
stopifnot(identical(ec, F.n(u, X = pobs(U)))) # even identical

## Compare the empirical copula based on U at U with the Kendall distribution
## Note: Theoretically, C(U) ~ K, so K(C_n(U, U = U)) should approximately be U(0,1)
plot(ecdf(pK(C.n(U, X), cop = cop@copula, d = d)), asp = 1, xaxs="i", yaxs="i")
segments(0,0, 1,1, col=adjustcolor("blue",1/3), lwd=5, lty = 2)
abline(v=0:1, col="gray70", lty = 2)

## Compare the empirical copula and the true copula on the diagonal
C.n.diag <- function(u) C.n(do.call(cbind, rep(list(u), d)), X = X) # diagonal of C_n
C.diag <- function(u) pCopula(do.call(cbind, rep(list(u), d)), cop) # diagonal of C
curve(C.n.diag, from = 0, to = 1, # empirical copula diagonal
      main = paste("True vs empirical diagonal of a", family, "copula"),
      xlab = "u", ylab = quote("True C(u,..,u) and empirical"~C[n](u,..,u)))
curve(C.diag, lty = 2, add = TRUE) # add true copula diagonal
legend("bottomright", lty = 2:1, bty = "n", inset = 0.02,
       legend = expression(C, C[n]))

## Approximate partial derivatives w.r.t. the 2nd and 3rd component
j.ind <- 2:3 # indices w.r.t. which the partial derivatives are computed
## Partial derivatives based on the empirical copula and the true copula
der23 <- dCn(u, U = pobs(U), j.ind = j.ind)
der23. <- copula:::dCdu(archmCopula(family, param=theta, dim=d), u=u)[,j.ind]
## Approximation error
summary(as.vector(abs(der23-der23.)))

  U <- U[1:64 ,]# such that m != n
  stopifnot(suppressWarnings( ## deprecation warning ..
    identical(C.n(u, U),
              Cn (U, u))))

## For an example of using F.n(), see help(mvdc)% ./Mvdc.Rd

## Generate a bivariate empirical copula object (various smoothing methods)
n <- 10 # sample size
d <- 2 # dimension
set.seed(271)
X <- rCopula(n, copula = claytonCopula(3, dim = d))
ecop.orig  <- empCopula(X) # smoothing = "none"
ecop.beta  <- empCopula(X, smoothing = "beta")
ecop.check <- empCopula(X, smoothing = "checkerboard")

## Sample from these (smoothed) empirical copulas
m <- 50
U.orig  <-  rCopula(m, copula = ecop.orig)
U.beta  <-  rCopula(m, copula = ecop.beta)
U.check <-  rCopula(m, copula = ecop.check)

## Plot
wireframe2(ecop.orig,  FUN = pCopula, draw.4.pCoplines = FALSE)
wireframe2(ecop.beta,  FUN = pCopula)
wireframe2(ecop.check, FUN = pCopula)
## Density (only exists when smoothing = "beta")
wireframe2(ecop.beta,  FUN = dCopula)

## Transform a copula sample to empirical margins
set.seed(271)
X <- qexp(rCopula(1000, copula = claytonCopula(2))) # multivariate distribution
U <- rCopula(917, copula = gumbelCopula(2)) # new copula sample
X. <- toEmpMargins(U, x = X) # tranform U to the empirical margins of X
plot(X.) # Gumbel sample with empirical margins of X

Run the code above in your browser using DataLab