empCopula: The Empirical Copula

Description

Computes the empirical copula, either unsmoothed or in one of two smoothed versions (chosen via smoothing), and provides auxiliary tools.

Usage

empCopula(X, smoothing = c("none", "beta", "checkerboard"), offset = 0,
          ties.method = c("max", "average", "first", "last", "random", "min"))
C.n(u, X, smoothing = c("none", "beta", "checkerboard"), offset = 0,
    ties.method = c("max", "average", "first", "last", "random", "min"))
dCn(u, U, j.ind = 1:d, b = 1/sqrt(nrow(U)), ...)
F.n(x, X, offset = 0, smoothing = c("none", "beta", "checkerboard"))
Cn(x, w) ## <-- deprecated!  use  C.n(w, x) instead!
toEmpMargins(U, x, ...)

Arguments

X

an $(n, d)$-matrix of pseudo-observations (with the same number $d$ of columns as x or u). Recall that a multivariate random sample can be transformed to pseudo-observations via pobs(). For F.n() and if smoothing != "none", X can also be a general, multivariate sample, in which case the empirical distribution function is computed.

u,w

an $(m, d)$-matrix with elements in $[0,1]$ whose rows contain the evaluation points of the empirical copula.

U

an $(n, d)$-matrix of pseudo- (or copula-)observations with elements in $[0,1]$ and the same number $d$ of columns as u (for dCn()) or x (for toEmpMargins()).

x

an $(m, d)$-matrix whose rows contain the evaluation points of the empirical distribution function (if smoothing = "none") or copula (if smoothing != "none").

smoothing

character string specifying whether the empirical distribution function (for F.n()) or copula (for C.n()) is computed (if smoothing = "none"), or whether the empirical beta copula (smoothing = "beta") or the empirical checkerboard copula (smoothing = "checkerboard") is computed.

ties.method

character string specifying how ranks should be computed if there are ties in any of the coordinate samples of X; passed to pobs().

j.ind

integer vector of indices $j$ between 1 and $d$ indicating the dimensions with respect to which first-order partial derivatives are approximated.

b

numeric giving the bandwidth for approximating first-order partial derivatives.

offset

used in scaling the result which is of the form sum(....)/(n+offset); defaults to zero.

...

additional arguments passed to dCn() or sort() underlying toEmpMargins().

Value

empCopula() is the constructor for objects of class "empCopula".

C.n() returns the empirical copula of the pseudo-observations X evaluated at u (or a smoothed version of it).

dCn() returns a vector (if length(j.ind) is 1) or a matrix (with number of columns equal to length(j.ind)) containing the approximated first-order partial derivatives of the unknown copula at u with respect to the arguments in j.ind.

F.n() returns the empirical distribution function of X evaluated at x if smoothing = "none", the empirical beta copula evaluated at x if smoothing = "beta" and the empirical checkerboard copula evaluated at x if smoothing = "checkerboard".

toEmpMargins() transforms the copula sample U to the empirical margins based on the sample x.
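As a rough illustration of what transforming to empirical margins means, the following base-R sketch applies the empirical quantile function of each column of x to the matching column of U. The helper name toEmpMarginsByHand is hypothetical, and the use of quantile(type = 1) is an assumption made for illustration; the package's toEmpMargins() may differ in its details.

```r
## Hypothetical helper (not part of the package): map each column of a copula
## sample U through the empirical quantile function of the matching column of x
toEmpMarginsByHand <- function(U, x) {
    stopifnot(is.matrix(U), is.matrix(x), ncol(U) == ncol(x))
    sapply(seq_len(ncol(U)), function(j)
        ## type = 1 is the inverse of the empirical distribution function
        quantile(x[, j], probs = U[, j], type = 1, names = FALSE))
}

set.seed(7)
x <- cbind(rexp(200), rnorm(200))  # sample that defines the margins
U <- matrix(runif(2 * 50), 50, 2)  # stand-in for a copula sample
X. <- toEmpMarginsByHand(U, x)     # (50, 2)-matrix built from observed values of x
```

Every entry of the result is an observed value from the corresponding column of x, so the transformed sample inherits the empirical margins of x while keeping the dependence structure of U.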

Details

Given pseudo-observations from a distribution with continuous margins and copula C, the empirical copula is the (default) empirical distribution function of these pseudo-observations. It is thus a natural nonparametric estimator of C. The function C.n() computes the empirical copula or two alternative smoothed versions of it: the empirical beta copula or the empirical checkerboard copula; see Eqs. (2.1) and (4.1) in Segers, Sibuya and Tsukahara (2017), and the references therein. empCopula() is the constructor of an object of class "empCopula".
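The two smoothed versions can be sketched in base R directly from the component-wise ranks. The helper names empBetaByHand and empCheckByHand are hypothetical, and the code assumes a sample without ties; it follows the formulas of Segers, Sibuya and Tsukahara (2017) as understood here and is not the package's implementation.

```r
## Hypothetical helpers (illustration only); R is the (n, d)-matrix of ranks
empBetaByHand <- function(u, R) {
    n <- nrow(R)
    apply(u, 1, function(u.k) {
        ## (n, d)-matrix of Beta(r, n + 1 - r) distribution functions at u.k[j]
        P <- sapply(seq_len(ncol(R)), function(j)
            pbeta(u.k[j], shape1 = R[, j], shape2 = n + 1 - R[, j]))
        mean(apply(P, 1, prod))  # average over observations of row products
    })
}
empCheckByHand <- function(u, R) {
    n <- nrow(R)
    apply(u, 1, function(u.k) {
        ## (n, d)-matrix of min(max(n * u_j - R_ij + 1, 0), 1)
        P <- sapply(seq_len(ncol(R)), function(j)
            pmin(pmax(n * u.k[j] - R[, j] + 1, 0), 1))
        mean(apply(P, 1, prod))
    })
}

set.seed(1)
n <- 20
X <- matrix(rnorm(2 * n), n, 2)   # any sample with continuous margins
R <- apply(X, 2, rank)            # component-wise ranks
u <- cbind(c(0.25, 0.5, 0.9), 1)  # second coordinate fixed at 1
beta.u  <- empBetaByHand(u, R)
check.u <- empCheckByHand(u, R)
```

With the second coordinate fixed at 1, both results should reproduce the first column of u up to rounding, since both smoothed versions have uniform margins when there are no ties.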

The function dCn() approximates first-order partial derivatives of the unknown copula using the empirical copula.

The function F.n() computes the empirical distribution function of a multivariate sample. Note that C.n(u, X, smoothing="none", *) simply calls F.n(u, pobs(X), *) after checking u.

There are several asymptotically equivalent definitions of the empirical copula. C.n(, smoothing = "none") is simply defined as the empirical distribution function computed from the pseudo-observations, that is, $$C_n(\bm{u})=\frac{1}{n}\sum_{i=1}^n\mathbf{1}_{\{\hat{\bm{U}}_i\le\bm{u}\}},$$ where $\hat{\bm{U}}_i$, $i\in\{1,\dots,n\}$, denote the pseudo-observations and $n$ the sample size. Internally, C.n(,smoothing = "none") is just a wrapper for F.n() and is expected to be fed with the pseudo-observations.
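The definition of $C_n$ can be written out in a few lines of base R. The helper name empCopulaByHand is hypothetical, and the pseudo-observations are built here as ranks divided by $n + 1$, which is assumed to match the default scaling of pobs().

```r
## Hypothetical helper (not part of the package): empirical copula by hand
empCopulaByHand <- function(u, U.hat) {
    stopifnot(is.matrix(u), is.matrix(U.hat), ncol(u) == ncol(U.hat))
    ## For each evaluation point u[k,], the proportion of rows of U.hat
    ## that are component-wise <= u[k,]
    apply(u, 1, function(u.k)
        mean(apply(U.hat, 1, function(U.i) all(U.i <= u.k))))
}

set.seed(42)
n <- 50
d <- 2
X <- matrix(rnorm(n * d), n, d)            # any sample with continuous margins
U.hat <- apply(X, 2, rank) / (n + 1)       # pseudo-observations
u <- rbind(c(0.5, 0.5), c(1, 1), c(0, 0))  # evaluation points
Cn.u <- empCopulaByHand(u, U.hat)
```

At the corner (1, 1) every pseudo-observation is counted and at (0, 0) none is, which gives a quick sanity check of the indicator-average form.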

The approximation for the $j$th partial derivative of the unknown copula $C$ is implemented as, for example, in Rémillard and Scaillet (2009), and given by $$\hat{\dot{C}}_{jn}(\bm{u})=\frac{C_n(u_1,\dots,u_{j-1},\min(u_j+b,1),u_{j+1},\dots,u_d)-C_n(u_1,\dots,u_{j-1},\max(u_j-b,0),u_{j+1},\dots,u_d)}{2b},$$ where $b$ denotes the bandwidth and $C_n$ the empirical copula.
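The finite-difference formula above translates almost literally into base R. The helper names ecdfAt and dCnByHand are hypothetical and the code is only a sketch of the formula, assuming pseudo-observations U.hat are already available; it is not the package's dCn().

```r
## Hypothetical helper: empirical copula of U.hat evaluated at the rows of u
ecdfAt <- function(u, U.hat)
    apply(u, 1, function(u.k)
        mean(colSums(t(U.hat) <= u.k) == ncol(U.hat)))

## Hypothetical helper: finite-difference approximation of the j-th partial
## derivative with bandwidth b, truncating the shifted arguments to [0, 1]
dCnByHand <- function(u, U.hat, j, b = 1/sqrt(nrow(U.hat))) {
    u.up <- u
    u.lo <- u
    u.up[, j] <- pmin(u[, j] + b, 1)
    u.lo[, j] <- pmax(u[, j] - b, 0)
    (ecdfAt(u.up, U.hat) - ecdfAt(u.lo, U.hat)) / (2 * b)
}

set.seed(123)
n <- 2000
X <- matrix(runif(2 * n), n, 2)       # independent components
U.hat <- apply(X, 2, rank) / (n + 1)  # pseudo-observations
u <- rbind(c(0.5, 0.5))
der1 <- dCnByHand(u, U.hat, j = 1)    # true value under independence is 0.5
```

Because $C_n$ is nondecreasing in each argument, the approximation is never negative; its accuracy depends on both the sample size and the bandwidth $b$.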

References

Rüschendorf, L. (1976). Asymptotic distributions of multivariate rank order statistics, Annals of Statistics 4, 912--923.

Deheuvels, P. (1979). La fonction de dépendance empirique et ses propriétés: un test non paramétrique d'indépendance, Acad. Roy. Belg. Bull. Cl. Sci., 5th Ser. 65, 274--292.

Deheuvels, P. (1981). A non parametric test for independence, Publ. Inst. Statist. Univ. Paris 26, 29--50.

Rémillard, B. and Scaillet, O. (2009). Testing for equality between two copulas. Journal of Multivariate Analysis, 100(3), pages 377--386.

Segers, J., Sibuya, M. and Tsukahara, H. (2017). The Empirical Beta Copula. Journal of Multivariate Analysis, 155, pages 35--51, http://arxiv.org/abs/1607.04430.

Examples

# NOT RUN {
## Generate data X (from a meta-Gumbel model with N(0,1) margins)
n <- 100
d <- 3
family <- "Gumbel"
theta <- 2
cop <- onacopulaL(family, list(theta=theta, 1:d))
set.seed(1)
X <- qnorm(rCopula(n, cop)) # meta-Gumbel data with N(0,1) margins

## Evaluate empirical copula
u <- matrix(runif(n*d), n, d) # random points where to evaluate the empirical copula
ec <- C.n(u, X = X)

## Compare the empirical copula with the true copula
pc <- pCopula(u, copula = cop)
mean(abs(pc - ec)) # ~= 0.012 -- increase n to decrease this error

## The same for the two smoothed versions
beta <- C.n(u, X, smoothing = "beta")
mean(abs(pc - beta))
check <- C.n(u, X, smoothing = "checkerboard")
mean(abs(pc - check))

## Compare the empirical copula with F.n(pobs())
U <- pobs(X) # pseudo-observations
stopifnot(identical(ec, F.n(u, X = pobs(U)))) # even identical

## Compare the empirical copula based on U at U with the Kendall distribution
## Note: Theoretically, C(U) ~ K, so K(C_n(U)) should approximately be U(0,1)
plot(ecdf(pK(C.n(U, X), cop = cop@copula, d = d)), asp = 1, xaxs="i", yaxs="i")
segments(0,0, 1,1, col=adjustcolor("blue",1/3), lwd=5, lty = 2)
abline(v=0:1, col="gray70", lty = 2)

## Compare the empirical copula and the true copula on the diagonal
C.n.diag <- function(u) C.n(do.call(cbind, rep(list(u), d)), X = X) # diagonal of C_n
C.diag <- function(u) pCopula(do.call(cbind, rep(list(u), d)), cop) # diagonal of C
curve(C.n.diag, from = 0, to = 1, # empirical copula diagonal
      main = paste("True vs empirical diagonal of a", family, "copula"),
      xlab = "u", ylab = quote("True C(u,..,u) and empirical"~C[n](u,..,u)))
curve(C.diag, lty = 2, add = TRUE) # add true copula diagonal
legend("bottomright", lty = 2:1, bty = "n", inset = 0.02,
       legend = expression(C, C[n]))

## Approximate partial derivatives w.r.t. the 2nd and 3rd component
j.ind <- 2:3 # indices w.r.t. which the partial derivatives are computed
## Partial derivatives based on the empirical copula and the true copula
der23 <- dCn(u, U = pobs(U), j.ind = j.ind)
der23. <- copula:::dCdu(archmCopula(family, param=theta, dim=d), u=u)[,j.ind]
## Approximation error
summary(as.vector(abs(der23-der23.)))
# }
# NOT RUN {
## For an example of using F.n(), see help(mvdc)

## Generate a bivariate empirical copula object (various smoothing methods)
n <- 10 # sample size
d <- 2 # dimension
set.seed(271)
X <- rCopula(n, copula = claytonCopula(3, dim = d))
ecop.orig  <- empCopula(X) # smoothing = "none"
ecop.beta  <- empCopula(X, smoothing = "beta")
ecop.check <- empCopula(X, smoothing = "checkerboard")

## Plot
wireframe2(ecop.orig,  FUN = pCopula, draw.4.pCoplines = FALSE)
wireframe2(ecop.beta,  FUN = pCopula)
wireframe2(ecop.check, FUN = pCopula)
## Density (only exists when smoothing = "beta")
wireframe2(ecop.beta,  FUN = dCopula)

## Transform a copula sample to empirical margins
set.seed(271)
X <- qexp(rCopula(1000, copula = claytonCopula(2))) # multivariate distribution
U <- rCopula(917, copula = gumbelCopula(2)) # new copula sample
X. <- toEmpMargins(U, x = X) # transform U to the empirical margins of X
plot(X.) # Gumbel sample with empirical margins of X
# }