C.n: The Empirical Copula

Description

Given pseudo-observations from a distribution with continuous margins and copula C, the empirical copula is the empirical distribution function of these pseudo-observations. It is thus a natural nonparametric estimator of C. The function C.n() computes the empirical copula or two alternative smoothed versions of the latter: the empirical beta copula or the empirical checkerboard copula; see Eqs. (2.1) and (4.1) in Segers, Sibuya and Tsukahara (2017), and the references therein.

The function dCn() approximates first-order partial derivatives of the unknown copula using the empirical copula.

The function F.n() computes the empirical distribution function of a multivariate sample. Note that C.n(u, X, smoothing="none", *) simply calls F.n(u, pobs(X), *) after checking u.

Usage

C.n(u, X, smoothing = c("none", "beta", "checkerboard"),
    offset = 0, method = c("C", "R"),
    ties.method = c("max", "average", "first", "last", "random", "min"))
dCn(u, U, j.ind=1:d, b=1/sqrt(nrow(U)), ...)
F.n(x, X, offset=0, method=c("C", "R"))
Cn(x, w) ## <-- deprecated!  use  C.n(w, x) instead!

Arguments

u,w

an $(m, d)$-matrix with elements in $[0,1]$ whose rows contain the evaluation points of the empirical copula.

x

an $(m, d)$-matrix whose rows contain the evaluation points of the empirical distribution function (for F.n()).

U

for dCn() only: an $(n,d)$-matrix with elements in $[0,1]$ and with the same number $d$ of columns as u. The rows of U are the pseudo-observations based on which the empirical copula is computed.

X

(and x and U for Cn():) an $(n, d)$-matrix with the same number $d$ of columns as x. Recall that a multivariate random sample X can be transformed to an appropriate U via pobs().

smoothing

character string specifying whether the empirical copula (smoothing="none"), the empirical beta copula (smoothing="beta") or the empirical checkerboard copula (smoothing="checkerboard") is computed.

ties.method

character string specifying how ranks should be computed if there are ties in any of the coordinate samples of X; passed to pobs().

j.ind

integer vector of indices $j$ between 1 and $d$ indicating the dimensions with respect to which first-order partial derivatives are approximated.

b

numeric giving the bandwidth for approximating first-order partial derivatives.

offset

used in scaling the result which is of the form sum(....)/(n+offset); defaults to zero.

method

character string indicating which method is applied to compute the empirical cumulative distribution function or the empirical copula. method="C" uses an implementation in C, method="R" uses a pure R implementation.

…

additional arguments passed to dCn().

Value

C.n() returns the empirical copula at u or a smoothed version of the latter. F.n() returns the empirical distribution function of X evaluated at x.

dCn() returns a vector (if length(j.ind) is 1) or a matrix (with number of columns equal to length(j.ind)), containing the approximated first-order partial derivatives of the unknown copula at u with respect to the arguments in j.ind.

Details

There are several asymptotically equivalent definitions of the empirical copula. As mentioned above, the empirical copula C.n(, smoothing = "none") is simply defined as the empirical distribution function computed from the pseudo-observations, that is, $$C_n(\bm{u})=\frac{1}{n}\sum_{i=1}^n\mathbf{1}_{\{\hat{\bm{U}}_i\le\bm{u}\}},$$ where $\hat{\bm{U}}_i$, $i\in\{1,\dots,n\}$, denote the pseudo-observations (rows in U) and $n$ the sample size. Internally, C.n(, smoothing = "none") is just a wrapper for F.n() applied to the pseudo-observations pobs(X).
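The definition above can be sketched in a few lines of base R. This is only an illustration of the formula, not the package's implementation: the helper name emp_cop() is hypothetical, and the pseudo-observations are computed by hand as componentwise ranks divided by n + 1.

```r
## Hypothetical helper (not part of the copula package): evaluate
## C_n(u) = (1/n) * sum_i 1{U_i <= u} at each row of the matrix u
emp_cop <- function(u, U) {
    stopifnot(is.matrix(u), is.matrix(U), ncol(u) == ncol(U))
    ## t(U) is a (d, n)-matrix; u. (length d) is recycled down each column,
    ## so colSums() counts, per observation, the components that are <= u.
    apply(u, 1, function(u.) mean(colSums(t(U) <= u.) == ncol(U)))
}

## Toy data and hand-made pseudo-observations (ranks scaled by n + 1)
set.seed(271)
X <- matrix(rnorm(20), ncol = 2)
U <- apply(X, 2, rank) / (nrow(X) + 1)

## Two evaluation points; at (1, 1) every pseudo-observation is counted
u <- rbind(c(0.5, 0.5), c(1, 1))
res <- emp_cop(u, U)
```

With the copula package loaded, the result should be comparable to C.n(u, X) for data without ties, but that comparison is left to the reader.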

The approximation for the $j$th partial derivative of the unknown copula $C$ is implemented as, for example, in Rémillard and Scaillet (2009), and given by $$\hat{\dot{C}}_{jn}(\bm{u})=\frac{C_n(u_1,\dots,u_{j-1},\min(u_j+b,1),u_{j+1},\dots,u_d)-C_n(u_1,\dots,u_{j-1},\max(u_j-b,0),u_{j+1},\dots,u_d)}{2b},$$ where $b$ denotes the bandwidth and $C_n$ the empirical copula.
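As a rough illustration, the finite-difference formula above can be written directly in base R. The helper names emp_cop() and d_emp_cop() are hypothetical, and the sketch follows the displayed formula literally; dCn() itself may differ in details.

```r
## Hypothetical helpers (not part of the copula package)
## Empirical distribution function of the rows of U, evaluated at the rows of u
emp_cop <- function(u, U)
    apply(u, 1, function(u.) mean(colSums(t(U) <= u.) == ncol(U)))

## Central difference in the j-th coordinate with bandwidth b,
## clipping the shifted coordinate to [0, 1] as in the formula
d_emp_cop <- function(u, U, j, b = 1/sqrt(nrow(U))) {
    u.up <- u; u.up[, j] <- pmin(u[, j] + b, 1)
    u.lo <- u; u.lo[, j] <- pmax(u[, j] - b, 0)
    (emp_cop(u.up, U) - emp_cop(u.lo, U)) / (2 * b)
}

## Toy example: pseudo-observations of independent data (ranks scaled by n + 1)
set.seed(271)
n <- 100
U <- apply(matrix(runif(2 * n), ncol = 2), 2, rank) / (n + 1)
u <- rbind(c(0.25, 0.5), c(0.5, 0.5), c(0.75, 0.5))
der1 <- d_emp_cop(u, U, j = 1) # derivative w.r.t. the first argument
```

Because the empirical copula is nondecreasing in each argument, the approximated derivatives are nonnegative.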

References

Rüschendorf, L. (1976). Asymptotic distributions of multivariate rank order statistics, Annals of Statistics 4, 912--923.

Deheuvels, P. (1979). La fonction de dépendance empirique et ses propriétés: un test non paramétrique d'indépendance, Acad. Roy. Belg. Bull. Cl. Sci., 5th Ser. 65, 274--292.

Deheuvels, P. (1981). A non parametric test for independence, Publ. Inst. Statist. Univ. Paris 26, 29--50.

Rémillard, B. and Scaillet, O. (2009). Testing for equality between two copulas. Journal of Multivariate Analysis, 100(3), pages 377--386.

Segers, J., Sibuya, M. and Tsukahara, H. (2017). The Empirical Beta Copula. Journal of Multivariate Analysis, 155, pages 35--51, http://arxiv.org/abs/1607.04430.

Examples


# NOT RUN {
## Generate data X (from a meta-Gumbel model with N(0,1) margins)
n <- 100
d <- 3
family <- "Gumbel"
theta <- 2
cop <- onacopulaL(family, list(theta=theta, 1:d))
set.seed(1)
X <- qnorm(rCopula(n, cop)) # meta-Gumbel data with N(0,1) margins

## Random points where to evaluate the empirical copula
u <- matrix(runif(n*d), n, d)
ec <- C.n(u, X)

## Compare the empirical copula with the true copula
pc <- pCopula(u, copula=cop)
mean(abs(pc - ec)) # ~= 0.012 -- increase n to decrease this error

## The same for the two smoothed versions
beta <- C.n(u, X, smoothing = "beta")
mean(abs(pc - beta))
check <- C.n(u, X, smoothing = "checkerboard")
mean(abs(pc - check))

## Compare the empirical copula with F.n(pobs())
U <- pobs(X) # pseudo-observations
stopifnot(identical(ec, F.n(u, X=pobs(U)))) # even identical

## Compare the empirical copula based on U at U with the Kendall distribution
## Note: Theoretically, C(U) ~ K, so K(C.n(U, X)) should be approximately U(0,1)
plot(pK(C.n(U, X), cop=cop@copula, d=d))

## Compare the empirical copula and the true copula on the diagonal
C.n.diag <- function(u) C.n(do.call(cbind, rep(list(u), d)), X=X) # diagonal of C_n
C.diag <- function(u) pCopula(do.call(cbind, rep(list(u), d)), cop) # diagonal of C
curve(C.n.diag, from=0, to=1, # empirical copula diagonal
      main=paste("True vs empirical diagonal of a", family, "copula"),
      xlab="u", ylab=quote("True C(u,..,u) and empirical"~C[n](u,..,u)))
curve(C.diag, lty=2, add=TRUE) # add true copula diagonal
legend("bottomright", lty=2:1, bty="n", inset=0.02,
       legend = expression(C, C[n]))

## Approximate partial derivatives w.r.t. the 2nd and 3rd component
j.ind <- 2:3 # indices w.r.t. which the partial derivatives are computed
## Partial derivatives based on the empirical copula and the true copula
der23 <- dCn(u, U=pobs(U), j.ind=j.ind)
der23. <- copula:::dCdu(archmCopula(family, param=theta, dim=d), u=u)[,j.ind]
## Approximation error
summary(as.vector(abs(der23-der23.)))
# }
# NOT RUN {
## For an example of using F.n(), see help(mvdc)
# }
