indepTest: Test Independence of Continuous Random Variables via Empirical Copula

Description

Multivariate independence test based on the empirical copula process, as proposed by Christian Genest and Bruno Rémillard. The test can be seen as consisting of three steps: (i) a simulation step, in which the distribution of the test statistics under independence is simulated for the sample size under consideration; (ii) the test itself, which computes approximate p-values of the test statistics with respect to the empirical distributions obtained in step (i); and (iii) the display of a graphic, called a dependogram, which helps identify the type of departure from independence, if any. More details can be found in the articles cited in the References section.
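Steps (i) and (ii) can be illustrated in miniature for a single pair of variables. The base-R sketch below does not use the copula package, and its helper names (rank_kernel(), pair_stat(), simulate_null(), mc_pvalue()) are hypothetical. It computes a rank-based Cramér-von Mises statistic for the pair {1, 2} using the closed form we attribute to Genest and Rémillard (2004), simulates its null distribution for the given n, and converts the observed value into a Monte Carlo p-value. The (count + 1) / (N + 1) rule is one common convention and an assumption here. indepTest() computes its statistics in C and may differ in normalization and p-value convention.

```r
# Sketch only: hypothetical helper names, base R (no copula package needed).
# Per-variable n x n kernel built from ranks r (closed form attributed to
# Genest and Remillard (2004) for the Moebius-decomposition statistics).
rank_kernel <- function(r) {
  r <- as.numeric(r)
  n <- length(r)
  a <- r * (r - 1) / (2 * n * (n + 1))
  (2 * n + 1) / (6 * n) + outer(a, a, "+") - outer(r, r, pmax) / (n + 1)
}

# Cramer-von Mises-type statistic for the pair {1, 2}; depends on ranks only
pair_stat <- function(x) {
  n <- nrow(x)
  sum(rank_kernel(rank(x[, 1])) * rank_kernel(rank(x[, 2]))) / n
}

# Step (i): simulate the null distribution for sample size n. The statistic
# is rank-based, so independent uniforms stand in for any continuous margins.
simulate_null <- function(n, N = 200) {
  replicate(N, pair_stat(matrix(runif(2 * n), n, 2)))
}

# Step (ii): Monte Carlo p-value using the (count + 1) / (N + 1) rule
mc_pvalue <- function(stat, null_dist) {
  (sum(null_dist >= stat) + 1) / (length(null_dist) + 1)
}

set.seed(1)
n <- 50
null_dist <- simulate_null(n)
z <- rnorm(n)
x_dep <- cbind(z, z + rnorm(n, sd = 0.3))   # strongly dependent pair
x_ind <- matrix(rnorm(2 * n), n, 2)         # independent pair
p_dep <- mc_pvalue(pair_stat(x_dep), null_dist)
p_ind <- mc_pvalue(pair_stat(x_ind), null_dist)
```

Because the statistic depends on the data only through ranks, its null distribution depends only on the sample size (and, in general, the dimension). This is why the simulation step needs only n and p and can be reused for any data set of that size.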

Usage

indepTestSim(n, p, m = p, N = 1000, verbose = interactive())
indepTest(x, d, alpha=0.05)
dependogram(test, pvalues = FALSE, print = FALSE)

Arguments

n

sample size when simulating the distribution of the test statistics under independence.

p

dimension of the data when simulating the distribution of the test statistics under independence.

m

maximum cardinality of the subsets of variables for which a test statistic is to be computed. Since the number of such subsets grows quickly with p, it makes sense to consider \(m \ll p\), especially when p is large.

N

number of repetitions when simulating under independence.

verbose

a logical specifying whether progress should be displayed via txtProgressBar.

x

data frame or data matrix containing realizations (one per row) of the random vector whose independence is to be tested.

d

object of class "indepTestDist" as returned by the function indepTestSim(). It can be regarded as the empirical distribution of the test statistics under independence.

alpha

significance level used in the computation of the critical values for the test statistics.

test

object of class "indepTest" as returned by indepTest().

pvalues

logical indicating whether the dependogram should be drawn from the test statistics or from the corresponding p-values.

print

logical indicating whether details should be printed.
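The number of statistics computed, one per subset of variables with cardinality between 2 and m, is the sum of choose(p, k) for k = 2, ..., m. This count explains why a small m pays off when p is large. The following base-R sketch uses hypothetical helper names that are not part of the copula package:

```r
# Sketch only: hypothetical helper names, base R.
# Subsets A of {1, ..., p} with 2 <= |A| <= m, one test statistic per subset
subsets_up_to <- function(p, m = p) {
  stopifnot(m >= 2, m <= p)
  unlist(lapply(2:m, function(k) combn(p, k, simplify = FALSE)),
         recursive = FALSE)
}

# Their number: sum over k = 2..m of choose(p, k)
count_subsets <- function(p, m = p) sum(choose(p, 2:m))

count_subsets(5, 5)    # 26 (all subsets of size >= 2, i.e. 2^5 - 5 - 1)
count_subsets(5, 3)    # 20
count_subsets(20, 20)  # 1048555
count_subsets(20, 3)   # 1330
```

With p = m = 5, as in the first example below, 26 statistics are computed. Restricting to m = 3, as in the second example, leaves 20.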

Value

The function indepTestSim() returns an object of class "indepTestDist" whose attributes are: sample.size, data.dimension, max.card.subsets, number.repetitons, subsets (list of the subsets for which test statistics have been computed), subsets.binary (subsets in binary 'integer' notation), dist.statistics.independence (an N-row matrix containing the values of the test statistics for each subset and each repetition) and dist.global.statistic.independence (a vector of length N containing the values of the global Cramér-von Mises test statistic for each repetition; see Genest et al. (2007), p. 175).
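The following hedged base-R sketch makes the layout of dist.statistics.independence concrete. It runs a simulation step that returns an N × K matrix, with one row per repetition and one column per subset of cardinality 2 to m. The names simulate_subset_stats() and rank_kernel() are hypothetical. The per-variable kernel uses the rank-based closed form we attribute to Genest and Rémillard (2004). Under independence, the ranks of each continuous variable form a uniformly random permutation, so the sketch samples permutations directly instead of generating data. The sketch reproduces the matrix's shape only; its values need not match those of indepTestSim().

```r
# Sketch only: hypothetical names; mirrors the *shape* of
# dist.statistics.independence, not its exact values.
rank_kernel <- function(r) {
  r <- as.numeric(r)
  n <- length(r)
  a <- r * (r - 1) / (2 * n * (n + 1))
  (2 * n + 1) / (6 * n) + outer(a, a, "+") - outer(r, r, pmax) / (n + 1)
}

simulate_subset_stats <- function(n, p, m = p, N = 100) {
  stopifnot(m >= 2, m <= p)
  # All subsets A of {1, ..., p} with 2 <= |A| <= m
  subsets <- unlist(lapply(2:m, function(k) combn(p, k, simplify = FALSE)),
                    recursive = FALSE)
  one_rep <- function() {
    # Under independence, each variable's ranks are a random permutation of 1..n
    K <- lapply(seq_len(p), function(j) rank_kernel(sample.int(n)))
    # Statistic for subset A: (1/n) * sum of the elementwise product of kernels
    vapply(subsets, function(A) sum(Reduce(`*`, K[A])) / n, numeric(1))
  }
  # One row per repetition, one column per subset
  sims <- matrix(replicate(N, one_rep()), nrow = N, byrow = TRUE)
  colnames(sims) <- vapply(subsets, paste, character(1), collapse = ",")
  sims
}

set.seed(2004)
sims <- simulate_subset_stats(n = 30, p = 4, m = 3, N = 50)
dim(sims)   # 50 rows; choose(4, 2) + choose(4, 3) = 10 columns
```

Keeping one n × n kernel per variable takes memory proportional to n^2 p, in line with the complexity stated under Details. This is an observation about the sketch, not a description of the C code.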

The function indepTest() returns an object of class "indepTest" whose attributes are: subsets, statistics, critical.values, pvalues, fisher.pvalue (a p-value resulting from a combination à la Fisher of the subset statistic p-values), tippett.pvalue (a p-value resulting from a combination à la Tippett of the subset statistic p-values), alpha (global significance level of the test), beta (1 - beta is the significance level per statistic), global.statistic (value of the global Cramér-von Mises statistic derived directly from the independence empirical copula process; see Genest et al. (2007), p. 175) and global.statistic.pvalue (corresponding p-value).
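The per-statistic level 1 - beta and the Fisher and Tippett combinations can be sketched as follows. The text above does not say how beta is derived from alpha, so the Šidák-type choice beta = (1 - alpha)^(1/K) for K subset statistics is an assumption. The chi-square reference for Fisher's method and the min-p formula for Tippett's method are textbook versions that also assume independent statistics. The helper names are hypothetical, and indepTest() may calibrate these quantities differently, for example against the simulated distributions.

```r
# Sketch only: hypothetical helpers with textbook formulas. The Sidak-type
# beta and the chi-square / min-p reference distributions assume the K subset
# statistics (and their p-values) are independent.
per_stat_beta <- function(alpha, K) (1 - alpha)^(1 / K)

# Critical value of each column of an N x K matrix of simulated statistics
critical_values <- function(sims, beta) {
  apply(sims, 2, quantile, probs = beta, names = FALSE)
}

# Fisher: -2 * sum(log p) is chi-square with 2K df under the null
fisher_pvalue <- function(p) {
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}

# Tippett: based on the smallest of the K p-values
tippett_pvalue <- function(p) 1 - (1 - min(p))^length(p)

beta <- per_stat_beta(alpha = 0.05, K = 26)   # 26 subsets when p = m = 5
1 - beta                                      # per-statistic level, about 0.002
```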

Details

The current (C code) implementation of indepTestSim() uses memory (RAM) of size \(O(n^2 p)\) and time \(O(N n^2 p)\). Because both grow quadratically in n, it becomes infeasible when n is large.
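A back-of-envelope calculation shows how quickly the quadratic term grows. Purely for illustration, the sketch below assumes that the \(O(n^2 p)\) memory corresponds to p matrices of n × n doubles (8 bytes each). The actual constant in the C code is not given here, so treat the figures as orders of magnitude. The helper names are hypothetical.

```r
# Back-of-envelope estimate only. Assumption: the O(n^2 p) memory term is one
# double (8 bytes) per entry of p n x n matrices; the true constant may differ.
mem_gb <- function(n, p) 8 * n^2 * p / 1e9

# Time is O(N n^2 p): relative cost of going from n1 to n2 (same N and p)
time_ratio <- function(n1, n2) (n2 / n1)^2

mem_gb(100, 5)          # 4e-04 GB
mem_gb(10000, 5)        # 4 GB
time_ratio(100, 10000)  # each repetition about 10000 times as costly
```

For the example below (n = 100, p = 5), memory is negligible. At n = 10,000, the same p already needs memory on the order of gigabytes.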

See the references below for more details, especially Genest and Rémillard (2004).

The former argument print.every is deprecated and no longer supported; use verbose instead.

References

Deheuvels, P. (1979). La fonction de dépendance empirique et ses propriétés: un test non paramétrique d'indépendance. Acad. Roy. Belg. Bull. Cl. Sci., 5th Ser. 65, 274–292.

Deheuvels, P. (1981). A non parametric test for independence. Publ. Inst. Statist. Univ. Paris 26, 29–50.

Genest, C. and Rémillard, B. (2004). Tests of independence and randomness based on the empirical copula process. Test 13, 335–369.

Genest, C., Quessy, J.-F., and Rémillard, B. (2006). Local efficiency of a Cramér-von Mises test of independence. Journal of Multivariate Analysis 97, 274–294.

Genest, C., Quessy, J.-F., and Rémillard, B. (2007). Asymptotic local efficiency of Cramér-von Mises tests for multivariate independence. The Annals of Statistics 35, 166–191.

Examples

## Consider the following example taken from
## Genest and Remillard (2004), p 352:

library(copula)

set.seed(2004)
x <- matrix(rnorm(500),100,5)
x[,1] <- abs(x[,1]) * sign(x[,2] * x[,3])
x[,5] <- x[,4]/2 + sqrt(3) * x[,5]/2

## In order to test for independence "within" x, the first step consists
## in simulating the distribution of the test statistics under
## independence for the same sample size and dimension,
## i.e. n=100 and p=5. As we are going to consider all the subsets of
## {1,...,5} whose cardinality is between 2 and 5, we set p=m=5.

## For a realistic N = 1000 (default), this takes a few seconds:
N. <- if(copula:::doExtras()) 1000 else 120
N.
system.time(d <- indepTestSim(100, 5, N = N.))
## With N = 1000, this took about 2 seconds (machine 'lynne', 2015)
## You could save 'd' for future use, via  saveRDS()

## The next step consists of performing the test itself (and printing its results):
(iTst <- indepTest(x,d))

## Display the dependogram with the details:
dependogram(iTst, print=TRUE)

## We could have tested for a weaker form of independence, for instance,
## by only computing statistics for subsets whose cardinality is between 2
## and 3. Consider for instance the following data:
y <- matrix(runif(500),100,5)
## and perform the test:
system.time( d <- indepTestSim(100,5,3, N=N.) )
iTy <- indepTest(y,d)
iTy
dependogram(iTy, print=TRUE)
