pcalg (version 2.4-5)

condIndFisherZ: Test Conditional Independence of Gaussians via Fisher's Z

Description

Using Fisher's z-transformation of the partial correlation, test for zero partial correlation of sets of normally / Gaussian distributed random variables.

The gaussCItest() function uses zStat() to test for (conditional) independence between Gaussian random variables, with an interface that can easily be used in skeleton, pc and fci.

Usage

condIndFisherZ(x, y, S, C, n, cutoff, verbose = FALSE)
zStat         (x, y, S, C, n)
gaussCItest   (x, y, S, suffStat)

Arguments

x,y,S

(integer) positions of variable \(X\), variable \(Y\), and the set of variables \(S\), respectively, in the adjacency matrix. The function tests whether \(X\) and \(Y\) are conditionally independent given the subset \(S\) of the remaining nodes.

C

Correlation matrix of the nodes.

n

Integer specifying the number of observations (“samples”) used to estimate the correlation matrix C.

cutoff

Numeric cutoff for significance level of individual partial correlation tests. Must be set to qnorm(1 - alpha/2) for a test significance level of alpha.

verbose

Logical indicating whether some intermediate output should be shown; currently not used.

suffStat

A list with two elements, "C" and "n", corresponding to the above arguments with the same name.

Value

zStat() gives a number $$Z = \sqrt{n - \left|S\right| - 3} \cdot \log((1+r)/(1-r))/2$$ where \(r\) is the sample partial correlation of \(X\) and \(Y\) given \(S\); \(Z\) is asymptotically standard normally distributed under the null hypothesis of zero partial correlation.
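The statistic can be reproduced directly in R. Below is a minimal sketch, not the pcalg implementation: zStatSketch is a hypothetical helper, and S is taken to be empty so that r is simply the plain correlation C[x, y].

```r
## Hypothetical helper (not the pcalg implementation): Fisher's z
## statistic for x, y with an empty conditioning set S, so that
## r is the plain correlation C[x, y].
zStatSketch <- function(x, y, S, C, n) {
  r <- C[x, y]
  sqrt(n - length(S) - 3) * log((1 + r) / (1 - r)) / 2
}

C <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
zStatSketch(1, 2, integer(0), C, n = 103)  # equals 10 * atanh(0.5)
```

Note that log((1 + r)/(1 - r))/2 is exactly atanh(r), i.e., Fisher's z-transform of r.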

condIndFisherZ() returns a logical \(L\) indicating whether the null hypothesis “the partial correlation of x and y given S is zero” could not be rejected at the given significance level. More intuitively, and for multivariate normal data: if TRUE, it is plausible that x and y are conditionally independent given S; if FALSE, strong evidence was found against this conditional independence statement.

gaussCItest() returns the p-value of the test.

Details

For Gaussian random variables, and after Fisher's z-transformation of the partial correlation, the test statistic zStat() is (asymptotically, for large enough n) standard normally distributed.

Partial correlation is tested in a two-sided hypothesis test, i.e., basically, condIndFisherZ(*) == (abs(zStat(*)) <= qnorm(1 - alpha/2)), since TRUE means the null hypothesis was not rejected. In a multivariate normal distribution, zero partial correlation is equivalent to conditional independence.

References

M. Kalisch and P. Buehlmann (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. JMLR 8 613-636.

See Also

pcorOrder for computing a partial correlation given the correlation matrix in a recursive way.
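The recursion behind such a computation can be sketched as follows; pcorRec is a hypothetical stand-in, not the pcalg pcorOrder() function. It conditions out one variable z of S at a time using the standard first-order partial correlation formula.

```r
## Hypothetical recursive partial correlation (in the spirit of pcorOrder):
## r_{xy|S} = (r_{xy|S'} - r_{xz|S'} * r_{yz|S'}) /
##            sqrt((1 - r_{xz|S'}^2) * (1 - r_{yz|S'}^2)),  S' = S \ {z}
pcorRec <- function(x, y, S, C) {
  if (length(S) == 0) return(C[x, y])
  z  <- S[1]
  S1 <- S[-1]
  rxy <- pcorRec(x, y, S1, C)
  rxz <- pcorRec(x, z, S1, C)
  ryz <- pcorRec(y, z, S1, C)
  (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))
}

set.seed(7)
d <- matrix(rnorm(300), 100, 3)
C <- cor(d)
pcorRec(1, 2, 3, C)  # partial correlation of variables 1 and 2 given 3
```

For three variables this agrees with the precision-matrix formula r_{12|3} = -P[1,2] / sqrt(P[1,1] * P[2,2]), where P = solve(C).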

dsepTest, disCItest and binCItest for similar functions for a d-separation oracle, a conditional independence test for discrete variables and a conditional independence test for binary variables, respectively.

Examples

library(pcalg)
set.seed(42)
## Generate four independent normal random variables
n <- 20
data <- matrix(rnorm(n*4),n,4)
## Compute corresponding correlation matrix
corMatrix <- cor(data)
## Test whether variable 1 (col 1) and variable 2 (col 2) are
## independent given variable 3 (col 3) and variable 4 (col 4) at the
## 0.05 significance level
x <- 1
y <- 2
S <- c(3,4)
n <- 20
alpha <- 0.05
cutoff <- qnorm(1-alpha/2)
(b1 <- condIndFisherZ(x,y,S,corMatrix,n,cutoff))
   # -> 1 and 2 seem to be conditionally independent given 3,4

## Now an example with conditional dependence
data <- matrix(rnorm(n*3),n,3)
data[,3] <- 2*data[,1]
corMatrix <- cor(data)
(b2 <- condIndFisherZ(1,3,2,corMatrix,n,cutoff))
   # -> 1 and 3 seem to be conditionally dependent given 2

## Simulate another dependence case: x -> y -> z
set.seed(29)
x <- rnorm(100)
y <- 3*x + rnorm(100)
z <- 2*y + rnorm(100)
dat <- cbind(x,y,z)

## analyze data
suffStat <- list(C = cor(dat), n = nrow(dat))
gaussCItest(1,3,NULL, suffStat) ## dependent [highly signif.]
gaussCItest(1,3,  2,  suffStat) ## independent | S
