binCItest: G square Test for (Conditional) Independence of Binary Variables

Description

\(G^2\) test for (conditional) independence of binary variables \(X\) and \(Y\) given the (possibly empty) set of binary variables \(S\).

binCItest() is a wrapper around gSquareBin() that makes it easy to use as the independence test in skeleton, pc and fci.

Usage

gSquareBin(x, y, S, dm, adaptDF = FALSE, n.min = 10*df, verbose = FALSE)
binCItest (x, y, S, suffStat)

Arguments

x, y

(integer) positions of the variables \(X\) and \(Y\), respectively, in the adjacency matrix.

S

(integer) positions of zero or more conditioning variables in the adjacency matrix.

dm

data matrix (with \(\{0,1\}\) entries).

adaptDF

logical specifying if the degrees of freedom should be lowered by one for each zero count. The value for the degrees of freedom cannot go below 1.

n.min

the smallest \(n\) (number of observations, nrow(dm)) for which the \(G^2\) test is computed; for smaller \(n\), independence is assumed (\(G^2 := 1\)) with a warning. The default is \(10 m\), where \(m = 2^{|S|}\) is the degrees of freedom assuming no structural zeros.
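
As a worked illustration of this default (a sketch based only on the description above): with \(|S| = 3\) conditioning variables,

\[
m = 2^{|S|} = 2^{3} = 8, \qquad n_{\min} = 10\,m = 80,
\]

so the test would be computed only if dm has at least 80 rows. The symbol \(n_{\min}\) is shorthand introduced here for the n.min argument.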

verbose

logical or integer indicating that increased diagnostic output is to be provided.

suffStat

a list with two elements, "dm" and "adaptDF", corresponding to the arguments of the same names of gSquareBin().

Value

The p-value of the test.
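
To illustrate how such a p-value can arise, the following base-R sketch computes a \(G^2\) statistic by hand, stratum by stratum over the observed configurations of S, and refers it to a chi-squared distribution with \(2^{|S|}\) degrees of freedom. The helper g2_binary() is hypothetical and not part of the package; it assumes a matrix with {0,1} entries and leaves out the adaptDF and n.min handling described above, so it is not a substitute for gSquareBin().

```r
## Hypothetical helper (NOT part of pcalg): G^2 test of X indep. Y given S
## for {0,1}-valued columns of the matrix 'dm'.
## Assumptions: no adaptDF correction and no n.min check.
g2_binary <- function(x, y, S, dm) {
  ## one stratum per observed configuration of the conditioning variables
  strata <- if (length(S) == 0) {
    rep(1L, nrow(dm))
  } else {
    interaction(as.data.frame(dm[, S, drop = FALSE]), drop = TRUE)
  }
  G2 <- 0
  for (idx in split(seq_len(nrow(dm)), strata)) {
    ## observed 2 x 2 table of X and Y within this stratum
    obs <- table(factor(dm[idx, x], levels = 0:1),
                 factor(dm[idx, y], levels = 0:1))
    ## expected counts under independence within the stratum
    expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)
    pos <- obs > 0                     # cells with zero count contribute 0
    G2 <- G2 + 2 * sum(obs[pos] * log(obs[pos] / expd[pos]))
  }
  df <- 2^length(S)                    # (2-1) * (2-1) * 2^|S|
  c(G2 = G2, df = df, p.value = pchisq(G2, df, lower.tail = FALSE))
}

## Usage on simulated independent {0,1} data
set.seed(123)
n <- 100
dat <- cbind(x = rbinom(n, 1, prob = 1/2),
             y = rbinom(n, 1, prob = 1/2),
             z = rbinom(n, 1, prob = 1/2))
g2_binary(1, 3, 2, dat)     # X and Z given Y
g2_binary(1, 3, NULL, dat)  # marginal test, empty S
```

The sketch returns the statistic and degrees of freedom alongside the p-value so that the intermediate quantities can be inspected; binCItest() itself returns only the p-value.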

Details

The \(G^2\) statistic is used to test for (conditional) independence of \(X\) and \(Y\) given a set \(S\) (which can be NULL). This function is a version of gSquareDis specialized to binary variables; gSquareDis itself handles discrete variables with more than two levels.
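
For reference, the statistic is commonly written in terms of cell counts as follows. The notation is introduced here for illustration and is not taken from the package source: \(n_{ijs}\) is the number of observations with \(X = i\), \(Y = j\) and the conditioning variables in configuration \(s\), and a "+" denotes summation over that index.

\[
G^2 \;=\; 2 \sum_{s} \sum_{i=0}^{1} \sum_{j=0}^{1} n_{ijs} \, \log \frac{n_{ijs}\, n_{++s}}{n_{i+s}\, n_{+js}},
\]

where terms with \(n_{ijs} = 0\) are taken to contribute zero. Under (conditional) independence, \(G^2\) is asymptotically \(\chi^2\)-distributed with \((2-1)(2-1)\,2^{|S|} = 2^{|S|}\) degrees of freedom when there are no structural zeros, and the p-value is the corresponding upper tail probability.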

References

R.E. Neapolitan (2004). Learning Bayesian Networks. Prentice Hall Series in Artificial Intelligence. Chapter 10.3.1

Examples

# NOT RUN {
n <- 100
set.seed(123)
## Simulate independent data of {0,1}-variables:
x <- rbinom(n, 1, prob = 1/2)
y <- rbinom(n, 1, prob = 1/2)
z <- rbinom(n, 1, prob = 1/2)
dat <- cbind(x,y,z)

binCItest(1,3,2, list(dm = dat, adaptDF = FALSE)) # 0.36, not signif.
binCItest(1,3,2, list(dm = dat, adaptDF = TRUE )) # the same, here

## Simulate data from a chain of 3 variables: x1 -> x2 -> x3
set.seed(12)
b0 <- 0
b1 <- 1
b2 <- 1
n <- 10000
x1 <- rbinom(n, size=1, prob=1/2) ## = sample(c(0,1), n, replace=TRUE)

## NB:  plogis(u) := "expit(u)" := exp(u) / (1 + exp(u))
p2 <- plogis(b0 + b1*x1) ; x2 <- rbinom(n, 1, prob = p2) # {0,1}
p3 <- plogis(b0 + b2*x2) ; x3 <- rbinom(n, 1, prob = p3) # {0,1}; depends on x2 only

ftable(xtabs(~ x1+x2+x3))
dat <- cbind(x1,x2,x3)

## Test marginal and conditional independencies
gSquareBin(3,1,NULL,dat, verbose=TRUE)
gSquareBin(3,1, 2,  dat)
gSquareBin(1,3, 2,  dat) # the same
gSquareBin(1,3, 2,  dat, adaptDF=TRUE, verbose = 2)
# }
