Learn R Programming

pcalg (version 1.1-4)

skeleton: Estimate the skeleton of a DAG using the PC Algorithm

Description

Estimate the skeleton of a directed acyclic graph (DAG) from observational data, using the PC-algorithm.

Usage

skeleton(suffStat, indepTest, p, alpha, verbose = FALSE, fixedGaps = NULL,
fixedEdges = NULL, NAdelete = TRUE, m.max = Inf)

Arguments

suffStat
Sufficient statistics: List containing all necessary elements for the conditional independence decisions in the function indepTest.
indepTest
Predefined function for testing conditional independence. The function is internally called as indepTest(x,y,S,suffStat), and tests conditional independence of x and y given S. Here, x<
p
Number of variables.
alpha
Significance level for the individual conditional independence tests.
verbose
If TRUE, detailed output is provided.
fixedGaps
A logical matrix of dimension p*p. If entry [i,j] or [j,i] (or both) are TRUE, the edge i-j is removed before starting the algorithm. Therefore, this edge is guaranteed to be absent in the resulting graph.
fixedEdges
A logical matrix of dimension p*p. If entry [i,j] or [j,i] (or both) are TRUE, the edge i-j is never considered for removal. Therefore, this edge is guaranteed to be present in the resulting graph.
NAdelete
If indepTest returns NA and this option is TRUE, the corresponding edge is deleted. If this option is FALSE, the edge is not deleted.
m.max
Maximal size of the conditioning sets that are considered in the conditional independence tests.

Value

  • An object of class "pcAlgo" (see pcAlgo) containing an estimate of the skeleton of the underlying DAG, the conditioning sets that led to edge removals (sepset) and several other parameters.

Details

Under the assumption that the distribution of the observed variables is faithful to a DAG, this function estimates the skeleton of the DAG. The skeleton of a DAG is the undirected graph resulting from removing all arrowheads from the DAG. Edges in the skeleton of a DAG have the following interpretation: there is an edge between i and j if and only if variables i and j are conditionally dependent given S for all possible subsets S of the remaining nodes.

The data are not required to follow a specific distribution, but one should make sure that the conditional indepedence test used in indepTest is appropriate for the data. Pre-programmed versions of indepTest are available for Gaussian data (gaussCItest), discrete data (disCItest), and binary data (see binCItest). Users can also specify their own indepTest function.

The PC algorithm (Spirtes, Glymour and Scheines, 2000) starts with a complete undirected graph. In each step, it visits all pairs (i,j) of adjacent nodes in the current graph, and determines based on conditional independence tests whether the edge i-j should be removed. In particular, in step m (m=0,1,...) the algorithm visits all pairs (i, j) of adjacent nodes in the current graph, and the edge between i and j is kept if and only if the null hypothesis "i and j are conditionally independent given S" is rejected at significance level alpha for all subsets S of size m of the neighbours of i and of the neighbors of j in the current graph (as judged by the function indepTest). The algorithm stops when m is larger than the largest neighbourhood size of all nodes, or when m has reached the limit m.max defined by the user.

The information in fixedGaps and fixedEdges is used as follows. The gaps given in fixedGaps are introduced in the very beginning of the algorithm by removing the corresponding edges from the complete undirected graph. Pairs (i,j) in fixedEdges are skipped in all steps of the algorithm, so that these edges remain in the graph.

Note: Throughout, the algorithm works with the column positions of the variables in the adjacency matrix, and not with the names of the variables.

References

P. Spirtes, C. Glymour and R. Scheines (2000) Causation, Prediction, and Search, 2nd edition, MIT Press.

M. Kalisch and P. Buehlmann (2007) Estimating high-dimensional directed acyclic graphs with the PC-algorithm, JMLR 8 613-636.

See Also

pc for generating a partially directed graph using the PC algorithm; udag2pdag for converting the skeleton to a CPDAG; gaussCItest, disCItest, binCItest and dsepTest as examples for indepTest.

Examples

Run this code
##################################################
## Using Gaussian Data
##################################################
## Load predefined data
data(gmG)
n <- nrow(gmG$x)
p <- ncol(gmG$x)

## define independence test (partial correlations)
indepTest <- gaussCItest
## define sufficient statistics
suffStat <- list(C = cor(gmG$x), n = n)
## estimate Skeleton
alpha <- 0.01
skeleton.fit <- skeleton(suffStat, indepTest, p, alpha)
if (require(Rgraphviz)) {
## show estimated Skeleton
par(mfrow=c(1,2))
plot(skeleton.fit, main = "Estimated Skeleton")
plot(gmG$g, main = "True DAG")
}
##################################################
## Using d-separation oracle
##################################################
## define independence test
indepTest <- dsepTest
## define sufficient statistics (d-separation oracle)
suffStat <- list(g = gmG$g, jp = RBGL::johnson.all.pairs.sp(gmG$g))
## estimate Skeleton
alpha <- 0.01 ## value is irrelevant as dsepTest returns either 0 or 1
fit <- skeleton(suffStat, indepTest, p, alpha)
if (require(Rgraphviz)) {
## show estimated Skeleton
plot(fit, main = "Estimated Skeleton")
plot(gmG$g, main = "True DAG")
}
##################################################
## Using discrete data
##################################################
## Load data
data(gmD)
p <- ncol(gmD$x)
## define independence test (G^2 statistics)
indepTest <- disCItest
## define sufficient statistics
suffStat <- list(dm = gmD$x, nlev = c(3,2,3,4,2), adaptDF = FALSE)
## estimate Skeleton
alpha <- 0.01
skeleton.fit <- skeleton(suffStat, indepTest, p, alpha, verbose = TRUE)
if (require(Rgraphviz)) {
## show estimated Skeleton
par(mfrow = c(1,2))
plot(skeleton.fit, main = "Estimated Skeleton")
plot(gmD$g, main = "True DAG")
}

##################################################
## Using binary data
##################################################
## Load binary data
data(gmB)
p <- ncol(gmB$x)
## define independence test
indepTest <- binCItest
## define sufficient statistics
suffStat <- list(dm = gmB$x, adaptDF = FALSE)
## estimate Skeleton
alpha <- 0.01
skeleton.fit <- skeleton(suffStat, indepTest, p, alpha, verbose = TRUE)
if (require(Rgraphviz)) {
## show estimated Skeleton
plot(skeleton.fit, main = "Estimated Skeleton")
plot(gmB$g, main = "True DAG")
}

Run the code above in your browser using DataLab