
If used in the PC algorithm, it estimates the order-independent
"stable"
) or original PC ("original"
)
When used in the FCI and RFCI algorithms, this function estimates
only an initial order-independent (or PC original)
skeleton(suffStat, indepTest, alpha, labels, p,
method = c("stable", "original", "stable.fast"), m.max = Inf,
fixedGaps = NULL, fixedEdges = NULL, NAdelete = TRUE,
verbose = FALSE)
indepTest
.function
for testing
conditional independence. The function is internally called as
indepTest(x,y,S,suffStat)
and tests conditional independence
of x
and <p
.labels
are not, in which case labels
is set to
1:p
."stable"
provides an order-independent skeleton, see
[i,j]
is true, the edge $i--j$ is removed before
starting the algorithm. Therefore, this edge is guaranteed to be
absent in the resulting graph.[i,j]
is true, the edge $i--j$ is never considered
for removal. Therefore, this edge is guaranteed to be present in
the resulting graph.indepTest(*)
returns NA
. If it is true, the corresponding edge is
deleted, otherwise not.TRUE
, detailed output is provided.class
"pcAlgo"
(see
pcAlgo
) containing an estimate of the skeleton of
the underlying DAG, the conditioning sets (sepset
) that led to
edge removals and several other parameters. On the other hand, the distribution of the observed variables
is faithful to a DAG with arbitrarily many latent and selection
variables, skeleton()
estimates the initial skeleton of the
DAG. Edges in this initial skeleton of a DAG have the
following interpretation:
There is an edge $i$ -- $j$ if and only if variables $i$ and
$j$ are conditionally dependent given $S$ for all possible
subsets $S$ of the neighbours of $i$ and the neighbours of
$j$.
The data are not required to follow a specific distribution,
but one should make sure that the conditional indepedence test used in
indepTest
is appropriate for the data. Pre-programmed versions
of indepTest
are available for Gaussian data
(gaussCItest
), discrete data (disCItest
),
and binary data (see binCItest
). Users may also specify
their own indepTest
function.
The PC algorithm (Spirtes, Glymour and Scheines, 2000)
(method = "original"
) is known to be order-dependent, in the
sense that the output may depend on the order in which the variables
are given. Therefore, Colombo and Maathuis (2013) proposed a simple
modification, called pc()
with the new default method = "stable"
. This stable variant
of the algorithm is also available with the method = "stable.fast"
:
it runs the algorithm of Colombo and Maathuis (2013) faster than
method = "stable"
in general, but should be regarded as an
experimental option at the moment.
The algorithm starts with a complete undirected graph. In each
step, it visits all pairs $(i,j)$ of adjacent nodes in the
current graph, and determines based on conditional independence tests
whether the edge $i--j$ should be removed. In particular, for each step
$m$ ($m=0,1,\dots$) of the size of the conditioning sets, the
algorithm at first determines the neighbours $a(i)$ of each node
$i$ in the graph. Then, the algorithm visits all pairs $(i,j)$
of adjacent nodes in the current graph, and the edge $i--j$ is
kept if and only if the null hypothesis
$i$ and $j$ are conditionally independent given S
rejected at significance level alpha
for all subsets $S$ of size
$m$ of $a(i)$ and of $a(j)$ (as judged by the function
indepTest
). For the "stable"
method, the neighborhoods
$a(i)$ are kept fixed within each value of $m$, and this
makes the algorithm order-independent. Method "original"
,
the original PC algorithm would update the neighbour list after each
edge change.
The algorithm stops when $m$ is larger than the largest
neighbourhood size of all nodes, or when $m$ has reached the limit
m.max
which may be set by the user.
Since the FCI (Spirtes, Glymour and Scheines, 2000) and RFCI (Colombo
et al., 2012) algorithms are built up from the PC algorithm, they are also
order-dependent in the skeleton. To resolve their order-dependence
issues in the skeleton is more involved, see Colombo and Maathuis
(2013). However now, with method = "stable"
, this function
estimates an initial order-independent skeleton in these algorithms
(for additional details on how to make the final skeleton of FCI fully
order-independent see fci
and Colombo and Maathuis (2013)).
The information in fixedGaps
and fixedEdges
is used as follows.
The gaps given in fixedGaps
are introduced in the very beginning of
the algorithm by removing the corresponding edges from the complete
undirected graph. Pairs $(i,j)$ in fixedEdges
are skipped
in all steps of the algorithm, so that these edges remain in the graph. Note: Throughout, the algorithm works with the column positions of
the variables in the adjacency matrix, and not with the names of
the variables.
D. Colombo, M. H. Maathuis, M. Kalisch, T. S. Richardson (2012). Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Statist. 40, 294-321.
M. Kalisch and P. Buehlmann (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm, JMLR 8 613-636.
P. Spirtes, C. Glymour and R. Scheines (2000). Causation, Prediction, and Search, 2nd edition, MIT Press.
pc
for generating a partially directed graph
using the PC algorithm; fci
for generating a partial
ancestral graph using the FCI algorithm; rfci
for
generating a partial ancestral graph using the RFCI algorithm;
udag2pdag
for converting the skeleton to a CPDAG. Further, gaussCItest
, disCItest
,
binCItest
and dsepTest
as examples for
indepTest
.
##################################################
## Using Gaussian Data
##################################################
## Load predefined data
data(gmG)
n <- nrow (gmG8$x)
V <- colnames(gmG8$x) # labels aka node names
## estimate Skeleton
skel.fit <- skeleton(suffStat = list(C = cor(gmG8$x), n = n),
indepTest = gaussCItest, ## (partial correlations)
alpha = 0.01, labels = V, verbose = TRUE)
if (require(Rgraphviz)) {
## show estimated Skeleton
par(mfrow=c(1,2))
plot(skel.fit, main = "Estimated Skeleton")
plot(gmG8$g, main = "True DAG")
}
##################################################
## Using d-separation oracle
##################################################
## define sufficient statistics (d-separation oracle)
Ora.stat <- list(g = gmG8$g, jp = RBGL::johnson.all.pairs.sp(gmG8$g))
## estimate Skeleton
fit.Ora <- skeleton(suffStat=Ora.stat, indepTest = dsepTest, labels = V,
alpha=0.01) # <- irrelevant as dsepTest returns either 0 or 1
if (require(Rgraphviz)) {
## show estimated Skeleton
plot(fit.Ora, main = "Estimated Skeleton (d-sep oracle)")
plot(gmG8$g, main = "True DAG")
}
##################################################
## Using discrete data
##################################################
## Load data
data(gmD)
V <- colnames(gmD$x) # labels aka node names
## define sufficient statistics
suffStat <- list(dm = gmD$x, nlev = c(3,2,3,4,2), adaptDF = FALSE)
## estimate Skeleton
skel.fit <- skeleton(suffStat,
indepTest = disCItest, ## (G^2 statistics independence test)
alpha = 0.01, labels = V, verbose = TRUE)
if (require(Rgraphviz)) {
## show estimated Skeleton
par(mfrow = c(1,2))
plot(skel.fit, main = "Estimated Skeleton")
plot(gmD$g, main = "True DAG")
}
##################################################
## Using binary data
##################################################
## Load binary data
data(gmB)
X <- gmB$x
## estimate Skeleton
skel.fm2 <- skeleton(suffStat = list(dm = X, adaptDF = FALSE),
indepTest = binCItest, alpha = 0.01,
labels = colnames(X), verbose = TRUE)
if (require(Rgraphviz)) {
## show estimated Skeleton
par(mfrow = c(1,2))
plot(skel.fm2, main = "Binary Data 'gmB': Estimated Skeleton")
plot(gmB$g, main = "True DAG")
}
Run the code above in your browser using DataLab