Under the assumption that the distribution of the observed variables
is faithful to a DAG, this function estimates the equivalence class of
the DAG. We do not estimate the DAG itself, because this is typically
impossible (even with an infinite amount of data), since different
DAGs can describe the same conditional independence relationships.
Since all DAGs in an equivalence class describe the same conditional
independence relationships, they are equally valid ways to
describe the conditional dependence structure that was given as
input. All DAGs in an equivalence class have the same skeleton (i.e.,
the same adjacency information) and the same v-structures (see
definition below). However, the direction of some edges may be
undetermined, in the sense that they point one way in one DAG in the
equivalence class, while they point the other way in another DAG in
the equivalence class.
An equivalence class can be uniquely represented by a completed
partially directed acyclic graph (CPDAG). A CPDAG
contains undirected and directed edges. The edges have the following
interpretation: (i) there is a (directed or undirected) edge between i
and j if and only if variables i and j are conditionally dependent
given S for all possible subsets S of the remaining nodes; (ii) a directed
edge i->j means that this directed edge is present in all DAGs in the
equivalence class; (iii) an undirected edge i-j means that there is at least
one DAG in the equivalence class with edge i->j and there is at least one DAG
in the equivalence class with edge i<-j.
The CPDAG is estimated using the PC algorithm (named after its inventors
Peter Spirtes and Clark Glymour). The skeleton is
estimated by the function skeleton
(see the help file
of this function for details). Subsequently, as many edges as possible
are oriented. This is done in two steps.
First, the algorithm considers all triples
(a,b,c)
, where a
and
b
are adjacent, b
and c
are adjacent, but
a
and c
are not adjacent. For all such triples,
we direct both edges towards b (a->b<-c) if and only if b
was
not part of the conditioning set that made the edge between a
and c
drop out. These conditioning sets were saved in
sepset
. The structure a->b<-c is called a v-structure.
After determining all v-structures, there may still
be undirected edges. It may be possible to direct some of these edges, since
one can deduce that one of the two possible directions of the edge is
invalid because it introduces
a new v-structure or a directed cycle. Such edges are found by
repeatedly applying rules R1-R3 as given in Algorithm 2 of Kalisch
and B"uhlmann (2007). The algorithm stops if none of the rules is
applicable to the graph.
Sampling errors (or hidden variables) can lead to conflicting
information about edge directions. For example, one may find that
a-b-c and b-c-d should both be directed as v-structures. This gives
conflicting information about the edge b-c, since it should be
directed as b<-c in v-structure a->b<-c, while it should be directed
as b->c in v-structure b->c<-d. In such cases, we simply overwrite the
directions of the conflicting edge. In the example above this means
that we obtain a->b->c<-d if a-b-c was visited first, and a->b<-c<-d
if b-c-d was visited first.
Sampling errors or hidden variables can also lead to invalid CPDAGs,
meaning that there does not exist a DAG
that has the same skeleton and v-structures as the
graph found by the algorithm. An example of this is an undirected cycle
consisting of the
edges a-b-c-d and d-a. In this case it is impossible to direct the edges
without creating a cycle or a new v-structure. The option u2pd
specifies what should be done in such a situation. If the option is set to
"relaxed"
, the algorithm simply outputs the invalid CPDAG. If
the option is set to "rand"
,
all direction information is discarded and a random DAG is generated on the
skeleton. If the option is set to "retry"
, up to 100 combinations
of possible directions of the ambiguous edges are tried, and the first
combination that results in a valid CPDAG is chosen. If no valid combination
is found, an arbitrary DAG is generated on the skeleton
as in the option "rand".
The conservative PC algorithm ("conservative = TRUE"
) is a
slight variation of the PC algorithm. After the skeleton is computed,
all potential v-structures a-b-c are checked in the following way. We
test whether a and c are independent conditioning on any subset of the
neighbors of a or any subset of the neighbors of c. If b is in no such
conditioning set (and not in the original sepset - ???DIEGO: 2nd
argument, version 2) or in all such
conditioning sets (and in the original sepset), no further action
is taken and the usual PC is continued. If, however, b is in only some
conditioning sets, the triple a-b-c is marked 'unfaithful'. Moreover,
if in the conservative step there is no subset among the neighbors
that makes a and c independent, the triple is marked 'unfaithful'
(???DIEGO: 1st argument, version 2). An
unfaithful triple is not oriented as a v-structure. Furthermore, no
later orientation rule that needs to know whether a-b-c is a
v-structure or not is applied. (The internal function
pc.cons.internal
will be called with version.unf = c(2,2)
.)
Notes: (1) Throughout, the algorithm works with the column positions
of the variables in the adjacency matrix, and not with the names of
the variables. (2) When plotting the object, undirected and bidirected
edges are equivalent.