It is assumed that we have observational data that are
multivariate Gaussian and
faithful to the true (but unknown) underlying causal DAG
(without hidden variables).
Under these assumptions, this function estimates the multiset of possible
total causal effects of x on y, where the total causal effect is
defined via Pearl's do-calculus as E(Y|do(X=z+1))-E(Y|do(X=z))
(this value does not depend on z, since Gaussianity implies that
conditional expectations are linear).
We estimate a set of possible total causal effects instead of the unique
total causal effect, since it is
typically impossible to identify the latter when the
true underlying causal DAG is unknown (even with an infinite amount of data).
Conceptually, the method works as follows.
First, we estimate the equivalence class of DAGs that describe
the conditional independence relationships in the data,
using the function pc
(see the help file of this function).
For each DAG G in the equivalence class, we apply
Pearl's do-calculus to estimate the total causal effect of x on y.
This can be done via a simple linear regression: if y is
not a parent of x, we take the regression coefficient of x in the
regression lm(y ~ x + pa(x)), where pa(x) denotes the parents of
x in the DAG G; if y is a parent of x in G, we set the
estimated causal effect to zero.
If the equivalence class contains k DAGs, this will yield k
estimated total causal effects. Since we do not know which DAG is
the true causal DAG, we do not know which estimated total causal
effect of x on y is the correct one. Therefore, we return
the entire multiset of k estimated effects (it is a multiset
rather than a set because it can contain duplicate values).
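The regression step above can be illustrated with a minimal base R sketch. The helper effect_in_dag and the simulated data are hypothetical and are not part of this function. The sketch assumes data generated from the DAG z -> x -> y with an additional edge z -> y, so that the equivalence class consists of the six acyclic orientations of the complete graph on x, y and z; the list parent_sets holds the parents of x in each of these six DAGs.

```r
# Hypothetical helper (not part of any package): estimated effect of x on y
# in one DAG, given the names of the parents of x in that DAG.
effect_in_dag <- function(data, x, y, pa_x) {
  if (y %in% pa_x) return(0)               # y is a parent of x: effect is zero
  rhs <- paste(c(x, pa_x), collapse = " + ")
  fit <- lm(as.formula(paste(y, "~", rhs)), data = data)
  unname(coef(fit)[x])                     # coefficient of x, adjusted for pa(x)
}

# Simulated Gaussian data (assumption for illustration only):
# z -> x, z -> y, x -> y with true total effect of x on y equal to 0.8.
set.seed(1)
n <- 5000
z <- rnorm(n)
x <- 0.5 * z + rnorm(n)
y <- 0.8 * x + 0.3 * z + rnorm(n)
dat <- data.frame(x = x, y = y, z = z)

# Parents of x in each of the six DAGs of the equivalence class.
parent_sets <- list(character(0), character(0), "z", "y",
                    c("z", "y"), c("z", "y"))

# One estimated effect per DAG: a multiset with duplicate values.
effects <- sapply(parent_sets, function(pa) effect_in_dag(dat, "x", "y", pa))
print(round(effects, 2))
```

Only the third entry adjusts for the true parent set of x, so it should lie close to 0.8; the first two entries coincide because both DAGs give x no parents, and the last three are zero because y is a parent of x in those DAGs.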
One can take summary measures of the multiset. For example, the minimum
absolute value provides a lower bound on the size of the true causal effect:
if the minimum absolute value of all values in
the multiset is larger than one, then we know that the size of the true
causal effect (up to sampling error) must be larger than one.
If method="global", the method as described above is carried out, where
all DAGs in the equivalence class of the estimated CPDAG graphEst
are computed using the function allDags. This method
is suitable for small graphs (say, up to 10 nodes).
If method="local", we do not determine all DAGs in the equivalence class of
the CPDAG. Instead, we only consider the local neighborhood of x in
the CPDAG. In particular, we consider all possible directions of
undirected edges that have x as endpoint, such that no new v-structure
is created. For each such configuration,
we estimate the total causal effect of x on y as above, using
linear regression.
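The local enumeration can be sketched in base R as follows. This is an illustrative sketch, not the implementation used by this function: the helper local_parent_sets is hypothetical, the adjacency matrix convention is chosen for the sketch only, and the check is restricted to new v-structures that have x as the collider.

```r
# Convention assumed for this sketch only:
#   amat[i, j] == 1 and amat[j, i] == 0  encodes the directed edge i -> j;
#   amat[i, j] == 1 and amat[j, i] == 1  encodes the undirected edge i - j.
local_parent_sets <- function(amat, x) {
  adjacent <- function(a, b) amat[a, b] == 1 || amat[b, a] == 1
  nodes <- seq_len(nrow(amat))
  pa_fixed <- nodes[amat[, x] == 1 & amat[x, ] == 0]  # directed edges into x
  sibs     <- nodes[amat[, x] == 1 & amat[x, ] == 1]  # undirected edges at x
  out <- list()
  # Each mask selects the subset of undirected neighbours oriented into x.
  for (mask in 0:(2^length(sibs) - 1)) {
    pick   <- (mask %/% 2^(seq_along(sibs) - 1)) %% 2 == 1
    into_x <- sibs[pick]
    pa_all <- c(pa_fixed, into_x)
    ok <- TRUE
    # A new parent that is not adjacent to another parent of x would
    # create a new v-structure with x as collider.
    for (a in into_x) {
      for (b in setdiff(pa_all, a)) {
        if (!adjacent(a, b)) ok <- FALSE
      }
    }
    if (ok) out[[length(out) + 1]] <- sort(pa_all)
  }
  out
}

# Example CPDAG: the undirected chain 2 - 1 - 3, with x = node 1.
amat <- matrix(0, 3, 3)
amat[1, 2] <- amat[2, 1] <- 1
amat[1, 3] <- amat[3, 1] <- 1
configs <- local_parent_sets(amat, x = 1)
print(configs)
```

For this chain the sketch returns three admissible parent sets for node 1: the empty set, node 2 alone, and node 3 alone. Orienting both edges into node 1 is rejected because nodes 2 and 3 are not adjacent. Each returned parent set would then be used as pa(x) in the regression.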
At first sight, it is not clear that such a local configuration
corresponds to a DAG in the equivalence class of the CPDAG, since it may be
impossible to direct the remaining undirected edges without creating a
directed cycle or a v-structure. However, Maathuis, Kalisch and
Buhlmann (2009) showed that there is
at least one DAG in the equivalence class for each such local configuration.
As a result, the
multisets of total causal effects of the
"global" and the "local" method have the same unique values. They may,
however, have different multiplicities.
For example, a CPDAG may represent eight
DAGs, and the global method may produce the multiset
{1.3, -0.5, 0.7, 1.3, 1.3, -0.5, 0.7, 0.7}.
The unique values in this multiset are -0.5, 0.7 and 1.3, and the
multiplicities are 2, 3 and 3. The local method, on the other hand, may
yield {1.3, -0.5, -0.5, 0.7}. The unique values
are again -0.5, 0.7 and 1.3, but the multiplicities are now 2, 1 and
1. The fact that the unique values of the multisets of the "global" and "local"
method are identical implies that summary measures of the multiset that
only depend on the unique values (such as the minimum absolute value) can
be estimated by the local method.
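The comparison above can be checked directly on the two example multisets. The variable names below are chosen for illustration only; the values are the ones quoted in the text.

```r
# Example multisets from the text (names are illustrative).
global_effects <- c(1.3, -0.5, 0.7, 1.3, 1.3, -0.5, 0.7, 0.7)
local_effects  <- c(1.3, -0.5, -0.5, 0.7)

# Multiplicities differ between the two methods ...
print(table(global_effects))
print(table(local_effects))

# ... but the unique values agree, so any summary that depends only on
# the unique values gives the same answer for both multisets.
same_values <- setequal(global_effects, local_effects)
min_abs_global <- min(abs(global_effects))
min_abs_local  <- min(abs(local_effects))
print(c(global = min_abs_global, local = min_abs_local))
```

Both minimum absolute values equal 0.5, whereas a summary that depends on multiplicities, such as the mean, would generally differ between the two multisets.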