pcalg (version 2.5-0)

udag2apag: Last step of RFCI algorithm: Transform partially oriented graph into RFCI-PAG

Description

This function performs the last step of the RFCI algorithm: It transforms a partially oriented graph in which the v-structures have been oriented into an RFCI Partial Ancestral Graph (PAG) (see Colombo et al (2012)).

While orienting the edges, this function performs some additional conditional independence tests in orientation rule 4 to ensure correctness of the ancestral relationships. As a result of these additional tests, some additional edges can be deleted. The result is the final adjacency matrix indicating also the edge marks and the updated sepsets.

Usage

udag2apag(apag, suffStat, indepTest, alpha, sepset,
          rules = rep(TRUE, 10), unfVect = NULL, verbose = FALSE)

Arguments

apag

Adjacency matrix of type amat.pag

suffStat

Sufficient statistics: A list containing all necessary elements for the conditional independence decisions in the function indepTest.

indepTest

Pre-defined function for testing conditional independence. The function is internally called as indepTest(x,y,S,suffStat), and tests conditional independence of x and y given S. Here, x and y are variables, and S is a (possibly empty) set of variables (all variables are coded by their column numbers in the adjacency matrix). suffStat is a list containing all relevant elements for the conditional independence decisions. The return value of indepTest is the p-value of the test for conditional independence.

alpha

Significance level for the individual conditional independence tests.

sepset

List of length p; each element of the list contains another list of length p. The element sepset[[x]][[y]] contains the separation set that made the edge between x and y drop out. Each separation set is a vector with (integer) positions of variables in the adjacency matrix. This object is thought to be obtained from a pcAlgo-object.

rules

Logical vector of length 10 with TRUE or FALSE for each rule, where TRUE in position i means that rule i (Ri) will be applied. By default, all rules are active.

unfVect

Vector containing numbers that encode the ambiguous triples (as returned by pc.cons.intern). This is needed in the conservative and in the majority rule versions of RFCI.

verbose

Logical indicating if detailed output is to be given.

Value

apag

Final adjacency matrix of type amat.pag

sepset

Updated list of separating sets

Details

The partially oriented graph in which the v-structures have been oriented is transformed into an RFCI-PAG using adapted rules of Zhang (2008). This function is similar to udag2pag used to orient the skeleton into a PAG in the FCI algorithm. However, it is slightly more complicated because we perform additional conditional independence tests when applying rule 4, to ensure correctness of the ancestral relationships. As a result, some additional edges can be deleted, see Colombo et al. (2012). Because of these addiitonal tests, we need to give suffStat, indepTest, and alpha as inputs. Since edges can be deleted, the input adjacency matrix apag and the input separating sets sepset can change in this algorithm.

If unfVect = NULL (no ambiguous triples), the orientation rules are applied to each eligible structure until no more edges can be oriented. On the other hand, hand, if one uses conservative or majority rule FCI and ambiguous triples have been found in pc.cons.intern, unfVect contains the numbers of all ambiguous triples in the graph. In this case, the orientation rules take this information into account. For example, if a *-> b o-* c and <a,b,c> is an unambigous unshielded triple and not a v-structure, then we obtain b -* c (otherwise we would create an additional v-structure). On the other hand, if a *-> b o-* c but <a,b,c> is an ambiguous unshielded triple, then the circle mark at b is not oriented.

Note that the algorithm works with columns' position of the adjacency matrix and not with the names of the variables.

Note that this function does not resolve possible order-dependence in the application of the orientation rules, see Colombo and Maathuis (2014).

References

D. Colombo and M.H. Maathuis (2014).Order-independent constraint-based causal structure learning. Journal of Machine Learning Research 15 3741-3782.

D. Colombo, M. H. Maathuis, M. Kalisch, T. S. Richardson (2012). Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Statist. 40, 294--321.

J. Zhang (2008). On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence 172, 1873--1896.

See Also

rfci, udag2pag, dag2pag, udag2pdag, udag2pdagSpecial, udag2pdagRelaxed

Examples

Run this code
# NOT RUN {
<!-- %% first part is  *identical* to >>>>>>>> ./rfci.Rd and ./udag2pag.Rd -->
# }
# NOT RUN {
##################################################%  --------       -----------
## Example with hidden variables
## Zhang (2008), Fig. 6, p.1882
##################################################

## create the DAG :
amat <- t(matrix(c(0,1,0,0,1, 0,0,1,0,0, 0,0,0,1,0, 0,0,0,0,0, 0,0,0,1,0),5,5))
V <- LETTERS[1:5]
colnames(amat) <- rownames(amat) <- V
edL <- setNames(vector("list",length=5), V) 
# }
# NOT RUN {
<!-- %% MM{FIXME}: nicer as in ./jointIda.Rd -->
# }
# NOT RUN {
edL[[1]] <- list(edges= c(2,4),weights=c(1,1))
edL[[2]] <- list(edges= 3,     weights=c(1))
edL[[3]] <- list(edges= 5,     weights=c(1))
edL[[4]] <- list(edges= 5,     weights=c(1))
## and leave  edL[[ 5 ]] empty
g <- new("graphNEL", nodes=V, edgeL=edL, edgemode="directed")
if (require(Rgraphviz))
  plot(g)

## define the latent variable
L <- 1

## compute the true covariance matrix of g
cov.mat <- trueCov(g)

## delete rows and columns belonging to latent variable L
true.cov <- cov.mat[-L,-L]

## transform covariance matrix into a correlation matrix
true.corr <- cov2cor(true.cov)

# }
# NOT RUN {
<!-- %% Here, ./rfci.Rd diverts and calls rfci() ... -->
# }
# NOT RUN {
n <- 100000
alpha <- 0.01
p <- ncol(true.corr)

if (require("MASS")) {
  ## generate 100000 samples of DAG using standard normal error distribution
  set.seed(289)
  d.mat <- mvrnorm(n, mu = rep(0, p), Sigma = true.cov)

  ## estimate the skeleton of given data
  suffStat <- list(C = cor(d.mat), n = n)
  indepTest <- gaussCItest
  resD <- skeleton(suffStat, indepTest, alpha = alpha, labels=colnames(true.corr))

  ## estimate all ordered unshielded triples
  amat.resD <- as(resD@graph, "matrix")
  print(u.t <- find.unsh.triple(amat.resD)) # four of them

  ## check and orient v-structures
  vstrucs <- rfci.vStruc(suffStat, indepTest, alpha=alpha,
			 sepset = resD@sepset, g.amat = amat.resD,
			 unshTripl= u.t$unshTripl, unshVect = u.t$unshVect,
			 verbose = TRUE)

  ## Estimate the final skeleton and extend it into a PAG
  ## (using all 10 rules, as per default):
  resP <- udag2apag(vstrucs$amat, suffStat, indepTest=indepTest, alpha=alpha,
		    sepset=vstrucs$sepset, verbose = TRUE)
  print(Amat <- resP$graph)
} # only if "MASS" is there

# }

Run the code above in your browser using DataCamp Workspace