Learn R Programming

⚠️There's a newer version (0.4.1) of this package.Take me there.

SEMID Package

Purpose

This package offers a number of different functions for determining global and generic identifiability of path diagrams / mixed graphs and latent digraphs. The following sections highlight the primary ways in which the package can be used. Much of the package's functionality can be accessed through the wrapper function semID.

The MixedGraph class

To be able to implement the different algorithms described below we created a MixedGraph class using the R.oo package of

Bengtsson, H. (2003)The R.oo package - Object-Oriented Programming with References Using Standard R Code, Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), ISSN 1609-395X, Hornik, K.; Leisch, F. & Zeileis, A. (eds.) URL https://www.r-project.org/conferences/DSC-2003/Proceedings/Bengtsson.pdf

This class can make it much easier to represent a mixed graph and run experiments with them. For instance we can create a mixed graph, plot it, and test if there is a half-trek system between two sets of vertices very easily:

> # Mixed graphs are specified by their directed adjacency matrix L and
> # bidirected adjacency matrix O.
> library(SEMID)
> L = t(matrix(
+ c(0, 1, 0, 0, 0,
+   0, 0, 0, 1, 1,
+   0, 0, 0, 1, 0,
+   0, 1, 0, 0, 1,
+   0, 0, 0, 1, 0), 5, 5))
>
> O = t(matrix(
+ c(0, 0, 0, 0, 0,
+   0, 0, 1, 0, 1,
+   0, 0, 0, 1, 0,
+   0, 0, 0, 0, 0,
+   0, 0, 0, 0, 0), 5, 5)); O=O+t(O)
>
> # Create the mixed graph object corresponding to L and O
> g = MixedGraph(L, O)
>
> # Plot the mixed graph
> g$plot()
>
> # Test whether or not there is a half-trek system from the nodes
> # 1,2 to 3,4
> g$getHalfTrekSystem(c(1,2), c(3,4))
$systemExists
[1] TRUE

$activeFrom
[1] 1 2

See the documentation for the MixedGraph class ?MixedGraph for more information.

Global Identifiability

Drton, Foygel, and Sullivant (2011) showed that there exist if and only if graphical conditions for testing whether or not the parameters in a mixed graph are globally identifiable. This criterion can be accessed through the function graphID.globalID.

Drton, M., Foygel, R., and Sullivant, S. (2011) Global identifiability of linear structural equation models. Ann. Statist. 39(2): 865-886.

Generic Identiability of Parameters

There still do not exist any 'if and only if' graphical conditions for testing whether or not certain parameters in a mixed graph are generically identifiable. There do, however, exist some necessary and some sufficient conditions which work for a large collection of graphs.

Sufficient Conditions

Until recently, criterions for generic identifiability, like the half-trek criterion of Foygel, Draisma, and Drton (2012), had to show that all edges incoming to a node where generically identifiable simultaenously and thus, if any single such edge incoming to a node was generically nonidentifiable, the criterion would fail. The recent work of Weihs, Robeva, Robinson, et al. (2017) develops new criteria that are able to identify subsets of edges coming into a node substantially improving upon prior methods at the cost of computational efficiency. We list both the older algorithms (available in prior versions of this package) and the newer algorithms below.

  • htcID - implements the half-trek critierion of

Foygel, Rina; Draisma, Jan; Drton, Mathias. Half-trek criterion for generic identifiability of linear structural equation models. Ann. Statist. 40 (2012), no. 3, 1682--1713. doi:10.1214/12-AOS1012.

and is an updated version of the (deprecated) function graphID.htcID.

  • ancestralID - implements the ancestor decomposition techniques of

Drton, M., and Weihs, L. (2016) Generic Identifiability of Linear Structural Equation Models by Ancestor Decomposition. Scand J Statist, 43: 1035–1045. doi: 10.1111/sjos.12227.

and is an updated version of the (deprecated) function graphID.ancestralID. This new version of the function works also on cyclic graphs by using an updated version of the Tian decomposition.

  • edgewiseID - an edgewise identification algorithm strictly

improving upon the half-trek criterion.

  • edgewiseTSID - an edgewise identification algorithm leveraging

trek-separation relations to identify even more edges than edgewiseID, this has the downside being computationally expensive.

  • generalGenericID - an generic identification algorithm template

that allows you to mix and match different identification techniques to find the right balance between computational efficiency and exhaustiveness of the procedure. See the below examples for more details.

Necessary Conditions

  • graphID.nonHtcID - the best known test for whether any edges in a

mixed graph are generically non-identifiabile. This comes Foygel, Draisma, and Drton (2012).

Examples

Lets use a few of the above functions to check the generic identifiability of parameters in a mixed graph.

> library(SEMID)
> # Mixed graphs are specified by their directed adjacency matrix L and
> # bidirected adjacency matrix O.
> L = t(matrix(
+ c(0, 1, 1, 0, 0,
+   0, 0, 1, 1, 1,
+   0, 0, 0, 1, 0,
+   0, 0, 0, 0, 1,
+   0, 0, 0, 0, 0), 5, 5))
>
> O = t(matrix(
+ c(0, 0, 0, 1, 0,
+   0, 0, 1, 0, 1,
+   0, 0, 0, 0, 0,
+   0, 0, 0, 0, 0,
+   0, 0, 0, 0, 0), 5, 5)); O=O+t(O)
>
> # Create a mixed graph object
> graph = MixedGraph(L, O)
>
> # Without using decomposition techniques we can't identify all nodes
> # just using the half-trek criterion
> htcID(graph, tianDecompose = F)
Call: htcID(mixedGraph = graph, tianDecompose = F)

Mixed Graph Info.
# nodes: 5 
# dir. edges: 7 
# bi. edges: 3 

Generic Identifiability Summary
# dir. edges shown gen. identifiable: 1 
# bi. edges shown gen. identifiable: 0 

Generically identifiable dir. edges:
1->2 

Generically identifiable bi. edges:
None
>
> # The edgewiseTSID function can show that all edges are generically
> # identifiable without proprocessing with decomposition techniques
> edgewiseTSID(graph, tianDecompose = F)
Call: edgewiseTSID(mixedGraph = graph, tianDecompose = F)

Mixed Graph Info.
# nodes: 5 
# dir. edges: 7 
# bi. edges: 3 

Generic Identifiability Summary
# dir. edges shown gen. identifiable: 7 
# bi. edges shown gen. identifiable: 3 

Generically identifiable dir. edges:
1->2, 1->3, 2->3, 2->4, 3->4, 2->5, 4->5 

Generically identifiable bi. edges:
1<->4, 2<->3, 2<->5 
>
> # The above shows that all edges in the graph are generically identifiable.
> # See the help of edgewiseTSID to find out more information about what
> # else is returned by edgewiseTSID.

Using the generalGenericId method we can also mix and match different identification strategies. Lets say we wanted to first try to identify everything using the half-trek criterion but then, if there are still things that cant be shown generically identifiable, we want to use the edgewise criterion by limiting the edgesets it looks at to be a small size. We can do this as follows:

> library(SEMID)
> # Lets first define some matrices for a mixed graph
> L = t(matrix(
+ c(0, 1, 0, 0, 0,
+   0, 0, 0, 1, 1,
+   0, 0, 0, 1, 0,
+   0, 1, 0, 0, 1,
+   0, 0, 0, 1, 0), 5, 5))
>
> O = t(matrix(
+ c(0, 0, 0, 0, 0,
+   0, 0, 1, 0, 1,
+   0, 0, 0, 1, 0,
+   0, 0, 0, 0, 0,
+   0, 0, 0, 0, 0), 5, 5)); O=O+t(O)
>
> # Create a mixed graph object
> graph = MixedGraph(L, O)
>
> # Now lets define an "identification step" function corresponding to
> # using the edgewise identification algorithm but with subsets
> # controlled by 1.
> restrictedEdgewiseIdentifyStep <- function(mixedGraph,
+                                            unsolvedParents,
+                                            solvedParents, 
+                                            identifier) {
+     return(edgewiseIdentifyStep(mixedGraph, unsolvedParents,
+                                 solvedParents, identifier, 
+                                 subsetSizeControl = 1))
+ }
>
> # Now we run an identification algorithm that iterates between the
> # htc and the "restricted" edgewise identification algorithm
> generalGenericID(graph, list(htcIdentifyStep,
+                               restrictedEdgewiseIdentifyStep),
+	                 tianDecompose = F)
Call: generalGenericID(mixedGraph = graph, idStepFunctions = list(htcIdentifyStep, 
    restrictedEdgewiseIdentifyStep), tianDecompose = F)

Mixed Graph Info.
# nodes: 5 
# dir. edges: 7 
# bi. edges: 3 

Generic Identifiability Summary
# dir. edges shown gen. identifiable: 2 
# bi. edges shown gen. identifiable: 0 

Generically identifiable dir. edges:
2->5, 4->5 

Generically identifiable bi. edges:
None
>
> # We can do better (fewer unsolved parents) if we don't restrict the edgewise 
> # identifier algorithm as much
> generalGenericID(graph, list(htcIdentifyStep, edgewiseIdentifyStep),
+                  tianDecompose = F)
Mixed Graph Info.
# nodes: 5 
# dir. edges: 7 
# bi. edges: 3 

Generic Identifiability Summary
# dir. edges shown gen. identifiable: 4 
# bi. edges shown gen. identifiable: 0 

Generically identifiable dir. edges:
2->4, 5->4, 2->5, 4->5 

Generically identifiable bi. edges:
None

Identifiability of latent factor graphs

The latent-factor half-trek criterion (LF-HTC) by Barber, Drton, Sturma and Weihs (2022) is a sufficient criterion to check generic identifiability in directed graphical models with explicitly modeled latent variables. We created a LatentDigraph class to represent such graphs.

> # Latent digraphs are specified by their directed adjacency matrix L
> library(SEMID)
> L = matrix(c(0, 1, 0, 0, 0, 0,
+              0, 0, 1, 0, 0, 0,
+              0, 0, 0, 0, 0, 0,
+              0, 0, 0, 0, 1, 0,
+              0, 0, 0, 0, 0, 0,
+              1, 1, 1, 1, 1, 0), 6, 6, byrow=TRUE)
> observedNodes = seq(1,5)
> latentNodes = c(6)
>
> # Create the latent digraph object corresponding to L
> g = LatentDigraph(L, observedNodes, latentNodes)
>
> # Plot latent digraph
> plot(g)

The function lfhtcID implements the algorithm to check LF-HTC-identifiability as presented in

Barber, R. F., Drton, M., Sturma, N., and Weihs L. (2022). Half-Trek Criterion for Identifiability of Latent Variable Models. arXiv preprint arXiv:2201.04457.

The LF-HTC is applicable to all graphs where the latent nodes are source nodes.

> lfhtcID(g)
Call: lfhtcID(graph = g)

Latent Digraph Info
# observed nodes: 5 
# latent nodes: 1 
# total nr. of edges between observed nodes: 3 

Generic Identifiability Summary
# nr. of edges between observed nodes shown gen. identifiable: 3 
# gen. identifiable edges: 1->2, 2->3, 4->5

Copy Link

Version

Install

install.packages('SEMID')

Monthly Downloads

263

Version

0.4.0

License

GPL (>= 2)

Issues

Pull Requests

Stars

Forks

Maintainer

Nils Sturma

Last Published

January 13th, 2022

Functions in SEMID (0.4.0)

LatentDigraphFixedOrder

Construct LatentDigraphFixedOrder object
LatentDigraph

Construct a LatentDigraph object
createLFIdentifierBaseCase

Create an latent identifier base case
createSimpleBiDirIdentifier

Identify bidirected edges if all directed edges are identified
generalGenericID

A general generic identification algorithm template.
children

All children of a collection of observed nodes.
createAncestralIdentifier

Create an ancestral identification function.
getAncestors

Get getAncestors of nodes in a graph.
edgewiseTSID

Determines which edges in a mixed graph are edgewiseID+TS identifiable
flowBetween

Flow from one set of nodes to another.
descendants

Get descendants of a collection of observed nodes
getParents

Get getParents of nodes in a graph.
getMixedGraph

Get the corresponding mixed graph
createTrekSeparationIdentifier

Create an trek separation identification function
graphID

Identifiability of linear structural equation models.
O

Get adjacency matrix for bidirected part.
MixedGraph

Construct MixedGraph object
graphID.ancestralID

Determine generic identifiability of an acyclic mixed graph using ancestral decomposition.
ancestralID

Determines which edges in a mixed graph are ancestralID-identifiable
createTrekFlowGraph

Helper function to create a flow graph.
print.LfhtcIDResult

Prints a LfhtcIDResult object
ancestralIdentifyStep

Perform one iteration of ancestral identification.
createTrGraph

Helper function to create a graph encoding trek reachable relationships.
getSiblings

Get getSiblings of nodes in a graph.
getHalfTrekSystem

Determines if a half-trek system exists in the mixed graph.
getTrekSystem

Determines if a trek system exists in the mixed graph.
getDescendants

Get descendants of nodes in a graph.
htcID

Determines which edges in a mixed graph are HTC-identifiable.
numObserved

Number of observed nodes in the graph.
getMaxFlow

Size of largest HT system Y satisfying the HTC for a node v except perhaps having |getParents(v)| < |Y|.
toIn

Transforms a vector of given node indices into their internal numbering
SEMID-package

SEMID package documentation.
createIdentifierBaseCase

Create an identifier base case
htcIdentifyStep

Perform one iteration of HTC identification.
isSibling

Are two nodes siblings?
plot.LatentDigraph

Plots the latent digraph
createLFHtcIdentifier

Create a latent-factor half-trek critierion identification function.
inducedSubgraph

Get the induced subgraph on a collection of nodes
mixedGraphHasSimpleNumbering

Checks that a MixedGraph has appropriate node numbering
tianComponent

Returns the Tian c-component of a node
observedParents

Get the observed parents on a collection of nodes
getMixedCompForNode

Get the mixed component of a node in a mixed subgraph.
observedNodes

Get all observed nodes in the graph.
ancestors

All ancestors of a collection of nodes
tianSigmaForComponent

Globally identify the covariance matrix of a C-component
tianIdentifier

Identifies components in a tian decomposition
createEdgewiseIdentifier

Create an edgewise identification function
plotLatentDigraph

Plot a latent factor graph
parents

All parents of a collection of nodes.
nodes

Get all nodes in the graph.
tianDecompose

Performs the tian decomposition on the mixed graph
validateNodes

A helper function to validate if input nodes are valid.
validateMatrix

A helper function to validate an input matrix.
print.SEMIDResult

Prints a SEMIDResult object
createHtcIdentifier

Create an htc identification function.
edgewiseID

Determines which edges in a mixed graph are edgewiseID-identifiable
edgewiseIdentifyStep

Perform one iteration of edgewise identification.
toEx

Transforms a vector of node indices in the internal rep. into external numbering
htr

Get all HTR nodes from a set of nodes in a graph.
numNodes

Number of nodes in the graph.
htrFrom

Half trek reachable nodes.
trekSeparationIdentifyStep

Perform one iteration of trek separation identification.
numLatents

Number of latent nodes in the graph.
trFrom

Trek reachable nodes.
graphID.genericID

Determine generic identifiability of a mixed graph.
graphID.decompose

Determine generic identifiability by Tian Decomposition and HTC
graphID.htcID

Determines if a mixed graph is HTC-identifiable.
graphID.globalID

Check for global identifiability of a mixed graph.
semID

Identifiability of linear structural equation models.
updateEdgeCapacities

Update edge capacities.
latentDigraphHasSimpleNumbering

Checks that a LatentDigraph has appropriate node numbering
latentNodes

Get all latent nodes in the graph.
graphID.nonHtcID

Check for generic infinite-to-one via the half-trek criterion.
graphID.main

Helper function to handle a graph component.
validateVarArgsEmpty

A helper function to validate that there are no variable arguments
lfhtcID

Determines which edges in a latent digraph are LF-HTC-identifiable.
siblings

All siblings of a collection of nodes
lfhtcIdentifyStep

Perform one iteration of latent-factor HTC identification.
plotMixedGraph

Plot a mixed graph
subsetsOfSize

Returns all subsets of a certain size
stronglyConnectedComponent

Strongly connected component
updateVertexCapacities

Update vertex capacities.
validateMatrices

A helper function to validate input matrices.
validateLatentNodesAreSources

A helper function to validate that latent nodes in a LatentDigraph are sources.
print.GenericIDResult

Prints a GenericIDResult object
FlowGraph

Construct FlowGraph object
L

Get directed adjacency matrix.