SEMID Package

Purpose

This package offers a number of functions for determining parameter identifiability in different classes of linear structural equation models (SEMs) with latent variables. Each model is defined by a directed graph or by a mixed graph, depending on the modeling assumptions. The following sections highlight the primary ways in which the package can be used.

Linear SEMs given by Mixed Graphs

In the SEMID package, we represent mixed graphs via the MixedGraph class.

> # Mixed graphs are specified by their directed adjacency matrix L and
> # bidirected adjacency matrix O.
> library(SEMID)
> L = t(matrix(
+ c(0, 1, 0, 0, 0,
+   0, 0, 0, 1, 1,
+   0, 0, 0, 1, 0,
+   0, 1, 0, 0, 1,
+   0, 0, 0, 1, 0), 5, 5))
>
> O = t(matrix(
+ c(0, 0, 0, 0, 0,
+   0, 0, 1, 0, 1,
+   0, 0, 0, 1, 0,
+   0, 0, 0, 0, 0,
+   0, 0, 0, 0, 0), 5, 5)); O=O+t(O)
>
> # Create the mixed graph object corresponding to L and O
> g = MixedGraph(L, O)
>
> # Plot the mixed graph
> g$plot()

See the documentation for the MixedGraph class (?MixedGraph) for more information.
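Before constructing a MixedGraph, it can help to confirm that L and O have the expected shape. The following base-R sketch does not call SEMID. The variable names (n_dir, n_bi, dir_edges) are ours, and the convention that L[i, j] = 1 encodes the edge i -> j is an assumption inferred from the edge labels printed in the outputs in this document.

```r
# Directed part: entry [i, j] = 1 is read as the edge i -> j (assumed convention).
L <- matrix(c(0, 1, 0, 0, 0,
              0, 0, 0, 1, 1,
              0, 0, 0, 1, 0,
              0, 1, 0, 0, 1,
              0, 0, 0, 1, 0), 5, 5, byrow = TRUE)

# Bidirected part: fill the upper triangle, then symmetrize.
O <- matrix(c(0, 0, 0, 0, 0,
              0, 0, 1, 0, 1,
              0, 0, 0, 1, 0,
              0, 0, 0, 0, 0,
              0, 0, 0, 0, 0), 5, 5, byrow = TRUE)
O <- O + t(O)

# Basic sanity checks on the two matrices.
stopifnot(identical(dim(L), dim(O)),
          isSymmetric(O),
          all(diag(L) == 0),
          all(diag(O) == 0))

n_dir <- sum(L != 0)                 # number of directed edges
n_bi  <- sum(O[upper.tri(O)] != 0)   # each bidirected edge counted once

# Human-readable labels for the directed edges.
idx <- which(L != 0, arr.ind = TRUE)
dir_edges <- paste0(idx[, "row"], "->", idx[, "col"])
```

For the example graph, n_dir and n_bi should agree with the "Mixed Graph Info." block that the identification functions print (7 directed and 3 bidirected edges).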

Global Identifiability

For deciding global identifiability in mixed graphs, there exists an ‘if and only if’ graphical criterion developed by

Drton, M., Foygel, R., and Sullivant, S. (2011) Global identifiability of linear structural equation models. Ann. Statist. 39(2): 865-886. https://doi.org/10.1214/10-AOS859.

This criterion can be accessed through the function globalID.

> # Check global identifiability
> globalID(g)
[1] FALSE
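The example graph is cyclic: L contains both 2 -> 4 and 4 -> 2. To our understanding (an assumption on our part, not a statement taken from this README), the criterion of Drton, Foygel, and Sullivant treats acyclicity as a prerequisite for global identifiability, which would be consistent with the FALSE result above. The base-R sketch below uses a hypothetical helper, has_directed_cycle, that detects a directed cycle by repeatedly removing source nodes. It is not the implementation used by globalID.

```r
# Hypothetical helper (not part of SEMID): TRUE if the directed graph with
# adjacency matrix L (L[i, j] != 0 read as i -> j) contains a directed cycle.
has_directed_cycle <- function(L) {
  A <- L != 0
  remaining <- seq_len(nrow(A))
  repeat {
    if (length(remaining) == 0) return(FALSE)   # everything removed: acyclic
    sub <- A[remaining, remaining, drop = FALSE]
    sources <- remaining[colSums(sub) == 0]     # nodes without remaining parents
    if (length(sources) == 0) return(TRUE)      # no source left: a cycle exists
    remaining <- setdiff(remaining, sources)
  }
}

# Directed part of the example mixed graph.
L <- matrix(c(0, 1, 0, 0, 0,
              0, 0, 0, 1, 1,
              0, 0, 0, 1, 0,
              0, 1, 0, 0, 1,
              0, 0, 0, 1, 0), 5, 5, byrow = TRUE)
cyclic_example <- has_directed_cycle(L)

# A simple chain 1 -> 2 -> 3 for comparison.
chain <- matrix(c(0, 1, 0,
                  0, 0, 1,
                  0, 0, 0), 3, 3, byrow = TRUE)
cyclic_chain <- has_directed_cycle(chain)
```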

Generic Identifiability

No ‘if and only if’ graphical condition is known for testing whether a mixed graph is generically identifiable. However, there exist sufficient conditions as well as necessary conditions. The SEMID package contains implementations of various sufficient conditions.

  • The half-trek criterion:

Rina Foygel, Jan Draisma, Mathias Drton (2012). Half-trek criterion for generic identifiability of linear structural equation models. Ann. Statist. 40(3): 1682-1713. https://doi.org/10.1214/12-AOS1012.

  • Ancestor decomposition techniques:

Mathias Drton, Luca Weihs (2016). Generic Identifiability of Linear Structural Equation Models by Ancestor Decomposition. Scand. J. Statist. 43: 1035-1045. https://doi.org/10.1111/sjos.12227.

  • Edgewise and determinantal criteria:

Luca Weihs, Bill Robinson, Emilie Dufresne, Jennifer Kenkel, Kaie Kubjas, Reginald McGee II, Nhan Nguyen, Elina Robeva, Mathias Drton (2017). Determinantal Generalizations of Instrumental Variables. J. Causal Inference 6(1). https://doi.org/10.1515/jci-2017-0009.

> # Check generic identifiability using different criteria
> # Start with the half-trek criterion
> htcID(g)
Call: SEMID::htcID(mixedGraph = g)

Mixed Graph Info.
# nodes: 5 
# dir. edges: 7 
# bi. edges: 3 

Generic Identifiability Summary
# dir. edges shown gen. identifiable: 2 
# bi. edges shown gen. identifiable: 0 

Generically identifiable dir. edges:
2->5, 4->5 

Generically identifiable bi. edges:
None

> # Ancestor decomposition techniques:
> ancestralID(g)
Call: ancestralID(mixedGraph = g)

Mixed Graph Info.
# nodes: 5 
# dir. edges: 7 
# bi. edges: 3 

Generic Identifiability Summary
# dir. edges shown gen. identifiable: 2 
# bi. edges shown gen. identifiable: 0 

Generically identifiable dir. edges:
2->5, 4->5 

Generically identifiable bi. edges:
None

> # Edgewise identification algorithm:
> edgewiseID(g)
Call: edgewiseID(mixedGraph = g)

Mixed Graph Info.
# nodes: 5 
# dir. edges: 7 
# bi. edges: 3 

Generic Identifiability Summary
# dir. edges shown gen. identifiable: 4 
# bi. edges shown gen. identifiable: 0 

Generically identifiable dir. edges:
2->4, 5->4, 2->5, 4->5 

Generically identifiable bi. edges:
None

> # Edgewise identification algorithm leveraging trek-separation relations:
> edgewiseTSID(g)
Call: edgewiseTSID(mixedGraph = g)

Mixed Graph Info.
# nodes: 5 
# dir. edges: 7 
# bi. edges: 3 

Generic Identifiability Summary
# dir. edges shown gen. identifiable: 4 
# bi. edges shown gen. identifiable: 0 

Generically identifiable dir. edges:
2->4, 5->4, 2->5, 4->5 

Generically identifiable bi. edges:
None

Note that, by default, all strategies first apply a Tian decomposition and then check identifiability on each of the components. This yields faster computations as described in Section 8 of Foygel, Draisma, and Drton (2012). It is also possible to apply different identification strategies repeatedly until no further edges can be identified. This is possible via the function generalGenericID.

> # Check generic identifiability by repeatedly applying different criteria
> generalGenericID(mixedGraph = g, 
+                   idStepFunctions = list(htcIdentifyStep,
+                                          ancestralIdentifyStep, 
+                                          edgewiseIdentifyStep, 
+                                          trekSeparationIdentifyStep), 
+                   tianDecompose = TRUE)
Call: generalGenericID(mixedGraph = g, idStepFunctions = list(htcIdentifyStep, 
    ancestralIdentifyStep, edgewiseIdentifyStep, trekSeparationIdentifyStep), 
    tianDecompose = TRUE)

Mixed Graph Info.
# nodes: 5 
# dir. edges: 7 
# bi. edges: 3 

Generic Identifiability Summary
# dir. edges shown gen. identifiable: 4 
# bi. edges shown gen. identifiable: 0 

Generically identifiable dir. edges:
2->4, 5->4, 2->5, 4->5 

Generically identifiable bi. edges:
None

In this example, no additional edges are certified to be generically identifiable. Therefore, we check the necessary condition from Foygel, Draisma, and Drton (2012) for generic identifiability of the whole graph, which is also implemented in SEMID.

> graphID.nonHtcID(g$L(), g$O())
[1] TRUE

This means that the parametrization of the given graph is generically infinite-to-one and, in particular, that the graph is not generically identifiable.
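The Tian decomposition mentioned earlier in this section relies, in the acyclic case, on the connected components of the bidirected part of the graph. The base-R sketch below computes these components from O with a breadth-first search. The helper name bidirected_components is ours; this covers only the bidirected-connectivity step and is not SEMID's implementation, and how the package treats directed cycles is not covered here.

```r
# Hypothetical helper (not part of SEMID): connected components of the
# bidirected part, given a symmetric 0/1 adjacency matrix O.
bidirected_components <- function(O) {
  n <- nrow(O)
  comp <- integer(n)              # 0 means "not yet assigned"
  k <- 0
  for (s in seq_len(n)) {
    if (comp[s] != 0) next
    k <- k + 1
    comp[s] <- k
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(O[v, ] != 0 & comp == 0)   # unvisited bidirected neighbours
      comp[nb] <- k
      queue <- c(queue, nb)
    }
  }
  unname(split(seq_len(n), comp))
}

# Bidirected part of the example mixed graph: 2 <-> 3, 2 <-> 5, 3 <-> 4.
O <- matrix(0, 5, 5)
O[2, 3] <- O[2, 5] <- O[3, 4] <- 1
O <- O + t(O)

comps <- bidirected_components(O)
```

For the example graph this yields two components, the single node 1 and the node set 2, 3, 4, 5, so most of the work still happens in one large component.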

Linear SEMs given by Latent-Factor Graphs

The latent-factor half-trek criterion (LF-HTC) by Barber, Drton, Sturma and Weihs (2022) is a sufficient criterion to check generic identifiability in directed graphical models with explicitly modeled latent variables. These models correspond to latent-factor graphs, which we represent via the LatentDigraph class.

> # Latent digraphs are specified by their directed adjacency matrix L
> library(SEMID)
> L = matrix(c(0, 1, 0, 0, 0, 0,
+              0, 0, 1, 0, 0, 0,
+              0, 0, 0, 0, 0, 0,
+              0, 0, 0, 0, 1, 0,
+              0, 0, 0, 0, 0, 0,
+              1, 1, 1, 1, 1, 0), 6, 6, byrow=TRUE)
> observedNodes = seq(1,5)
> latentNodes = c(6)
>
> # Create the latent digraph object corresponding to L
> g = LatentDigraph(L, observedNodes, latentNodes)
>
> # Plot latent digraph
> plot(g)

The function lfhtcID implements the algorithm to check LF-HTC-identifiability as presented in

Rina Foygel Barber, Mathias Drton, Nils Sturma, Luca Weihs (2022). Half-Trek Criterion for Identifiability of Latent Variable Models. Ann. Statist. 50(6): 3174-3196. https://doi.org/10.1214/22-AOS2221.

The LF-HTC is applicable to all graphs where the latent nodes are source nodes.

> lfhtcID(g)
Call: lfhtcID(graph = g)

Latent Digraph Info
# observed nodes: 5 
# latent nodes: 1 
# total nr. of edges between observed nodes: 3 

Generic Identifiability Summary
# nr. of edges between observed nodes shown gen. identifiable: 3 
# gen. identifiable edges: 1->2, 2->3, 4->5
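The requirement that latent nodes are source nodes can be checked directly on the adjacency matrix. The base-R sketch below assumes that L[i, j] = 1 encodes the edge i -> j, so a node is a source when its column is entirely zero. The variable names are ours, and the check is independent of the validation that SEMID performs internally.

```r
# Adjacency matrix of the latent digraph from the example (node 6 is latent).
L <- matrix(c(0, 1, 0, 0, 0, 0,
              0, 0, 1, 0, 0, 0,
              0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 1, 0,
              0, 0, 0, 0, 0, 0,
              1, 1, 1, 1, 1, 0), 6, 6, byrow = TRUE)
latentNodes <- c(6)

# A latent node is a source if no edge points into it (its column is zero).
latents_are_sources <- all(L[, latentNodes, drop = FALSE] == 0)

# Children of the latent node: the nonzero entries of its row.
children_of_latent <- which(L[6, ] != 0)
```

Here the single latent node has no parents and points to all five observed nodes.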

Note that the corresponding mixed graph obtained from a latent projection is not identifiable; see Section 4 in Barber et al. (2022).

> # Get a mixed graph via latent projection
> gMixed <- g$getMixedGraph()
> gMixed$plot()

> # Check the original half-trek criterion on the mixed graph
> htcID(gMixed)
Call: htcID(mixedGraph = gMixed)

Mixed Graph Info.
# nodes: 5 
# dir. edges: 3 
# bi. edges: 10 

Generic Identifiability Summary
# dir. edges shown gen. identifiable: 0 
# bi. edges shown gen. identifiable: 0 

Generically identifiable dir. edges:
None

Generically identifiable bi. edges:
None
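A parameter count gives a quick way to see why the projected mixed graph cannot be identifiable. The base-R sketch below builds the latent projection by hand, under the assumption that, for latent source nodes, the projection adds a bidirected edge between every pair of observed nodes sharing a latent parent. It then compares the number of free parameters (directed coefficients, error covariances, and one error variance per observed node) with the number of distinct entries of the observed covariance matrix. All variable names are ours.

```r
# Adjacency matrix of the latent digraph from the example (node 6 is latent).
L <- matrix(c(0, 1, 0, 0, 0, 0,
              0, 0, 1, 0, 0, 0,
              0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 1, 0,
              0, 0, 0, 0, 0, 0,
              1, 1, 1, 1, 1, 0), 6, 6, byrow = TRUE)
observed <- 1:5   # assumed to be numbered 1..5, as in the example
latent   <- 6

# Directed edges among observed nodes are kept as they are.
L_obs <- L[observed, observed]

# Each latent source node links all pairs of its children by bidirected edges.
O_proj <- matrix(0, length(observed), length(observed))
for (h in latent) {
  ch <- which(L[h, observed] != 0)
  O_proj[ch, ch] <- 1
}
diag(O_proj) <- 0

n_dir    <- sum(L_obs != 0)                         # directed coefficients
n_bi     <- sum(O_proj[upper.tri(O_proj)] != 0)     # error covariances
n_params <- n_dir + n_bi + length(observed)         # plus error variances
n_cov    <- length(observed) * (length(observed) + 1) / 2
```

The edge counts match the "Mixed Graph Info." block above (3 directed and 10 bidirected edges). With 18 free parameters mapped to only 15 distinct covariance entries, the parametrization of the projected mixed graph cannot be generically finite-to-one.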

Estimating Direct Causal Effects

If a graph is generically identifiable, we can use the identification formulas to obtain estimators of the direct causal effects. For an example, see the more detailed description at https://st-mardi.quarto.pub/gmci/chapters/notebook_gallery/notebooks/GMCI-notebook-SEMID/notebook.html.

Identifiability in Sparse Factor Analysis

The matching criterion is a sufficient condition for generic identification of the factor loading matrix (up to column sign) in factor analysis. It is developed in the following paper:

Nils Sturma, Miriam Kranzlmüller, Irem Portakal, Mathias Drton (2025). Matching Criterion for Identifiability in Sparse Factor Analysis. arXiv preprint arXiv:2502.02986.

We represent sparse factor analysis graphs via the adjacency matrix lambda, where the columns represent latent nodes and the rows represent the observed nodes.

> # The factor analysis graph is specified by the matrix lambda
> library(SEMID)
> lambda = matrix(c(1, 0, 0,
+                   1, 1, 0,
+                   0, 1, 1,
+                   1, 0, 1,
+                   0, 1, 0,
+                   0, 0, 1), 6, 3, byrow=TRUE)
> # The latent nodes are nodes 1, 2, and 3, while the observed nodes are the 
> # nodes 4, 5, 6, 7, 8, and 9.
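To make the node numbering concrete, the base-R sketch below lists the children of each latent node under the convention described above: column h of lambda is latent node h, and row i is observed node ncol(lambda) + i. The variable names are ours, and the sketch does not call SEMID.

```r
lambda <- matrix(c(1, 0, 0,
                   1, 1, 0,
                   0, 1, 1,
                   1, 0, 1,
                   0, 1, 0,
                   0, 0, 1), 6, 3, byrow = TRUE)

m <- ncol(lambda)
latent_ids   <- seq_len(m)                  # latent nodes 1, 2, 3
observed_ids <- m + seq_len(nrow(lambda))   # observed nodes 4, ..., 9

# Children of latent node h: observed nodes with a nonzero loading in column h.
children <- lapply(latent_ids, function(h) observed_ids[lambda[, h] != 0])
names(children) <- paste0("h", latent_ids)
```

Latent node 1 has children 4, 5, 7; latent node 2 has children 5, 6, 8; and latent node 3 has children 6, 7, 9.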

The function mID implements an algorithm to check M-identifiability:

> mID(lambda)
Call: mID(lambda = lambda)

Factor Analysis Graph Info:
latent nodes:  1 2 3 
observed nodes:  4 5 6 7 8 9 

Generic Sign-Identifiability Summary:
M-identifiable:    TRUE
Tuple list:
  Tuple 1 
    h: 1
    S: 
    v: 4
    W: 5
    U: 7
  Tuple 2 
    h: 2
    S: 1
    v: 5
    W: 6
    U: 8
  Tuple 3 
    h: 3
    S: 1, 2
    v: 6
    W: 7
    U: 9

M-identifiability can only establish identifiability of graphs that satisfy the Zero Upper Triangular Assumption (ZUTA). Via the function ZUTA we can check this assumption.

> ZUTA(lambda)
Call: ZUTA(lambda = lambda)

Factor Analysis Graph Info:
latent nodes:  1 2 3 
observed nodes:  4 5 6 7 8 9 

ZUTA:    TRUE
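The idea behind the assumption can be sketched in base R. We assume here, as our reading of the name and not as a definition taken from this README, that ZUTA asks for an ordering of the latent nodes in which each latent node has a child that is not a child of any later latent node. Under that assumption a greedy procedure suffices: repeatedly remove latent nodes that are the only remaining parent of some observed node. The helper check_zuta_sketch is hypothetical and is not the package's ZUTA function.

```r
# Hypothetical helper (not part of SEMID): greedy check of the assumed
# ZUTA condition on a 0/1 loading pattern (rows observed, columns latent).
check_zuta_sketch <- function(lambda) {
  A <- lambda != 0
  remaining <- seq_len(ncol(A))
  while (length(remaining) > 0) {
    sub <- A[, remaining, drop = FALSE]
    single <- which(rowSums(sub) == 1)   # rows with exactly one remaining parent
    if (length(single) == 0) return(FALSE)
    # Latent nodes that are the sole remaining parent of some observed node.
    removable <- remaining[colSums(sub[single, , drop = FALSE]) > 0]
    remaining <- setdiff(remaining, removable)
  }
  TRUE
}

lambda <- matrix(c(1, 0, 0,
                   1, 1, 0,
                   0, 1, 1,
                   1, 0, 1,
                   0, 1, 0,
                   0, 0, 1), 6, 3, byrow = TRUE)
zuta_example <- check_zuta_sketch(lambda)

# A pattern in which every observed node loads on both factors.
dense <- matrix(1, 3, 2)
zuta_dense <- check_zuta_sketch(dense)
```

For the example, observed nodes 4, 8, and 9 each have a single latent parent, so all three latent nodes can be removed in the first pass. In the dense pattern no observed node has a single parent, so the sketch returns FALSE.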

Sturma et al. (2025) also provide an extended, more powerful sufficient condition. We can check 'extended M-identifiability' as follows.

> # The factor analysis graph is specified by the matrix lambda
> library(SEMID)
> lambda = matrix(c(1, 0, 0, 0, 0,
+                   1, 1, 0, 0, 0,
+                   1, 1, 1, 0, 0,
+                   1, 1, 1, 1, 0,
+                   1, 1, 1, 1, 0,
+                   1, 1, 1, 1, 0,
+                   1, 1, 1, 1, 1,
+                   1, 1, 1, 1, 0,
+                   0, 0, 0, 0, 1,
+                   0, 0, 0, 0, 1), 10, 5, byrow=TRUE)
> # The latent nodes are nodes 1, 2, 3, 4, and 5, while the observed nodes are the 
> # nodes 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15.
         
> extmID(lambda)
Call: extmID(lambda = lambda)

Factor Analysis Graph Info:
latent nodes:  1 2 3 4 5 
observed nodes:  6 7 8 9 10 11 12 13 14 15 

Generic Sign-Identifiability Summary:
extM-identifiable:    TRUE
Tuple list:
  Tuple 1 
    criterion: localBB
    S: 
    new nodes in S: 1, 2, 3, 4
    U: 6, 7, 8, 9, 10, 11, 12, 13
  Tuple 2 
    criterion: matching
    h: 5
    S: 1, 2, 3, 4
    v: 12
    W: 14
    U: 15

Install

install.packages('SEMID')

Version

0.5.0

License

GPL (>= 2)

Maintainer

Nils Sturma

Last Published

January 30th, 2026

Functions in SEMID (0.5.0)

createIdentifierBaseCase

Create an identifier base case
createLFHtcIdentifier

Create a latent-factor half-trek criterion identification function.
ancestralIdentifyStep

Perform one iteration of ancestral identification.
bidirectedComponents

Get bidirected components of a mixed graph
createEdgewiseIdentifier

Create an edgewise identification function
edgewiseTSID

Determines which edges in a mixed graph are edgewiseID+TS identifiable
createHtcIdentifier

Create an htc identification function.
checkLocalBBCriterion

Check Local BB-Criterion
createSimpleBiDirIdentifier

Identify bidirected edges if all directed edges are identified
createLFIdentifierBaseCase

Create a latent identifier base case
extmID

Check Extended M-Identifiability
checkMatchingCriterion

Check Matching Criterion
children

All children of a collection of nodes.
edgewiseIdentifyStep

Perform one iteration of edgewise identification.
createAncestralIdentifier

Create an ancestral identification function.
edgewiseID

Determines which edges in a mixed graph are edgewiseID-identifiable
createTrekSeparationIdentifier

Create a trek separation identification function
getDescendants

Get descendants of nodes in a graph.
getHalfTrekSystem

Determines if a half-trek system exists in the mixed graph.
getAncestors

Get ancestors of nodes in a graph.
generalGenericID

A general generic identification algorithm template.
findColumnsWithSumOne

A Helper Function for Check ZUTA
graphID.genericID

Determine generic identifiability of a mixed graph.
getSiblings

Get siblings of nodes in a graph.
flowBetween

Flow from one set of nodes to another.
getTrekSystem

Determines if a trek system exists in the mixed graph.
createTrGraph

Helper function to create a graph encoding trek reachable relationships.
descendants

Get descendants of a collection of observed nodes
latentDigraphHasSimpleNumbering

Checks that a LatentDigraph has appropriate node numbering
createTrekFlowGraph

Helper function to create a flow graph.
getParents

Get parents of nodes in a graph.
getMixedGraph

Get the corresponding mixed graph
graphID.main

Helper function to handle a graph component.
graphID.ancestralID

Determine generic identifiability of an acyclic mixed graph using ancestral decomposition.
graphID

Identifiability of linear structural equation models.
getMixedCompForNode

Get the mixed component of a node in a mixed subgraph.
getMaxFlow

Size of largest HT system Y satisfying the HTC for a node v except perhaps having |parents(v)| < |Y|.
globalID

Determines whether a mixed graph is globally identifiable.
graphID.htcID

Determines if a mixed graph is HTC-identifiable.
plot.LatentDigraph

Plots the latent digraph
parents

All parents of a collection of nodes.
latentNodes

Get all latent nodes in the graph.
observedParents

Get the observed parents of a collection of nodes
observedNodes

Get all observed nodes in the graph.
graphID.nonHtcID

Check for generic infinite-to-one via the half-trek criterion.
graphID.decompose

Determine generic identifiability by Tian Decomposition and HTC
mixedGraphHasSimpleNumbering

Checks that a MixedGraph has appropriate node numbering
lfhtcID

Determines which edges in a latent digraph are LF-HTC-identifiable.
htrFrom

Half trek reachable nodes.
htcIdentifyStep

Perform one iteration of HTC identification.
print.mIDresult

Prints an mIDresult object
print.extmIDresult

Prints an extmIDresult object
htcID

Determines which edges in a mixed graph are HTC-identifiable.
plotMixedGraph

Plot a mixed graph
plotLatentDigraph

Plot a latent factor graph
toEx

Transforms a vector of node indices in the internal rep. into external numbering
lfhtcIdentifyStep

Perform one iteration of latent-factor HTC identification.
mID

Check M-Identifiability.
trFrom

Trek reachable nodes.
siblings

All siblings of a collection of nodes
semID

Identifiability of linear structural equation models.
trekSeparationIdentifyStep

Perform one iteration of trek separation identification.
tianComponent

Returns the Tian c-component of a node
isSibling

Are two nodes siblings?
inducedSubgraph

Get the induced subgraph on a collection of nodes
validateMatrices

A helper function to validate input matrices.
numObserved

Number of observed nodes in the graph.
numNodes

Number of nodes in the graph.
tianDecompose

Performs the Tian decomposition on the mixed graph
updateEdgeCapacities

Update edge capacities.
toIn

Transforms a vector of given node indices into their internal numbering
subsetsOfSize

Returns all subsets of a certain size
print.GenericIDResult

Prints a GenericIDResult object
stronglyConnectedComponent

Strongly connected component
print.LfhtcIDResult

Prints a LfhtcIDResult object
validateLatentNodesAreSources

A helper function to validate that latent nodes in a LatentDigraph are sources.
updateVertexCapacities

Update vertex capacities.
validateMatrix

A helper function to validate an input matrix.
htr

Get all HTR nodes from a set of nodes in a graph.
print.ZUTAresult

Prints a ZUTAresult object
nodes

Get all nodes in the graph.
tianSigmaForComponent

Globally identify the covariance matrix of a C-component
validateNodes

A helper function to validate if input nodes are valid.
numLatents

Number of latent nodes in the graph.
print.SEMIDResult

Prints a SEMIDResult object
validateVarArgsEmpty

A helper function to validate that there are no variable arguments
tianIdentifier

Identifies components in a Tian decomposition
L

Get directed adjacency matrix.
FlowGraph

Construct FlowGraph object
ancestralID

Determines which edges in a mixed graph are ancestralID-identifiable
MixedGraph

Construct MixedGraph object
ancestors

All ancestors of a collection of nodes
SEMID-package

SEMID package documentation.
ZUTA

Check the Zero Upper Triangular Assumption
LatentDigraphFixedOrder

Construct LatentDigraphFixedOrder object
LatentDigraph

Construct a LatentDigraph object
O

Get adjacency matrix for bidirected part.