make_dsem_ram: Make a RAM (Reticular Action Model)

Description

make_dsem_ram converts SEM arrow notation to a RAM (reticular action model) describing SEM parameters

Usage

make_dsem_ram(
  sem,
  times,
  variables,
  covs = variables,
  quiet = FALSE,
  remove_na = TRUE
)

Value

A reticular action model (RAM) describing dependencies

Arguments

sem: Specification for the time-series structural equation model, including lagged or simultaneous effects. See the Details section below for more description
times: A character vector listing the set of times in order
variables: A character vector listing the set of variables
covs: A character vector listing variables for which to estimate a standard deviation
quiet: Boolean indicating whether to print messages to terminal
remove_na: Boolean indicating whether to remove NA values from the RAM (default) or not. remove_na=FALSE might be useful for exploration and diagnostics by advanced users

Details

RAM specification using arrow-and-lag notation

Each line of the RAM specification for make_dsem_ram consists of four (unquoted) entries, separated by commas:

1. Arrow specification: This is a simple formula, of the form A -> B or, equivalently, B <- A for a regression coefficient (i.e., a single-headed or directional arrow); A <-> A for a variance or A <-> B for a covariance (i.e., a double-headed or bidirectional arrow). Here, A and B are variable names in the model. If a name does not correspond to an observed variable, then it is assumed to be a latent variable. Spaces can appear freely in an arrow specification, and there can be any number of hyphens in the arrows, including zero: thus, e.g., A->B, A --> B, and A>B are all legitimate and equivalent.
2. Lag (using positive values): An integer specifying whether the linkage is simultaneous (lag=0) or lagged (e.g., X -> Y, 1, XtoY indicates that X at time T affects Y at time T+1). Only one-headed arrows can be lagged. Using positive values to indicate lags matches the notational convention used in package dynlm.
3. Parameter name: The name of the regression coefficient, variance, or covariance specified by the arrow. Assigning the same name to two or more arrows results in an equality constraint. Specifying the parameter name as NA produces a fixed parameter.
4. Value: Start value for a free parameter or value of a fixed parameter. If given as NA (or simply omitted), the model provides a default starting value.

Lines may end in a comment following #. The function extends code copied from package `sem` under licence GPL (>= 2) with permission from John Fox.
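To make the four-field rules concrete, here is a minimal base-R sketch that splits one specification line into its parts. The helper `parse_ram_line` is hypothetical: it is not the parser used by make_dsem_ram (which extends code from package sem), and it assumes variable names contain no hyphens or commas.

```r
# Hypothetical helper (not part of dsem): parse one line of arrow-and-lag notation
parse_ram_line <- function(line) {
  line <- sub("#.*$", "", line)                  # drop a trailing comment
  fields <- trimws(strsplit(line, ",")[[1]])
  length(fields) <- 4                            # pad omitted entries with NA
  arrow <- gsub("[[:space:]-]", "", fields[1])   # spaces and hyphens are ignored
  if (grepl("<>", arrow, fixed = TRUE)) {        # A <-> B: two-headed arrow
    ends <- strsplit(arrow, "<>", fixed = TRUE)[[1]]
    heads <- 2L; from <- ends[1]; to <- ends[2]
  } else if (grepl(">", arrow, fixed = TRUE)) {  # A -> B
    ends <- strsplit(arrow, ">", fixed = TRUE)[[1]]
    heads <- 1L; from <- ends[1]; to <- ends[2]
  } else if (grepl("<", arrow, fixed = TRUE)) {  # B <- A
    ends <- strsplit(arrow, "<", fixed = TRUE)[[1]]
    heads <- 1L; from <- ends[2]; to <- ends[1]
  } else {
    stop("no arrow found in: ", line)
  }
  lag <- as.integer(fields[2])
  if (heads == 2L && isTRUE(lag != 0L)) stop("only one-headed arrows can be lagged")
  parameter <- if (is.na(fields[3]) || fields[3] == "NA") NA_character_ else fields[3]
  start <- suppressWarnings(as.numeric(fields[4]))  # NA when omitted
  list(from = from, to = to, heads = heads, lag = lag,
       parameter = parameter, start = start)
}

str(parse_ram_line("X -> Y, 1, XtoY"))
str(parse_ram_line("X <-> X, 0, NA, 0.01   # fixed at a negligible value"))
```

A missing parameter name is treated as a fixed parameter whose value comes from the fourth field, mirroring item 3 above.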

Simultaneous autoregressive process for simultaneous and lagged effects

This text then specifies linkages in a multivariate time-series model for variables $\mathbf X$ with dimensions $T \times C$ for $T$ times and $C$ variables. make_dsem_ram then parses this text to build a path matrix $\mathbf{P}$ with dimensions $TC \times TC$, where element $\rho_{k_2,k_1}$ represents the impact of $x_{t_1,c_1}$ on $x_{t_2,c_2}$, with $k_1=T (c_1-1)+t_1$ and $k_2=T (c_2-1)+t_2$ (using 1-based indices, so that $k$ is a position in $\mathrm{vec}(\mathbf X)$). This path matrix defines a simultaneous equation

$$ \mathrm{vec}(\mathbf X) = \mathbf P \mathrm{vec}(\mathbf X) + \mathrm{vec}(\mathbf \Delta)$$

where $\mathbf \Delta$ is a matrix of exogenous errors with covariance $\mathbf{V = \Gamma \Gamma}^t$ and $\mathbf \Gamma$ is the Cholesky factor of this exogenous covariance. This simultaneous autoregressive (SAR) process then results in $\mathbf X$ having covariance:

$$ \mathrm{Cov}(\mathbf X) = (\mathbf I - \mathbf P)^{-1} \mathbf{\Gamma \Gamma}^t ((\mathbf I - \mathbf P)^{-1})^t $$
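The following base-R sketch illustrates the SAR process for a univariate AR1 path matrix with $T = 4$, using illustrative values $\rho = 0.5$ and $\sigma = 2$. It assumes 1-based indices, so that $x_{t,c}$ sits at position $T(c-1)+t$ of $\mathrm{vec}(\mathbf X)$ (equivalently $Tc+t$ if variables are counted from zero); `vec_index` is a hypothetical helper. It draws one realisation by solving $(\mathbf I - \mathbf P)\,\mathrm{vec}(\mathbf X) = \mathrm{vec}(\mathbf \Delta)$ and then forms the implied covariance.

```r
n_times <- 4; rho <- 0.5; sigma <- 2   # illustrative values

# Hypothetical helper: position of x[t, c] in vec(X), 1-based.
# R stores matrices column-major, so as.vector() stacks columns like vec().
vec_index <- function(t, c, n_times) n_times * (c - 1) + t

P <- matrix(0, n_times, n_times)
for (i in 2:n_times) {
  # x[i - 1, 1] affects x[i, 1]: row = affected element, column = source
  P[vec_index(i, 1, n_times), vec_index(i - 1, 1, n_times)] <- rho
}
Gamma <- diag(sigma, n_times)   # Cholesky factor of the exogenous covariance
I <- diag(n_times)

# One realisation: Delta has covariance Gamma Gamma^t; solve (I - P) x = delta
set.seed(1)
delta <- Gamma %*% rnorm(n_times)
x <- solve(I - P, delta)

# Implied covariance (I - P)^{-1} Gamma Gamma^t ((I - P)^{-1})^t
IminusP_inv <- solve(I - P)
cov_X <- IminusP_inv %*% tcrossprod(Gamma) %*% t(IminusP_inv)
```

The solved `x` satisfies the simultaneous equation $\mathrm{vec}(\mathbf X) = \mathbf P \mathrm{vec}(\mathbf X) + \mathrm{vec}(\mathbf \Delta)$ exactly, up to floating-point error.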

Usefully, computing the inverse-covariance (precision) matrix $\mathbf{Q} = \mathrm{Cov}(\mathbf X)^{-1}$ does not require inverting $\mathbf{(I - P)}$:

$$ \mathbf{Q} = (\mathbf{\Gamma}^{-1} \mathbf{(I - P)})^t \mathbf{\Gamma}^{-1} \mathbf{(I - P)} $$
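A quick numerical check of the precision identity, sketched in base R for the univariate AR1 case with illustrative values $\rho = 0.5$ and $\sigma = 2$. Only the triangular factor $\mathbf \Gamma$ enters a solve (via `forwardsolve`), and the resulting $\mathbf Q$ is compared against the covariance obtained by explicit inversion.

```r
n_times <- 4; rho <- 0.5; sigma <- 2   # illustrative values
P <- matrix(0, n_times, n_times)
P[cbind(2:n_times, 1:(n_times - 1))] <- rho   # AR1 path matrix
Gamma <- diag(sigma, n_times)                 # lower-triangular Cholesky factor
I <- diag(n_times)

# Gamma^{-1} (I - P) via a triangular solve; (I - P) itself is never inverted
A <- forwardsolve(Gamma, I - P)
Q <- crossprod(A)                             # t(A) %*% A

# Covariance by explicit inversion, for comparison only
IminusP_inv <- solve(I - P)
cov_X <- IminusP_inv %*% tcrossprod(Gamma) %*% t(IminusP_inv)
```

For this AR1 model $\mathbf Q$ is tridiagonal even though $\mathrm{Cov}(\mathbf X)$ is dense, which is one reason the factored form is attractive for longer time series.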

Example: univariate first-order autoregressive model

This simultaneous autoregressive (SAR) process across variables and times allows the user to specify both simultaneous effects (effects among variables within a given time) and lagged effects (effects among variables across times). As one example, consider a univariate first-order autoregressive process with independent errors where $T=4$. This is specified by passing sem = "X -> X, 1, rho \n X <-> X, 0, sigma" to make_dsem_ram, which parses it to a RAM:

heads	to	from	parameter	start
1	2	1	1	<NA>
1	3	2	1	<NA>
1	4	3	1	<NA>
2	1	1	2	<NA>
2	2	2	2	<NA>
2	3	3	2	<NA>
2	4	4	2	<NA>
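The pattern in this table generalizes to any number of times. The base-R sketch below rebuilds it with a hypothetical helper `ar1_ram` (not a dsem function), which may help when reading RAMs for longer series: $T-1$ lagged rows with heads=1 and $T$ variance rows with heads=2.

```r
# Hypothetical helper (not a dsem function): AR1 RAM for n_times >= 2 times
ar1_ram <- function(n_times) {
  lagged <- data.frame(heads = 1L, to = 2:n_times, from = 1:(n_times - 1),
                       parameter = 1L, start = NA_character_)
  variances <- data.frame(heads = 2L, to = 1:n_times, from = 1:n_times,
                          parameter = 2L, start = NA_character_)
  rbind(lagged, variances)
}

ram <- ar1_ram(4)
print(ram)   # same seven rows as the table above
```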

Rows of this RAM where heads=1 are then interpreted to construct the path matrix $\mathbf P$, where column "from" in the RAM indicates the column number in the matrix and column "to" indicates the row number:

$$ \mathbf P = \begin{bmatrix} 0 & 0 & 0 & 0 \\ \rho & 0 & 0 & 0 \\ 0 & \rho & 0 & 0 \\ 0 & 0 & \rho & 0\\ \end{bmatrix} $$
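In base R, the heads=1 rows can be written into $\mathbf P$ in one step with two-column matrix indexing. This is a sketch with an illustrative parameter vector $\boldsymbol\beta = (\rho, \sigma) = (0.5, 2)$; the rows are typed in from the table above rather than produced by make_dsem_ram.

```r
# Rows of the AR1 RAM with heads = 1, typed in from the table above
paths <- data.frame(to = c(2, 3, 4), from = c(1, 2, 3), parameter = c(1, 1, 1))
beta <- c(rho = 0.5, sigma = 2)   # illustrative parameter values
n_times <- 4

P <- matrix(0, n_times, n_times)
# "to" selects the row and "from" selects the column of P;
# "parameter" picks which element of beta fills that cell
P[cbind(paths$to, paths$from)] <- beta[paths$parameter]
print(P)
```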

Rows where heads=2 are interpreted to construct the Cholesky factor of the exogenous covariance, $\mathbf \Gamma$, and column "parameter" in the RAM associates each nonzero element of these two matrices with an element of a vector of estimated parameters:

$$ \mathbf \Gamma = \begin{bmatrix} \sigma & 0 & 0 & 0 \\ 0 & \sigma & 0 & 0 \\ 0 & 0 & \sigma & 0 \\ 0 & 0 & 0 & \sigma\\ \end{bmatrix} $$

with two estimated parameters $\mathbf \beta = (\rho, \sigma) $. This then results in covariance:

$$ \mathrm{Cov}(\mathbf X) = \sigma^2 \begin{bmatrix} 1 & \rho^1 & \rho^2 & \rho^3 \\ \rho^1 & 1 + \rho^2 & \rho^1 (1 + \rho^2) & \rho^2 (1 + \rho^2) \\ \rho^2 & \rho^1 (1 + \rho^2) & 1 + \rho^2 + \rho^4 & \rho^1 (1 + \rho^2 + \rho^4) \\ \rho^3 & \rho^2 (1 + \rho^2) & \rho^1 (1 + \rho^2 + \rho^4) & 1 + \rho^2 + \rho^4 + \rho^6 \\ \end{bmatrix} $$
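The matrix above can be checked numerically. Because $(\mathbf I - \mathbf P)^{-1}$ has entries $\rho^{i-j}$ for $i \ge j$, element $(i,j)$ of the covariance is $\sigma^2 \sum_{k=1}^{\min(i,j)} \rho^{i-k}\rho^{j-k} = \sigma^2 \rho^{|i-j|} (1-\rho^{2\min(i,j)})/(1-\rho^2)$. The base-R sketch below fills $\mathbf \Gamma$ and $\mathbf P$ from the RAM pattern with illustrative values and compares the two; the geometric-sum form assumes $\rho^2 \neq 1$.

```r
n_times <- 4; rho <- 0.5; sigma <- 2   # illustrative values
beta <- c(rho, sigma)

# heads = 2 rows put sigma on the diagonal of Gamma; heads = 1 rows fill P
Gamma <- matrix(0, n_times, n_times)
Gamma[cbind(1:n_times, 1:n_times)] <- beta[2]
P <- matrix(0, n_times, n_times)
P[cbind(2:n_times, 1:(n_times - 1))] <- beta[1]

# Covariance implied by the SAR process
IminusP_inv <- solve(diag(n_times) - P)
cov_X <- IminusP_inv %*% tcrossprod(Gamma) %*% t(IminusP_inv)

# Closed form via the geometric sum (requires rho^2 != 1)
closed_form <- outer(1:n_times, 1:n_times, function(i, j) {
  m <- pmin(i, j)
  sigma^2 * rho^abs(i - j) * (1 - rho^(2 * m)) / (1 - rho^2)
})
```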

This converges to the stationary covariance of an AR1 process for times $t \gg 1$:

$$ \mathrm{Cov}(\mathbf X) = \frac{\sigma^2}{1-\rho^2} \begin{bmatrix} 1 & \rho^1 & \rho^2 & \rho^3 \\ \rho^1 & 1 & \rho^1 & \rho^2 \\ \rho^2 & \rho^1 & 1 & \rho^1 \\ \rho^3 & \rho^2 & \rho^1 & 1\\ \end{bmatrix} $$

except for a lower pointwise variance at the initial times, which arises as a "boundary effect".
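The boundary effect is easy to see numerically. This base-R sketch uses a longer series ($T = 50$) with illustrative values $\rho = 0.5$, $\sigma = 2$ and compares the pointwise variances with the standard AR1 stationary variance $\sigma^2/(1-\rho^2)$: the first time has variance $\sigma^2$, and later times approach the stationary value.

```r
n_times <- 50; rho <- 0.5; sigma <- 2   # illustrative values
P <- matrix(0, n_times, n_times)
P[cbind(2:n_times, 1:(n_times - 1))] <- rho

IminusP_inv <- solve(diag(n_times) - P)
cov_X <- sigma^2 * tcrossprod(IminusP_inv)   # Gamma = sigma * I
variances <- diag(cov_X)
stationary_var <- sigma^2 / (1 - rho^2)      # AR1 stationary variance

head(variances)      # rises from sigma^2 towards stationary_var
tail(variances, 1)
```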

Similarly, the arrow-and-lag notation can be used to specify a SAR representing a conventional structural equation model (SEM), cross-lagged (a.k.a. vector autoregressive, VAR) models, dynamic factor analysis (DFA), or many other time-series models.

Examples


library(dsem)

# Univariate AR1
sem = "
  X -> X, 1, rho
  X <-> X, 0, sigma
"
make_dsem_ram( sem=sem, variables="X", times=1:4 )

# Univariate AR2
sem = "
  X -> X, 1, rho1
  X -> X, 2, rho2
  X <-> X, 0, sigma
"
make_dsem_ram( sem=sem, variables="X", times=1:4 )

# Bivariate VAR
sem = "
  X -> X, 1, XtoX
  X -> Y, 1, XtoY
  Y -> X, 1, YtoX
  Y -> Y, 1, YtoY
  X <-> X, 0, sdX
  Y <-> Y, 0, sdY
"
make_dsem_ram( sem=sem, variables=c("X","Y"), times=1:4 )

# Dynamic factor analysis with one factor and two manifest variables
# (specifies a random walk for the factor, and minuscule residual SD)
sem = "
  factor -> X, 0, loadings1
  factor -> Y, 0, loadings2
  factor -> factor, 1, NA, 1
  X <-> X, 0, NA, 0.01       # Fix at negligible value
  Y <-> Y, 0, NA, 0.01       # Fix at negligible value
"
make_dsem_ram( sem=sem, variables=c("X","Y","factor"), times=1:4 )

# ARIMA(1,1,0)
sem = "
  factor -> factor, 1, rho1 # AR1 component
  X -> X, 1, NA, 1          # Integrated component
  factor -> X, 0, NA, 1
  X <-> X, 0, NA, 0.01      # Fix at negligible value
"
make_dsem_ram( sem=sem, variables=c("X","factor"), times=1:4 )

# ARIMA(0,0,1)
sem = "
  factor -> X, 0, NA, 1
  factor -> X, 1, rho1     # MA1 component
  X <-> X, 0, NA, 0.01     # Fix at negligible value
"
make_dsem_ram( sem=sem, variables=c("X","factor"), times=1:4 )
