sim.LTA: Simulate Data for Latent Transition Analysis (LTA)

Description

Simulates longitudinal latent class/profile data where initial class membership and transition probabilities may be influenced by time-varying covariates. Supports both Latent Class Analysis (LCA) for categorical outcomes and Latent Profile Analysis (LPA) for continuous outcomes. Measurement invariance is assumed by default (identical item parameters across time).

Usage

sim.LTA(
  N = 500,
  I = 5,
  L = 3,
  distribution = "random",
  times = 2,
  type = "LCA",
  rate = NULL,
  constraint = "VV",
  mean.range = c(-2, 2),
  covs.range = c(0.01, 4),
  poly.value = 5,
  IQ = "random",
  params = NULL,
  is.sort = TRUE,
  covariates = NULL,
  beta = NULL,
  gamma = NULL
)

Value

A list of class "sim.LTA" containing:

responses: List of length times; observed data matrices ($N \times I$).
Zs: List of length times; true latent class memberships ($N \times 1$ vectors).
P.Zs: List of length times; marginal class probabilities at each time.
par: Item parameters for LCA (if type="LCA").
means: Class means for LPA (if type="LPA").
covs: Class covariance matrices for LPA (if type="LPA").
rate: True transition matrices (non-covariate mode only; NULL when times=1).
covariates: List of covariate matrices used (covariate mode only).
beta: True initial state coefficients (covariate mode only).
gamma: True transition coefficients (covariate mode only; NULL when times=1).
call: Function call.
arguments: Input arguments.

Arguments

N: Integer; sample size.
I: Integer; number of observed items/indicators per time point.
L: Integer; number of latent classes/profiles.
distribution: Character; distribution of initial class probabilities when not using covariates or params. Options: "uniform" (equal probabilities) or "random" (Dirichlet-distributed, default).
times: Integer; number of time points (must be $\geq 1$).
type: Character; type of latent model. "LCA" for categorical indicators (default), "LPA" for continuous indicators.
rate: List of matrices or NULL; transition probability matrices for non-covariate mode. Each matrix is $L \times L$ with rows summing to 1. If NULL (default), matrices are generated with 0.7 diagonal probability and uniform off-diagonals. Ignored when times=1.
constraint: Character; covariance structure for LPA (type="LPA" only). Options: "VV" (unstructured, default), "VE" (diagonal variance), "EE" (equal variance).
mean.range: Numeric vector; range for randomly generated class means in LPA (default: c(-2, 2)).
covs.range: Numeric vector; range for covariance matrix diagonals in LPA (default: c(0.01, 4)).
poly.value: Integer; number of categories for polytomous LCA items (default: 5).
IQ: Character; method for generating item discrimination in LCA. "random" (default) or fixed values.
params: List or NULL; pre-specified parameters for reproducibility (see Details).
is.sort: Logical; if TRUE (default), the latent classes are sorted in descending order of P.Z, and all other parameters are reordered to match the sorted classes.
covariates: List of matrices or NULL; covariate matrices for each time point. Each matrix must have dimensions $N \times p_t$ and include an intercept column (first column must be all 1s). If NULL, covariate mode is disabled. See Details for automatic coefficient generation.
beta: Matrix or NULL; initial state regression coefficients of dimension $p_1 \times L$. Columns correspond to classes 1 to $L$; the last column (reference class $L$) must be all zeros. If NULL and covariates are used, coefficients for the non-reference classes are randomly generated from $\text{Uniform}(-1, 1)$.
gamma: List or NULL; transition regression coefficients. Must be a list of length times-1. Each element $t$ is a list of length $L$ (previous state). Each sub-list contains $L$ vectors (next state), where the last vector (reference class) is always $\mathbf{0}$. Ignored when times=1. If NULL and covariates are used with times>=2, coefficients are randomly generated from $\text{Uniform}(-1, 1)$ for non-reference classes.

Model Specification

Initial Class Probabilities (with covariates):

For observation/participant $n$ at time 1, the probability of belonging to latent class $l$ is: $$P(Z_{n1} = l \mid \mathbf{X}_{n1}) = \frac{\exp(\boldsymbol{\beta}_l^\top \mathbf{X}_{n1})} {\sum_{k=1}^L \exp(\boldsymbol{\beta}_k^\top \mathbf{X}_{n1})}$$ where $\mathbf{X}_{n1} = (X_{n10}, X_{n11}, \dots, X_{n1M})^\top$ is the covariate vector for observation/participant $n$ at time 1, with $X_{n10} = 1$ (intercept term) and $X_{n1m}$ ($m=1,\dots,M$) representing the value of the $m$-th covariate. The coefficient vector $\boldsymbol{\beta}_l = (\beta_{l0}, \beta_{l1}, \dots, \beta_{lM})^\top$ corresponds element-wise to $\mathbf{X}_{n1}$, where $\beta_{l0}$ is the intercept and $\beta_{lm}$ ($m \geq 1$) are regression coefficients for covariates. Class $L$ is the reference class ($\boldsymbol{\beta}_L = \mathbf{0}$).
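The multinomial-logit formula above can be sketched in base R. This is only an illustration of the formula, not the internal code of sim.LTA; the object names (X1, beta, P1, Z1) and the coefficient values are assumptions chosen for the example.

```r
# Sketch: initial class probabilities from a multinomial logit (base R only).
set.seed(1)
N <- 5; L <- 3
X1 <- cbind(inter = 1, X1 = rnorm(N))          # N x p_1 matrix, intercept first
beta <- cbind(c(0.8, -0.3), c(-0.5, 0.4), 0)   # p_1 x L; last column is the reference class
eta <- X1 %*% beta                             # N x L linear predictors
eta <- eta - apply(eta, 1, max)                # subtract row maxima; ratios are unchanged
P1 <- exp(eta) / rowSums(exp(eta))             # N x L; each row sums to 1
# Draw one class per participant from its own probability row
Z1 <- apply(P1, 1, function(p) sample.int(L, 1, prob = p))
```

Subtracting the row maximum before exponentiating is a common numerical safeguard and leaves the probabilities unchanged.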

Transition Probabilities (with covariates and times>=2):

For observation/participant $n$ transitioning from class $l$ at time $t-1$ to class $k$ at time $t$ ($t \geq 2$): $$P(Z_{nt} = k \mid Z_{n,t-1} = l, \mathbf{X}_{nt}) = \frac{\exp(\boldsymbol{\gamma}_{lkt}^\top \mathbf{X}_{nt})} {\sum_{j=1}^L \exp(\boldsymbol{\gamma}_{ljt}^\top \mathbf{X}_{nt})}$$ where $\mathbf{X}_{nt} = (X_{nt0}, X_{nt1}, \dots, X_{ntM})^\top$ is the covariate vector at time $t$, with $X_{nt0} = 1$ (intercept) and $X_{ntm}$ ($m=1,\dots,M$) as the $m$-th covariate value. The coefficient vector $\boldsymbol{\gamma}_{lkt} = (\gamma_{lkt0}, \gamma_{lkt1}, \dots, \gamma_{lktM})^\top$ corresponds element-wise to $\mathbf{X}_{nt}$, where $\gamma_{lkt0}$ is the intercept and $\gamma_{lktm}$ ($m \geq 1$) are regression coefficients. Class $L$ is the reference class ($\boldsymbol{\gamma}_{lLt} = \mathbf{0}$ for all $l$).
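A minimal base R sketch of the transition formula follows, assuming the nested layout described for the gamma argument (one list per previous state, each holding one coefficient vector per next state). The names gamma_t, trans_prob, X2, and the simulated previous states Z1 are hypothetical and used only for illustration.

```r
# Sketch: covariate-dependent transition probabilities for one transition (base R only).
set.seed(2)
N <- 6; L <- 3; p <- 2
X2 <- cbind(inter = 1, X1 = rnorm(N))     # covariates at time t = 2, intercept first
Z1 <- sample.int(L, N, replace = TRUE)    # hypothetical class memberships at time 1

# gamma_t[[l]][[k]]: coefficients for moving from class l to class k;
# the last next-state class (k = L) is the reference and has all-zero coefficients.
gamma_t <- lapply(1:L, function(l)
  lapply(1:L, function(k) if (k < L) runif(p, -1, 1) else rep(0, p)))

# Probabilities over next states for covariate vector x and previous class l
trans_prob <- function(x, l) {
  eta <- sapply(gamma_t[[l]], function(g) sum(g * x))  # length-L linear predictors
  exp(eta - max(eta)) / sum(exp(eta - max(eta)))
}

P2 <- t(sapply(1:N, function(n) trans_prob(X2[n, ], Z1[n])))   # N x L
Z2 <- apply(P2, 1, function(p_n) sample.int(L, 1, prob = p_n)) # states at time 2
```

Each row of P2 is the row of a participant-specific transition matrix selected by that participant's previous class.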

Without Covariates or When times=1:

Initial class memberships are drawn from a multinomial distribution with probabilities $\boldsymbol{\pi} = (\pi_1, \dots, \pi_L)$. When times>=2, transitions follow a Markov process with fixed probabilities $\tau_{lk}^{(t)} = P(Z_{nt} = k \mid Z_{n,t-1} = l)$, where $\sum_{k=1}^L \tau_{lk}^{(t)} = 1$ for each $l$ and $t$.
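The non-covariate case can be sketched as a plain Markov chain in base R. The transition matrix below uses 0.7 on the diagonal with the remaining 0.3 split evenly across the off-diagonals, which is one reading of the default described for the rate argument; it is an assumption, not a copy of the package's code. The names pi0, tau, and Z are illustrative.

```r
# Sketch: latent states from a fixed-probability Markov chain (base R only).
set.seed(3)
N <- 1000; L <- 3; times <- 3
pi0 <- rep(1 / L, L)                  # equal initial probabilities
tau <- matrix(0.3 / (L - 1), L, L)    # uniform off-diagonal probabilities
diag(tau) <- 0.7                      # probability of staying in the same class

Z <- matrix(NA_integer_, N, times)    # one column of states per time point
Z[, 1] <- sample.int(L, N, replace = TRUE, prob = pi0)
for (t in 2:times) {
  # Each participant moves according to the row of tau for their previous class
  Z[, t] <- vapply(Z[, t - 1],
                   function(l) sample.int(L, 1, prob = tau[l, ]),
                   integer(1))
}
```

With a large N, the empirical proportion of participants who stay in their class between two time points should be close to 0.7.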

Details

Covariate Requirements:

Covariate matrices must include an intercept (first column = 1). If omitted, the function adds an intercept and issues a warning.
When covariates is provided but beta or gamma is NULL, coefficients are randomly generated from $\text{Uniform}(-1, 1)$ (non-reference classes only).
The reference class ($L$) always has zero coefficients ($\boldsymbol{\beta}_L = \mathbf{0}$, and $\boldsymbol{\gamma}_{lLt} = \mathbf{0}$ for all $l$ and $t$).
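The requirements above can be illustrated with a short base R sketch that prepends an intercept column when it is missing and draws Uniform(-1, 1) coefficients for the non-reference classes. This mirrors the described behavior only; the actual checks and warning text inside sim.LTA may differ, and the names X, p, and beta are assumptions.

```r
# Sketch: intercept handling and random coefficient generation (base R only).
set.seed(4)
N <- 4; L <- 3
X <- cbind(X1 = rnorm(N), X2 = rbinom(N, 1, 0.5))  # covariates without an intercept

# Prepend a column of 1s when the first column is not already an intercept
if (!all(X[, 1] == 1)) {
  X <- cbind(inter = 1, X)
}

p <- ncol(X)  # number of columns including the intercept
# Uniform(-1, 1) draws for classes 1..L-1; zero column for reference class L
beta <- cbind(matrix(runif(p * (L - 1), -1, 1), p, L - 1), 0)
```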

Parameter Compatibility:

Use params to fix item parameters (LCA) or class means/covariances (LPA) across simulations.
In non-covariate mode, rate must be a list of $(times-1)$ valid transition matrices (ignored when times=1).
In covariate mode with times>=2, all three (covariates, beta, gamma) must be consistent in dimensions.
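A hedged sketch of what "consistent in dimensions" could mean in practice is given below. The helper check_lta_dims is hypothetical and not part of the package. It assumes that gamma[[t]] pairs with the covariate matrix of the time point being entered (covariates[[t + 1]]), as the transition formula suggests.

```r
# Sketch: a hypothetical pre-flight check of covariates, beta, and gamma (base R only).
check_lta_dims <- function(covariates, beta, gamma, L) {
  times <- length(covariates)
  # beta must be p_1 x L with a zero column for the reference class
  if (nrow(beta) != ncol(covariates[[1]]) || ncol(beta) != L) return(FALSE)
  if (!all(beta[, L] == 0)) return(FALSE)
  # gamma needs one element per transition
  if (length(gamma) != times - 1) return(FALSE)
  for (t in seq_along(gamma)) {
    p_t <- ncol(covariates[[t + 1]])
    if (length(gamma[[t]]) != L) return(FALSE)      # one sub-list per previous state
    for (l in seq_len(L)) {
      g <- gamma[[t]][[l]]
      if (length(g) != L) return(FALSE)             # one vector per next state
      lens <- vapply(g, length, integer(1))
      if (!all(lens == p_t)) return(FALSE)          # vector length matches covariates
      if (!all(g[[L]] == 0)) return(FALSE)          # reference class is all zeros
    }
  }
  TRUE
}

# Illustrative inputs: 2 time points, 3 classes, 2 columns (intercept + 1 covariate)
set.seed(5)
covs <- list(cbind(1, rnorm(10)), cbind(1, rnorm(10)))
beta_ok <- cbind(runif(2, -1, 1), runif(2, -1, 1), 0)
gamma_ok <- list(lapply(1:3, function(l)
  lapply(1:3, function(k) if (k < 3) runif(2, -1, 1) else c(0, 0))))
ok <- check_lta_dims(covs, beta_ok, gamma_ok, L = 3)
bad <- check_lta_dims(covs, beta_ok[, 1:2], gamma_ok, L = 3)  # beta has too few columns
```

Running a check like this before calling sim.LTA can surface mismatched dimensions early.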

Examples


####################### Example 1: Single time point (times=1) ######################
library(LCPA)
set.seed(123)
sim_single <- sim.LTA(N = 200, I = 4, L = 3, times = 1, type = "LCA")
print(sim_single)

####################### Example 2: LPA without covariates ######################
set.seed(123)
sim_lta <- sim.LTA(N = 200, I = 3, L = 3, times = 3, type = "LPA", constraint = "VE")
print(sim_lta)

################## Example 3: With custom covariates (times>=2) ######################
set.seed(123)
N <- 200 ## sample size

## Covariates at time point T1
covariates.inter <- rep(1, N) # Intercept term is always 1 for each n
covariates.X1 <- rnorm(N)     # Covariate X1 is a continuous variable
covariates.X2 <- rbinom(N, 1, 0.5) # Covariate X2 is a binary variable
covariates.X1.X2 <- covariates.X1 * covariates.X2 # Interaction between covariates X1 and X2
covariates.T1 <- cbind(inter=covariates.inter, X1=covariates.X1,
                       X2=covariates.X2, X1.X2=covariates.X1.X2) # Combine into covariates at T1

## Covariates at time point T2
covariates.inter <- rep(1, N) # Intercept term is always 1 for each n
covariates.X1 <- rnorm(N)     # Covariate X1 is a continuous variable
covariates.X2 <- rbinom(N, 1, 0.5) # Covariate X2 is a binary variable
covariates.X1.X2 <- covariates.X1 * covariates.X2 # Interaction between covariates X1 and X2
covariates.T2 <- cbind(inter=covariates.inter, X1=covariates.X1,
                       X2=covariates.X2, X1.X2=covariates.X1.X2) # Combine into covariates at T2

covariates <- list(t1=covariates.T1, t2=covariates.T2) # Combine into final covariates list

## Specify beta coefficients
# 4x3 matrix: one row per covariate column (incl. intercept), one column per class
# (last column is zero because the last class is used as reference)
beta <- matrix(c( 0.8, -0.5, 0.0,
                 -0.3, -0.4, 0.0,
                  0.2,  0.8, 0.0,
                 -0.1,  0.2, 0.0), ncol=3, byrow=TRUE)

## Simulate gamma coefficients (only needed when times>=2)
gamma <- list(
  lapply(1:3, function(l) {
    lapply(1:3, function(k) if(k < 3)
           runif(4, -1.0, 1.0) else c(0, 0, 0, 0)) # Last class as reference
  })
)

## Simulate the data
sim_custom <- sim.LTA(
  N=N, I=4, L=3, times=2, type="LPA",
  covariates=covariates,
  beta=beta,
  gamma=gamma
)

summary(sim_custom)
