Implements the three-step estimation method (Vermunt, 2010; Liang et al., 2023) for Latent Transition Analysis (LTA), treating latent class memberships at each time point as observed variables with measurement error. Classification uncertainty from Step 1 (latent class/profile analysis) is explicitly incorporated into the transition model estimation in Step 3, ensuring asymptotically unbiased estimates of transition probabilities and covariate effects. This avoids the bias introduced by "hard" modal-class assignment.
LTA(
responses,
L = 2,
ref.class = L,
type = "LCA",
covariates = NULL,
CEP.timeCross = FALSE,
CEP.error = TRUE,
covariates.timeCross = FALSE,
par.ini = "random",
params = NULL,
is.sort = TRUE,
constraint = "VV",
method = "EM",
tol = 1e-04,
method.SE = "Bootstrap",
n.Bootstrap = 100,
maxiter = 5000,
nrep = 20,
starts = 100,
maxiter.wa = 20,
vis = TRUE,
control.EM = NULL,
control.Mplus = NULL,
control.NNE = NULL
)

Value

An object of class LTA, a named list containing:
beta: Matrix of size \(p_1 \times L\). Coefficients for the initial class membership multinomial logit model. Columns 1 to \(L-1\) are free parameters; column \(L\) (reference class) is constrained to \(\boldsymbol{\beta}_L = \mathbf{0}\).
gamma: List of length \(T-1\). Each element gamma[[t]] (for the transition from time \(t\) to \(t+1\)) is a nested list: gamma[[t]][[from_class]][[to_class]] returns a coefficient vector of length \(p_{t+1}\). The last class (\(L\)) is the reference, so its coefficients are fixed to \(\boldsymbol{\gamma}_{kL,t+1} = \mathbf{0}\) for all \(k\).
beta.se: Standard errors for beta (if the Hessian is invertible). Same dimensions as beta. May contain NA if the variance-covariance matrix is not positive definite.
gamma.se: Standard errors for gamma, same nested structure. May contain NAs.
beta.Z.sta: Z-statistics for testing the null hypothesis that each beta coefficient equals zero. Computed as beta / beta.se. Same structure as beta.
gamma.Z.sta: Z-statistics for gamma coefficients. Same nested structure as gamma. Used for testing the significance of transition effects.
beta.p.value.tail1: One-tailed p-values based on the standard normal distribution: \(P(Z < -|z|)\). Useful for directional hypotheses. Same structure as beta.
gamma.p.value.tail1: One-tailed p-values for gamma coefficients. Same nested structure as gamma.
beta.p.value.tail2: Two-tailed p-values: \(2 \times P(Z < -|z|)\). Standard test for a non-zero effect. Same structure as beta.
gamma.p.value.tail2: Two-tailed p-values for gamma coefficients. Same nested structure as gamma.
P.Z.Xns: List of length \(T\). Each element is an \(N \times L\) matrix of posterior class probabilities \(P(Z_{nt}=l \mid \mathbf{X}_{nt})\) for each individual \(n\) at time \(t\).
P.Zs: List of length \(T\). Each element is a vector of length \(L\) containing the prior class proportions \(P(Z_t = l)\) estimated in Step 1 for time \(t\).
Zs: List of length \(T\). Each element is a vector of length \(N\) containing the modal class assignments (MAP classifications) \(\hat{z}_{nt}\) for each individual at time \(t\).
npar: Number of free parameters in the model (depends on the covariates).
Log.Lik: Observed-data log-likelihood value at convergence.
Log.Lik.history: Vector tracking the log-likelihood at each iteration.
AIC: Akaike Information Criterion value.
BIC: Bayesian Information Criterion value.
iterations: Integer. Number of optimization iterations in Step 3.
coveraged: Logical. TRUE if optimization terminated before reaching maxiter (suggesting convergence). Note: this is a heuristic indicator; formal convergence diagnostics should check Hessian properties.
params: List. Step 1 model parameters (output from LCA() or LPA()).
call: The matched function call.
arguments: List of all input arguments passed to the function (useful for reproducibility).
Arguments

responses: A list of response matrices or data frames. Each matrix corresponds to one time point. Rows of each matrix represent individuals/participants/observations (\(N\)); columns represent observed items/variables (\(I\)).
For type = "LCA": items must be binary or categorical (coded as integers starting from 0).
For type = "LPA": items must be continuous (numeric), and each response matrix must be standardized using scale or normalize prior to input.

L: Integer scalar. Number of latent classes/profiles at each time point. Must satisfy \(L \geq 2\).

ref.class: Integer with \(1 \leq \text{ref.class} \leq L\). Specifies which latent class to use as the reference category. Default is L (the last class). Coefficients for the reference class are fixed to zero. When is.sort = TRUE, classes are first ordered by decreasing P.Z (class 1 has the highest probability), and ref.class then refers to the position in this sorted order.

type: Character string. Specifies the type of latent variable model for Step 1:
"LCA" — Latent Class Analysis for categorical items.
"LPA" — Latent Profile Analysis for continuous items.
See LCA and LPA for details.

covariates: Optional. A list of matrices/data frames (length = number of time points). Each matrix contains covariates for modeling transitions or initial status, and must include an intercept column (all 1s) as the first column. If NULL (default), only intercept terms are used (i.e., no covariates). For time \(t\), the dimension is \(N \times p_t\). Covariates can vary across time.
CEP.timeCross: Logical. If TRUE, assumes measurement invariance and uses the same Classification Error Probability (get.CEP) matrix across all time points. Requires that item parameters are invariant over time (not checked internally). Default is FALSE.

CEP.error: Logical. If TRUE (recommended), incorporates classification uncertainty via the CEP matrices estimated in Step 1. If FALSE, uses identity CEP matrices (equivalent to naive modal assignment; introduces bias and is not recommended).

covariates.timeCross: Logical. If TRUE, forces identical \(\boldsymbol{\gamma}\) parameters across all time points (i.e., a time-invariant transition probability matrix). In this case, users should ensure that the covariate matrices at different time points have the same dimensions (values may differ) to match the fixed form of the \(\gamma_{klt}\) coefficients. Default is FALSE, allowing potentially different transition probability matrices across time points.
par.ini: Specification for parameter initialization. Options include:
"random" — completely random initialization (default).
"kmeans" — initializes parameters via K-means clustering on the observed data (McLachlan & Peel, 2004).
Alternatively, a list of initial values can be supplied. For LCA, the list contains:
par — an \(L \times I \times K_{\max}\) array of initial conditional probabilities for each latent class, item, and response category (where \(K_{\max}\) is the maximum number of categories across items).
P.Z — a numeric vector of length \(L\) of initial prior probabilities for the latent classes.
For LPA, the list contains:
means — an \(L \times I\) matrix of initial means for each profile.
covs — an \(I \times I \times L\) array of initial covariance matrices for each profile.
P.Z — a numeric vector of length \(L\) of initial prior probabilities for the profiles.

params: Optional list of pre-estimated Step 1 parameters. If NULL (default), Step 1 models are estimated internally. If provided, no LCA or LPA parameter estimation is performed; instead, the parameters in params are used as fixed values. In that case, params must contain:
For LCA: par — an \(L \times I \times K_{\max}\) array of conditional probabilities for each latent class, item, and response category (where \(K_{\max}\) is the maximum number of categories across items); P.Z — a numeric vector of length \(L\) of prior class probabilities.
For LPA: means — an \(L \times I\) matrix of means for each profile; covs — an \(I \times I \times L\) array of covariance matrices for each profile; P.Z — a numeric vector of length \(L\) of prior profile probabilities.
is.sort: Logical. If TRUE (default), the latent classes are ordered in descending order of P.Z, and all other parameters are adjusted accordingly to match the reordered classes.

constraint: Character (LPA only). Specifies the structure of the within-class covariance matrices:
"VV" — class-varying variances and covariances (unconstrained; default).
"EE" — equal variances and covariances across all classes (homoscedastic).

method: Character. Estimation algorithm for the Step 1 models:
"EM" — Expectation-Maximization (default; robust and widely used).
"Mplus" — interfaces with Mplus software (requires an external installation).
"NNE" — Neural Network Estimator (experimental; uses a transformer plus simulated annealing, more reliable than both "EM" and "Mplus").

tol: Convergence tolerance for the log-likelihood difference (default: 1e-4).
method.SE: Character. Method for estimating standard errors of the parameter estimates:
"Obs" — approximates the observed information matrix via numerical differentiation (Richardson's method); standard errors are obtained from the inverse Hessian. May fail or be unreliable in small samples or with complex likelihood surfaces.
"Bootstrap" — uses nonparametric bootstrap resampling to estimate empirical sampling variability. More robust to model misspecification and small-sample bias; computationally intensive but recommended when asymptotic assumptions are questionable.
Default is "Bootstrap".

n.Bootstrap: Integer. Number of bootstrap replicates used when method.SE = "Bootstrap". Default is 100. McLachlan & Peel (2004) suggest that 50–100 replicates often provide adequate accuracy for practical purposes, though more (e.g., 500–1000) may be preferred for publication-quality inference. Each replicate re-estimates the full three-step LTA model on a resampled dataset.

maxiter: Maximum number of iterations for optimizing the regression coefficients. Default: 5000.

nrep: Integer controlling replication behavior:
If par.ini = "random", the number of random initializations.
If par.ini = "kmeans", the number of K-means runs for initialization.
For method = "Mplus", controls the number of random starts in Mplus via the STARTS option.
The best solution is selected by log-likelihood/BIC across replications. Ignored when initial parameters are supplied by the user.

starts: Number of random initializations to explore during the warm-up phase (default: 100).

maxiter.wa: Maximum number of training iterations allowed per warm-up run (default: 20). After completion, the top nrep solutions (by log-likelihood) are promoted to the final training phase.

vis: Logical. If TRUE (default), displays progress information during estimation.
control.EM: List of control parameters for the EM algorithm:
maxiter — maximum iterations (default: 2000).
tol — convergence tolerance for the log-likelihood difference (default: 1e-4).
control.Mplus: List of control parameters for Mplus estimation:
maxiter — maximum iterations for Mplus optimization (default: 2000).
tol — convergence tolerance for the log-likelihood difference (default: 1e-4).
files.path — character string specifying the directory where Mplus writes its intermediate files (e.g., .inp model input, .dat data file, .out output, and saved posterior probabilities). This argument is required: if NULL (default), the function throws an error. The specified directory must exist and be writable; if it does not exist, the function attempts to create it recursively. A unique timestamped subdirectory (e.g., "Mplus_LPA_YYYY-MM-DD_HH-MM-SS" or "Mplus_LCA_YYYY-MM-DD_HH-MM-SS") is created within this path to store all run-specific files and avoid naming conflicts. See LCA and LPA.
files.clean — logical. If TRUE (default), all intermediate files and the temporary working directory created for the run are deleted upon successful completion or error exit (via on.exit()). If FALSE, all generated files are retained in files.path (or the auto-generated temporary directory) for inspection or debugging. Note: when files.path = NULL, even if files.clean = FALSE, the temporary directory may still be cleaned up by the system later; for guaranteed persistence, specify a custom files.path.
control.NNE: List of control parameters for the NNE algorithm:
hidden.layers — integer vector specifying layer sizes in the fully connected network (default: c(12, 12)).
activation.function — activation function (e.g., "tanh"; default: "tanh").
d.model — dimensionality of the transformer encoder embeddings (default: 8).
nhead — number of attention heads in the transformer (default: 2).
dim.feedforward — dimensionality of the transformer feedforward network (default: 16).
eps — small constant for numerical stability (default: 1e-8).
lambda — factor for slight regularization of all parameters (default: 1e-5).
initial.temperature — initial temperature for simulated annealing (default: 1000).
cooling.rate — cooling rate per iteration in simulated annealing (default: 0.5).
maxiter.sa — maximum iterations for simulated annealing (default: 1000).
threshold.sa — minimum temperature threshold for annealing (default: 1e-10).
maxiter — maximum training epochs (default: 1000).
maxiter.early — patience parameter for early stopping (default: 50).
maxcycle — maximum cycles for optimization (default: 10).
lr — learning rate, controlling the step size of neural network parameter updates (default: 0.025).
scheduler.patience — patience for learning rate decay: if the loss does not improve for more than this many consecutive epochs, the learning rate is reduced (default: 10).
scheduler.factor — learning rate decay factor; the new learning rate equals the current learning rate multiplied by scheduler.factor (default: 0.70).
plot.interval — interval (in epochs) for plotting training diagnostics (default: 100).
device — hardware device; "CPU" (default) or "GPU". If a GPU is not available, the function automatically falls back to CPU.
Details

The three-step LTA proceeds as follows:
Step 1 — Unconditional Latent Class/Profile Model: At each time point \(t\), fit an unconditional LCA or LPA model (ignoring transitions and covariates). Obtain posterior class membership probabilities \(P(Z_{nt}=l \mid \mathbf{X}_{nt})\) for each individual \(n\) and class \(l\) using Bayes' theorem.
Step 2 — Classification Error Probabilities (as computed by get.CEP):
Compute the \(L \times L\) CEP matrix for each time point \(t\), where element \((k,l)\) estimates:
$$
\text{CEP}_t(k,l) = P(\hat{Z}_{nt} = l \mid Z_{nt} = k)
$$
using a non-parametric approximation based on posterior weights:
$$
\widehat{\text{CEP}}_t(k,l) = \frac{ \sum_{n=1}^N \mathbb{I}(\hat{z}_{nt} = l) \cdot P(Z_{nt}=k \mid \mathbf{X}_{nt}) }{ \sum_{n=1}^N P(Z_{nt}=k \mid \mathbf{X}_{nt}) }
$$
where \(\hat{z}_{nt}\) is the modal (most likely) class assignment for individual \(n\) at time \(t\).
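The Step 2 estimator above can be sketched in a few lines of R. This is a minimal illustration of the formula only, not the package's internal get.CEP implementation; the function and variable names are invented for the example.

```r
## Sketch: estimate the CEP matrix at one time point from an
## N x L matrix of Step 1 posterior probabilities (assumed input).
estimate_CEP <- function(P.Z.Xn) {
  L <- ncol(P.Z.Xn)
  z.hat <- max.col(P.Z.Xn)            # modal (MAP) class assignments
  CEP <- matrix(0, L, L)
  for (k in 1:L) {
    w <- P.Z.Xn[, k]                  # posterior weight for true class k
    for (l in 1:L) {
      CEP[k, l] <- sum(w * (z.hat == l)) / sum(w)
    }
  }
  CEP                                 # row k estimates P(assigned = l | true = k)
}

set.seed(1)
P <- matrix(runif(300), nrow = 100, ncol = 3)
P <- P / rowSums(P)                   # normalize rows into posteriors
CEP <- estimate_CEP(P)
rowSums(CEP)                          # each row sums to 1
```

By construction each row of the estimated CEP matrix sums to one; with sharp posteriors the matrix approaches the identity, which is why using identity CEP matrices (CEP.error = FALSE) amounts to ignoring classification error.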
Step 3 — Transition Model with Measurement Error Correction: Estimate the multinomial logit models for:
Initial class membership (time 1): \(P(Z_{n1} = l \mid \mathbf{X}_{n1}) = \frac{\exp(\boldsymbol{\beta}_l^\top \mathbf{X}_{n1})}{\sum_{k=1}^L \exp(\boldsymbol{\beta}_k^\top \mathbf{X}_{n1})}\)
Transitions (time \(t > 1\)): \(P(Z_{nt} = l \mid Z_{n,t-1} = k, \mathbf{X}_{nt}) = \frac{\exp(\boldsymbol{\gamma}_{kl t}^\top \mathbf{X}_{nt})}{\sum_{j=1}^L \exp(\boldsymbol{\gamma}_{kj t}^\top \mathbf{X}_{nt})}\)
where \(\mathbf{X}_{n1} = (X_{n10}, X_{n11}, \dots, X_{n1M})^\top\) is the covariate vector for observation/participant \(n\) at time 1, with \(X_{n10} = 1\) (intercept term) and \(X_{n1m}\) (\(m=1,\dots,M\)) the value of the \(m\)-th covariate. The coefficient vector \(\boldsymbol{\beta}_l = (\beta_{l0}, \beta_{l1}, \dots, \beta_{lM})^\top\) corresponds element-wise to \(\mathbf{X}_{n1}\), where \(\beta_{l0}\) is the intercept and \(\beta_{lm}\) (\(m \geq 1\)) are regression coefficients for the covariates. Class \(L\) is the reference class (\(\boldsymbol{\beta}_L = \mathbf{0}\)). Similarly, \(\mathbf{X}_{nt} = (X_{nt0}, X_{nt1}, \dots, X_{ntM})^\top\) is the covariate vector at time \(t\), with \(X_{nt0} = 1\) (intercept) and \(X_{ntm}\) (\(m=1,\dots,M\)) the \(m\)-th covariate value. The coefficient vector \(\boldsymbol{\gamma}_{klt} = (\gamma_{klt0}, \gamma_{klt1}, \dots, \gamma_{kltM})^\top\) corresponds element-wise to \(\mathbf{X}_{nt}\), where \(\gamma_{klt0}\) is the intercept and \(\gamma_{kltm}\) (\(m \geq 1\)) are regression coefficients. Class \(L\) is the reference class (\(\boldsymbol{\gamma}_{kLt} = \mathbf{0}\) for all \(k\)).
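The multinomial logit parameterization above reduces to a softmax over linear predictors, with the reference class contributing a predictor of zero. A minimal R sketch (names are illustrative, not the package API):

```r
## Transition probabilities out of class k for one individual:
## gamma_k is a list of L coefficient vectors (reference class = zeros),
## x is the covariate vector with a leading 1 for the intercept.
transition_probs <- function(gamma_k, x) {
  eta <- vapply(gamma_k, function(g) sum(g * x), numeric(1))  # linear predictors
  exp(eta) / sum(exp(eta))                                    # softmax
}

## L = 3 classes, one covariate; class 3 is the reference (coefficients 0)
gamma_k <- list(c(0.5, 1.0), c(-0.2, 0.3), c(0, 0))
x <- c(1, 0.8)                        # intercept + covariate value
p <- transition_probs(gamma_k, x)
sum(p)                                # probabilities sum to 1
```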
The full observed-data likelihood integrates over all possible latent class paths \(\mathbf{z}_n = (z_{n1},\dots,z_{nT})\): $$ \begin{aligned} \log \mathcal{L}(\boldsymbol{\theta}) &= \sum_{n=1}^N \log \Biggl[ \sum_{\mathbf{z}_n \in \{1,\dots,L\}^T} \Bigl( \prod_{t=1}^T \text{CEP}_t(z_{nt}, \hat{z}_{nt}) \Bigr) \cdot \\ &\quad P(Z_{n1}=z_{n1} \mid \mathbf{X}_{n1}) \cdot \\ &\quad \prod_{t=2}^T P(Z_{nt}=z_{nt} \mid Z_{n,t-1}=z_{n,t-1}, \mathbf{X}_{nt}) \Biggr] \end{aligned} $$ Parameters \(\boldsymbol{\theta} = \{\boldsymbol{\beta}, \boldsymbol{\gamma}\}\) are estimated via maximum likelihood using the BOBYQA algorithm (box-constrained derivative-free optimization). Reference class \(L\) satisfies \(\boldsymbol{\beta}_L = \mathbf{0}\) and \(\boldsymbol{\gamma}_{kl t} = \mathbf{0}\) for all \(k\) when \(l = L\).
When method.SE = "Bootstrap", standard errors are estimated using a nonparametric bootstrap procedure:
Draw \(B\) (=n.Bootstrap) independent samples of size \(N\) with replacement from the original data.
For each bootstrap sample \(b=1,\dots,B\), re-estimate the full three-step LTA model (Steps 1–3), yielding parameter vector \(\hat{\boldsymbol{\theta}}^{(b)}\).
Compute the bootstrap standard error for each parameter as the sample standard deviation across replicates: $$ \widehat{\mathrm{SE}}_{\mathrm{boot}}(\hat{\theta}_j) = \sqrt{ \frac{1}{B-1} \sum_{b=1}^B \left( \hat{\theta}_j^{(b)} - \bar{\theta}_j \right)^2 }, $$ where \(\bar{\theta}_j = \frac{1}{B}\sum_{b=1}^B \hat{\theta}_j^{(b)}\).
This approach does not rely on large-sample normality or correct specification of the information matrix, making it particularly suitable for complex models like LTA where analytic derivatives are difficult or unstable. However, it increases computational cost linearly with \(B\).
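The bootstrap SE formula above is simply the sample standard deviation across replicate estimates. A sketch, with theta.b standing in for hypothetical replicate estimates of a single parameter:

```r
## Bootstrap SE of one parameter from B replicate estimates.
boot_se <- function(theta.b) {
  B <- length(theta.b)
  sqrt(sum((theta.b - mean(theta.b))^2) / (B - 1))
}

set.seed(42)
theta.b <- rnorm(100, mean = 0.8, sd = 0.15)  # stand-in replicate estimates
boot_se(theta.b)                              # identical to sd(theta.b)
```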
Reference Class: The last latent class (\(L\)) is always treated as the reference category.
All corresponding coefficients in beta and gamma are fixed to zero (\(\boldsymbol{\beta}_L = \mathbf{0}\), \(\boldsymbol{\gamma}_{kl t} = \mathbf{0}\) for \(l=L\)).
CEP Matrices: When CEP.error = TRUE, misclassification probabilities are estimated
non-parametrically using Step 1 posterior probabilities. This corrects for
classification uncertainty. Setting CEP.timeCross = TRUE assumes these
error structures are identical across time (measurement invariance).
See get.CEP.
Covariate Handling: Covariates for initial status (time 1) and transitions (time \(t \geq 2\)) can differ. For transitions to time \(t\), the covariate matrix must have dimensions \(N \times (M_{t}+1)\), i.e., an intercept column of all \(1\) plus \(M_{t}\) columns of covariates in time \(t\).
Optimization: Step 3 maximizes the likelihood with the derivative-free, box-constrained BOBYQA algorithm via nloptr to ensure numerical stability. When method.SE = "Obs", standard errors are derived from the inverse Hessian (via hessian); if it is singular, the Moore-Penrose pseudoinverse (ginv) is used, and negative variances are set to NA.
Computational Complexity: Likelihood evaluation requires enumerating \(L^T\) possible latent paths.
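This growth can be made concrete: the path space is the \(T\)-fold Cartesian product of the class labels, so even modest \(L\) and \(T\) produce large sums (illustrative snippet; Tn is just a local variable name chosen to avoid masking R's built-in T):

```r
L <- 3; Tn <- 4                            # 3 classes, 4 time points
paths <- expand.grid(rep(list(1:L), Tn))   # all latent class sequences z_1, ..., z_T
nrow(paths)                                # L^Tn = 81 paths per individual
```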
Bootstrap Computation: Each bootstrap iteration re-fits Steps 1–3 independently, including re-estimation of
P.Z.Xns, CEP matrices and transition parameters.
To ensure reproducibility, set a seed before calling LTA() when using method.SE = "Bootstrap".
Progress messages during bootstrapping include current replicate index and optimization diagnostics.
Users should monitor convergence in each bootstrap run; failed runs will result in NA entries in SEs and derived statistics.
References

Liang, Q., de la Torre, J., & Law, N. (2023). Latent Transition Cognitive Diagnosis Model With Covariates: A Three-Step Approach. Journal of Educational and Behavioral Statistics, 48(6), 690–718. https://doi.org/10.3102/10769986231163320
McLachlan, G. J., & Peel, D. (2004). Finite Mixture Models. Wiley. https://books.google.com.sg/books?id=c2_fAox0DQoC
Vermunt, J. K. (2010). Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18(4), 450–469. https://doi.org/10.1093/pan/mpq025
Examples

# \donttest{
library(LCPA)
set.seed(123)
N <- 2000 ## sample size
L <- 3 ## number of latent classes
I <- 6 ## number of variables/items
## Covariates at time point T1
covariates.inter <- rep(1, N) # Intercept term is always 1 for each individual
covariates.X11 <- rnorm(N) # Covariate X1 is a continuous variable
# Combine into covariates at T1
covariates.T1 <- cbind(Intercept=covariates.inter, X1=covariates.X11)
## Covariates at time point T2
covariates.inter <- rep(1, N) # Intercept term is always 1 for each individual
covariates.X21 <- rnorm(N) # Covariate X1 is a continuous variable
# Combine into covariates at T2
covariates.T2 <- cbind(Intercept=covariates.inter, X1=covariates.X21)
# Combine into final covariates list
covariates <- list(t1=covariates.T1, t2=covariates.T2)
## Simulate beta coefficients
# the column for the reference class is fixed to zero
## fix reference class to class 2 (so column 2 is zero)
beta <- matrix(c( 0.8, 0.0, 0.1,
-0.3, 0.0, -0.5), ncol=L, byrow=TRUE)
## Simulate gamma coefficients
gamma <- list(
lapply(1:L, function(l) {
lapply(1:L, function(k) if(k < L)
runif(2, -2.0, 2.0) else c(0, 0)) # Last class as reference
})
)
## Simulate the data
sim_custom <- sim.LTA(
N=N, I=I, L=L, times=2, type="LCA", IQ=0.9,
covariates=covariates,
beta=beta,
gamma=gamma
)
summary(sim_custom)
responses <- sim_custom$responses
covariates <- sim_custom$covariates
## fix reference class to class 2
LTA.obj <- LTA(responses, L=L, ref.class=2, type="LCA",
covariates=covariates,
method.SE="Bootstrap", n.Bootstrap=10,
CEP.timeCross=FALSE, CEP.error=TRUE, covariates.timeCross=FALSE,
par.ini = "random", method="EM", vis = TRUE)
print(LTA.obj)
# }