Implements the three-step estimation method (Vermunt, 2010; Liang et al., 2023) for Latent Transition Analysis (LTA), treating latent class memberships at each time point as observed variables with measurement error. Classification uncertainty from Step 1 (latent class/profile analysis) is explicitly incorporated into the transition model estimation in Step 3, ensuring asymptotically unbiased estimates of transition probabilities and covariate effects. This avoids the bias introduced by "hard" modal-class assignment.
LTA(
responses,
L = 2,
ref.class = L,
type = "LCA",
covariates = NULL,
CEP.timeCross = FALSE,
CEP.error = TRUE,
covariates.timeCross = FALSE,
par.ini = "random",
params = NULL,
is.sort = TRUE,
constraint = "VV",
method = "EM",
tol = 1e-04,
method.SE = "Bootstrap",
n.Bootstrap = 100,
maxiter = 5000,
nrep = 20,
starts = 100,
maxiter.wa = 20,
vis = TRUE,
control.EM = NULL,
control.Mplus = NULL,
control.NNE = NULL
)

Value

An object of class LTA, a named list containing:
beta: Matrix of size \(p_1 \times L\). Coefficients for the initial class membership multinomial logit model. Columns 1 to \(L-1\) are free parameters; column \(L\) (reference class) is constrained to \(\boldsymbol{\beta}_L = \mathbf{0}\).
gamma: List of length \(T-1\). Each element gamma[[t]] (for the transition from time \(t\) to \(t+1\)) is a nested list: gamma[[t]][[from_class]][[to_class]] returns a coefficient vector of length \(p_{t+1}\). The last class (\(L\)) is the reference, so its coefficients are fixed to \(\boldsymbol{\gamma}_{kL,t+1} = \mathbf{0}\) for all \(k\).
beta.se: Standard errors for beta (if the Hessian is invertible). Same dimensions as beta. May contain NA if the variance-covariance matrix is not positive definite.
gamma.se: Standard errors for gamma, same nested structure. May contain NAs.
beta.Z.sta: Z-statistics for testing the null hypothesis that each beta coefficient equals zero. Computed as beta / beta.se. Same structure as beta.
gamma.Z.sta: Z-statistics for gamma coefficients. Same nested structure as gamma. Used for testing the significance of transition effects.
beta.p.value.tail1: One-tailed p-values based on the standard normal distribution: \(P(Z < -|z|)\). Useful for directional hypotheses. Same structure as beta.
gamma.p.value.tail1: One-tailed p-values for gamma coefficients. Same nested structure as gamma.
beta.p.value.tail2: Two-tailed p-values: \(2 \times P(Z < -|z|)\). Standard test for a non-zero effect. Same structure as beta.
gamma.p.value.tail2: Two-tailed p-values for gamma coefficients. Same nested structure as gamma.
P.Z.Xns: List of length \(T\). Each element is an \(N \times L\) matrix of posterior class probabilities \(P(Z_{nt}=l \mid \mathbf{X}_{nt})\) for each individual \(n\) at time \(t\).
P.Zs: List of length \(T\). Each element is a vector of length \(L\) containing the prior class proportions \(P(Z_t = l)\) estimated in Step 1 for time \(t\).
Zs: List of length \(T\). Each element is a vector of length \(N\) containing the modal class assignments (MAP classifications) \(\hat{z}_{nt}\) for each individual at time \(t\).
npar: Number of free parameters in the model (depends on the covariates).
Log.Lik: Observed-data log-likelihood value at convergence.
Log.Lik.history: Vector tracking the log-likelihood at each iteration.
AIC: Akaike Information Criterion value.
BIC: Bayesian Information Criterion value.
iterations: Integer. Number of optimization iterations in Step 3.
coveraged: Logical. TRUE if optimization terminated before reaching maxiter (suggesting convergence). Note: this is a heuristic indicator; formal convergence diagnostics should check Hessian properties.
params: List. Step 1 model parameters (output from LCA() or LPA()).
call: The matched function call.
arguments: List of all input arguments passed to the function (useful for reproducibility).
Arguments

responses: A list of response matrices or data frames. Each matrix corresponds to one time point. Rows of each matrix represent individuals/participants/observations (\(N\)); columns represent observed items/variables (\(I\)).
For type = "LCA": items must be binary or categorical (coded as integers starting from 0).
For type = "LPA": items must be continuous (numeric), and each response matrix must be standardized using scale or normalize prior to input.

L: Integer scalar. Number of latent classes/profiles at each time point. Must satisfy \(L \geq 2\).

ref.class: Integer with \(1 \leq \text{ref.class} \leq L\). Specifies which latent class to use as the reference category. Default is L (the last class). Coefficients for the reference class are fixed to zero. When is.sort = TRUE, classes are first ordered by decreasing P.Z (class 1 has the highest probability), and ref.class then refers to the position in this sorted order.

type: Character string. Specifies the type of latent variable model for Step 1:
"LCA" — Latent Class Analysis for categorical items.
"LPA" — Latent Profile Analysis for continuous items.
See LCA and LPA for details.

covariates: Optional. A list of matrices/data frames (length = number of time points). Each matrix contains covariates for modeling transitions or initial status, and must include an intercept column (all 1s) as the first column. If NULL (default), only intercept terms are used (i.e., no covariates). For time \(t\), the dimension is \(N \times p_t\). Covariates can vary across time.
CEP.timeCross: Logical. If TRUE, assumes measurement invariance and uses the same Classification Error Probability (get.CEP) matrix across all time points. Requires that item parameters are invariant over time (not checked internally). Default is FALSE.

CEP.error: Logical. If TRUE (recommended), incorporates classification uncertainty via the CEP matrices estimated in Step 1. If FALSE, uses identity CEP matrices (equivalent to naive modal assignment; introduces bias and is not recommended).

covariates.timeCross: Logical. If TRUE, forces identical \(\boldsymbol{\gamma}\) parameters across all time points (i.e., a time-invariant transition probability matrix). In this case, users should ensure that the covariate matrices at different time points have the same dimensions (values may differ) to match the fixed form of the \(\gamma_{klt}\) coefficients. Default is FALSE, allowing potentially different transition probability matrices across time points.
par.ini: Specification for parameter initialization. Options include:
"random" — completely random initialization (default).
"kmeans" — initializes parameters via K-means clustering on the observed data (McLachlan & Peel, 2004).
Alternatively, a list of initial values can be supplied. For LCA, the list contains:
par — an \(L \times I \times K_{\max}\) array of initial conditional probabilities for each latent class, item, and response category (where \(K_{\max}\) is the maximum number of categories across items).
P.Z — a numeric vector of length \(L\) of initial prior probabilities for the latent classes.
For LPA, the list contains:
means — an \(L \times I\) matrix of initial means for each profile.
covs — an \(I \times I \times L\) array of initial covariance matrices for each profile.
P.Z — a numeric vector of length \(L\) of initial prior probabilities for the profiles.

params: Optional list of pre-estimated Step 1 parameters. If NULL (default), Step 1 models are estimated internally. If provided, no LCA or LPA parameter estimation is performed; instead, the parameters in params are used as fixed values. In that case, params must contain:
For LCA: par — an \(L \times I \times K_{\max}\) array of conditional probabilities for each latent class, item, and response category (where \(K_{\max}\) is the maximum number of categories across items); P.Z — a numeric vector of length \(L\) of prior class probabilities.
For LPA: means — an \(L \times I\) matrix of means for each profile; covs — an \(I \times I \times L\) array of covariance matrices for each profile; P.Z — a numeric vector of length \(L\) of prior profile probabilities.
is.sort: Logical. If TRUE (default), the latent classes are ordered in descending order of P.Z, and all other parameters are adjusted accordingly to match the reordered classes.

constraint: Character (LPA only). Specifies the structure of the within-class covariance matrices:
"VV" — class-varying variances and covariances (unconstrained; default).
"EE" — equal variances and covariances across all classes (homoscedastic).

method: Character. Estimation algorithm for the Step 1 models:
"EM" — Expectation-Maximization (default; robust and widely used).
"Mplus" — interfaces with Mplus software (requires an external installation).
"NNE" — Neural Network Estimator (experimental; uses a transformer plus simulated annealing, more reliable than both "EM" and "Mplus").

tol: Convergence tolerance for the log-likelihood difference (default: 1e-4).
method.SE: Character. Method for estimating standard errors of the parameter estimates:
"Obs" — approximates the observed information matrix via numerical differentiation (Richardson's method); standard errors are obtained from the inverse Hessian. May fail or be unreliable in small samples or with complex likelihood surfaces.
"Bootstrap" — uses nonparametric bootstrap resampling to estimate empirical sampling variability. More robust to model misspecification and small-sample bias; computationally intensive but recommended when asymptotic assumptions are questionable.
Default is "Bootstrap".

n.Bootstrap: Integer. Number of bootstrap replicates used when method.SE = "Bootstrap". Default is 100. McLachlan & Peel (2004) suggest that 50–100 replicates often provide adequate accuracy for practical purposes, though more (e.g., 500–1000) may be preferred for publication-quality inference. Each replicate re-estimates the full three-step LTA model on a resampled dataset.

maxiter: Maximum number of iterations for optimizing the regression coefficients. Default: 5000.

nrep: Integer controlling replication behavior:
If par.ini = "random", the number of random initializations.
If par.ini = "kmeans", the number of K-means runs for initialization.
For method = "Mplus", controls the number of random starts in Mplus via the STARTS option.
The best solution is selected by log-likelihood/BIC across replications. Ignored when initial parameters are supplied by the user.

starts: Number of random initializations to explore during the warm-up phase (default: 100).

maxiter.wa: Maximum number of training iterations allowed per warm-up run (default: 20). After completion, the top nrep solutions (by log-likelihood) are promoted to the final training phase.

vis: Logical. If TRUE (default), displays progress information during estimation.
control.EM: List of control parameters for the EM algorithm:
maxiter — maximum iterations (default: 2000).
tol — convergence tolerance for the log-likelihood difference (default: 1e-4).
control.Mplus: List of control parameters for Mplus estimation:
maxiter — maximum iterations for Mplus optimization (default: 2000).
tol — convergence tolerance for the log-likelihood difference (default: 1e-4).
files.path — character string specifying the directory where Mplus writes its intermediate files (e.g., .inp model input, .dat data file, .out output, and saved posterior probabilities). This argument is required: if NULL (default), the function throws an error. The specified directory must exist and be writable; if it does not exist, the function attempts to create it recursively. A unique timestamped subdirectory (e.g., "Mplus_LPA_YYYY-MM-DD_HH-MM-SS" or "Mplus_LCA_YYYY-MM-DD_HH-MM-SS") is created within this path to store all run-specific files and avoid naming conflicts. See LCA and LPA.
files.clean — logical. If TRUE (default), all intermediate files and the temporary working directory created for the run are deleted upon successful completion or error exit (via on.exit()). If FALSE, all generated files are retained in files.path (or the auto-generated temporary directory) for inspection or debugging. Note: when files.path = NULL, even if files.clean = FALSE, the temporary directory may still be cleaned up by the system later; for guaranteed persistence, specify a custom files.path.
control.NNE: List of control parameters for the NNE algorithm:
hidden.layers — integer vector specifying layer sizes in the fully connected network (default: c(12, 12)).
activation.function — activation function (e.g., "tanh"; default: "tanh").
d.model — dimensionality of the transformer encoder embeddings (default: 8).
nhead — number of attention heads in the transformer (default: 2).
dim.feedforward — dimensionality of the transformer feedforward network (default: 16).
eps — small constant for numerical stability (default: 1e-8).
lambda — factor for slight regularization of all parameters (default: 1e-5).
initial.temperature — initial temperature for simulated annealing (default: 1000).
cooling.rate — cooling rate per iteration in simulated annealing (default: 0.5).
maxiter.sa — maximum iterations for simulated annealing (default: 1000).
threshold.sa — minimum temperature threshold for annealing (default: 1e-10).
maxiter — maximum training epochs (default: 1000).
maxiter.early — patience parameter for early stopping (default: 50).
maxcycle — maximum cycles for optimization (default: 10).
lr — learning rate, controlling the step size of neural network parameter updates (default: 0.025).
scheduler.patience — patience for learning rate decay: if the loss does not improve for more than this many consecutive epochs, the learning rate is reduced (default: 10).
scheduler.factor — learning rate decay factor; the new learning rate equals the current learning rate multiplied by scheduler.factor (default: 0.70).
plot.interval — interval (in epochs) for plotting training diagnostics (default: 100).
device — hardware device; "CPU" (default) or "GPU". If a GPU is not available, the function automatically falls back to CPU.
Details

The three-step LTA proceeds as follows:
Step 1 — Unconditional Latent Class/Profile Model: At each time point \(t\), fit an unconditional LCA or LPA model (ignoring transitions and covariates). Obtain posterior class membership probabilities \(P(Z_{nt}=l \mid \mathbf{X}_{nt})\) for each individual \(n\) and class \(l\) using Bayes' theorem.
Step 2 — Classification Error Probabilities (as computed by get.CEP):
Compute the \(L \times L\) CEP matrix for each time point \(t\), where element \((k,l)\) estimates:
$$
\text{CEP}_t(k,l) = P(\hat{Z}_{nt} = l \mid Z_{nt} = k)
$$
using a non-parametric approximation based on posterior weights:
$$
\widehat{\text{CEP}}_t(k,l) = \frac{ \sum_{n=1}^N \mathbb{I}(\hat{z}_{nt} = l) \cdot P(Z_{nt}=k \mid \mathbf{X}_{nt}) }{ \sum_{n=1}^N P(Z_{nt}=k \mid \mathbf{X}_{nt}) }
$$
where \(\hat{z}_{nt}\) is the modal (most likely) class assignment for individual \(n\) at time \(t\).
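The Step 2 estimator above can be sketched in a few lines of R. This is a minimal illustration of the formula only, not the package's internal get.CEP implementation; the function and variable names are invented for the example.

```r
## Sketch: estimate the CEP matrix at one time point from an
## N x L matrix of Step 1 posterior probabilities (assumed input).
estimate_CEP <- function(P.Z.Xn) {
  L <- ncol(P.Z.Xn)
  z.hat <- max.col(P.Z.Xn)            # modal (MAP) class assignments
  CEP <- matrix(0, L, L)
  for (k in 1:L) {
    w <- P.Z.Xn[, k]                  # posterior weight for true class k
    for (l in 1:L) {
      CEP[k, l] <- sum(w * (z.hat == l)) / sum(w)
    }
  }
  CEP                                 # row k estimates P(assigned = l | true = k)
}

set.seed(1)
P <- matrix(runif(300), nrow = 100, ncol = 3)
P <- P / rowSums(P)                   # normalize rows into posteriors
CEP <- estimate_CEP(P)
rowSums(CEP)                          # each row sums to 1
```

By construction each row of the estimated CEP matrix sums to one; with sharp posteriors the matrix approaches the identity, which is why using identity CEP matrices (CEP.error = FALSE) amounts to ignoring classification error.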
Step 3 — Transition Model with Measurement Error Correction: Estimate the multinomial logit models for:
Initial class membership (time 1): \(P(Z_{n1} = l \mid \mathbf{X}_{n1}) = \frac{\exp(\boldsymbol{\beta}_l^\top \mathbf{X}_{n1})}{\sum_{k=1}^L \exp(\boldsymbol{\beta}_k^\top \mathbf{X}_{n1})}\)
Transitions (time \(t > 1\)): \(P(Z_{nt} = l \mid Z_{n,t-1} = k, \mathbf{X}_{nt}) = \frac{\exp(\boldsymbol{\gamma}_{kl t}^\top \mathbf{X}_{nt})}{\sum_{j=1}^L \exp(\boldsymbol{\gamma}_{kj t}^\top \mathbf{X}_{nt})}\)
where \(\mathbf{X}_{n1} = (X_{n10}, X_{n11}, \dots, X_{n1M})^\top\) is the covariate vector for observation/participant \(n\) at time 1, with \(X_{n10} = 1\) (intercept term) and \(X_{n1m}\) (\(m=1,\dots,M\)) the value of the \(m\)-th covariate. The coefficient vector \(\boldsymbol{\beta}_l = (\beta_{l0}, \beta_{l1}, \dots, \beta_{lM})^\top\) corresponds element-wise to \(\mathbf{X}_{n1}\), where \(\beta_{l0}\) is the intercept and \(\beta_{lm}\) (\(m \geq 1\)) are regression coefficients for the covariates. Class \(L\) is the reference class (\(\boldsymbol{\beta}_L = \mathbf{0}\)). Similarly, \(\mathbf{X}_{nt} = (X_{nt0}, X_{nt1}, \dots, X_{ntM})^\top\) is the covariate vector at time \(t\), with \(X_{nt0} = 1\) (intercept) and \(X_{ntm}\) (\(m=1,\dots,M\)) the \(m\)-th covariate value. The coefficient vector \(\boldsymbol{\gamma}_{klt} = (\gamma_{klt0}, \gamma_{klt1}, \dots, \gamma_{kltM})^\top\) corresponds element-wise to \(\mathbf{X}_{nt}\), where \(\gamma_{klt0}\) is the intercept and \(\gamma_{kltm}\) (\(m \geq 1\)) are regression coefficients. Class \(L\) is the reference class (\(\boldsymbol{\gamma}_{kLt} = \mathbf{0}\) for all \(k\)).
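The multinomial logit parameterization above reduces to a softmax over linear predictors, with the reference class contributing a predictor of zero. A minimal R sketch (names are illustrative, not the package API):

```r
## Transition probabilities out of class k for one individual:
## gamma_k is a list of L coefficient vectors (reference class = zeros),
## x is the covariate vector with a leading 1 for the intercept.
transition_probs <- function(gamma_k, x) {
  eta <- vapply(gamma_k, function(g) sum(g * x), numeric(1))  # linear predictors
  exp(eta) / sum(exp(eta))                                    # softmax
}

## L = 3 classes, one covariate; class 3 is the reference (coefficients 0)
gamma_k <- list(c(0.5, 1.0), c(-0.2, 0.3), c(0, 0))
x <- c(1, 0.8)                        # intercept + covariate value
p <- transition_probs(gamma_k, x)
sum(p)                                # probabilities sum to 1
```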
The full observed-data likelihood integrates over all possible latent class paths \(\mathbf{z}_n = (z_{n1},\dots,z_{nT})\): $$ \begin{aligned} \log \mathcal{L}(\boldsymbol{\theta}) &= \sum_{n=1}^N \log \Biggl[ \sum_{\mathbf{z}_n \in \{1,\dots,L\}^T} \Bigl( \prod_{t=1}^T \text{CEP}_t(z_{nt}, \hat{z}_{nt}) \Bigr) \cdot \\ &\quad P(Z_{n1}=z_{n1} \mid \mathbf{X}_{n1}) \cdot \\ &\quad \prod_{t=2}^T P(Z_{nt}=z_{nt} \mid Z_{n,t-1}=z_{n,t-1}, \mathbf{X}_{nt}) \Biggr] \end{aligned} $$ Parameters \(\boldsymbol{\theta} = \{\boldsymbol{\beta}, \boldsymbol{\gamma}\}\) are estimated via maximum likelihood using the BOBYQA algorithm (box-constrained derivative-free optimization). Reference class \(L\) satisfies \(\boldsymbol{\beta}_L = \mathbf{0}\) and \(\boldsymbol{\gamma}_{kl t} = \mathbf{0}\) for all \(k\) when \(l = L\).
When method.SE = "Bootstrap", standard errors are estimated using a nonparametric bootstrap procedure:
Draw \(B\) (=n.Bootstrap) independent samples of size \(N\) with replacement from the original data.
For each bootstrap sample \(b=1,\dots,B\), re-estimate the full three-step LTA model (Steps 1–3), yielding parameter vector \(\hat{\boldsymbol{\theta}}^{(b)}\).
Compute the bootstrap standard error for each parameter as the sample standard deviation across replicates: $$ \widehat{\mathrm{SE}}_{\mathrm{boot}}(\hat{\theta}_j) = \sqrt{ \frac{1}{B-1} \sum_{b=1}^B \left( \hat{\theta}_j^{(b)} - \bar{\theta}_j \right)^2 }, $$ where \(\bar{\theta}_j = \frac{1}{B}\sum_{b=1}^B \hat{\theta}_j^{(b)}\).
This approach does not rely on large-sample normality or correct specification of the information matrix, making it particularly suitable for complex models like LTA where analytic derivatives are difficult or unstable. However, it increases computational cost linearly with \(B\).
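The bootstrap SE formula above is simply the sample standard deviation across replicate estimates. A sketch, with theta.b standing in for hypothetical replicate estimates of a single parameter:

```r
## Bootstrap SE of one parameter from B replicate estimates.
boot_se <- function(theta.b) {
  B <- length(theta.b)
  sqrt(sum((theta.b - mean(theta.b))^2) / (B - 1))
}

set.seed(42)
theta.b <- rnorm(100, mean = 0.8, sd = 0.15)  # stand-in replicate estimates
boot_se(theta.b)                              # identical to sd(theta.b)
```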
Reference Class: The last latent class (\(L\)) is always treated as the reference category.
All corresponding coefficients in beta and gamma are fixed to zero (\(\boldsymbol{\beta}_L = \mathbf{0}\), \(\boldsymbol{\gamma}_{kl t} = \mathbf{0}\) for \(l=L\)).
CEP Matrices: When CEP.error = TRUE, misclassification probabilities are estimated
non-parametrically using Step 1 posterior probabilities. This corrects for
classification uncertainty. Setting CEP.timeCross = TRUE assumes these
error structures are identical across time (measurement invariance).
See get.CEP.
Covariate Handling: Covariates for initial status (time 1) and transitions (time \(t \geq 2\)) can differ. For transitions to time \(t\), the covariate matrix must have dimensions \(N \times (M_{t}+1)\), i.e., an intercept column of all \(1\) plus \(M_{t}\) columns of covariates in time \(t\).
Optimization: Step 3 maximizes the likelihood with the derivative-free, box-constrained BOBYQA algorithm via nloptr to ensure numerical stability. When method.SE = "Obs", standard errors are derived from the inverse Hessian (via hessian); if it is singular, the Moore-Penrose pseudoinverse (ginv) is used, and negative variances are set to NA.
Computational Complexity: Likelihood evaluation requires enumerating \(L^T\) possible latent paths.
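This growth can be made concrete: the path space is the \(T\)-fold Cartesian product of the class labels, so even modest \(L\) and \(T\) produce large sums (illustrative snippet; Tn is just a local variable name chosen to avoid masking R's built-in T):

```r
L <- 3; Tn <- 4                            # 3 classes, 4 time points
paths <- expand.grid(rep(list(1:L), Tn))   # all latent class sequences z_1, ..., z_T
nrow(paths)                                # L^Tn = 81 paths per individual
```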
Bootstrap Computation: Each bootstrap iteration re-fits Steps 1–3 independently, including re-estimation of
P.Z.Xns, CEP matrices and transition parameters.
To ensure reproducibility, set a seed before calling LTA() when using method.SE = "Bootstrap".
Progress messages during bootstrapping include current replicate index and optimization diagnostics.
Users should monitor convergence in each bootstrap run; failed runs will result in NA entries in SEs and derived statistics.
References

Liang, Q., de la Torre, J., & Law, N. (2023). Latent Transition Cognitive Diagnosis Model With Covariates: A Three-Step Approach. Journal of Educational and Behavioral Statistics, 48(6), 690–718. https://doi.org/10.3102/10769986231163320
McLachlan, G. J., & Peel, D. (2004). Finite Mixture Models. Wiley. https://books.google.com.sg/books?id=c2_fAox0DQoC
Vermunt, J. K. (2010). Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18(4), 450–469. https://doi.org/10.1093/pan/mpq025
Examples

# \donttest{
library(LCPA)
set.seed(123)
N <- 2000 ## sample size
L <- 3 ## number of latent classes
I <- 6 ## number of variables/items
## Covariates at time point T1
covariates.inter <- rep(1, N) # Intercept term is always 1 for each individual
covariates.X11 <- rnorm(N) # Covariate X1 is a continuous variable
# Combine into covariates at T1
covariates.T1 <- cbind(Intercept=covariates.inter, X1=covariates.X11)
## Covariates at time point T2
covariates.inter <- rep(1, N) # Intercept term is always 1 for each individual
covariates.X21 <- rnorm(N) # Covariate X1 is a continuous variable
# Combine into covariates at T2
covariates.T2 <- cbind(Intercept=covariates.inter, X1=covariates.X21)
# Combine into final covariates list
covariates <- list(t1=covariates.T1, t2=covariates.T2)
## Simulate beta coefficients
# the column for the reference class is fixed to zero
## fix reference class to class 2 (so column 2 is zero)
beta <- matrix(c( 0.8, 0.0, 0.1,
-0.3, 0.0, -0.5), ncol=L, byrow=TRUE)
## Simulate gamma coefficients
gamma <- list(
lapply(1:L, function(l) {
lapply(1:L, function(k) if(k < L)
runif(2, -2.0, 2.0) else c(0, 0)) # Last class as reference
})
)
## Simulate the data
sim_custom <- sim.LTA(
N=N, I=I, L=L, times=2, type="LCA", IQ=0.9,
covariates=covariates,
beta=beta,
gamma=gamma
)
summary(sim_custom)
responses <- sim_custom$responses
covariates <- sim_custom$covariates
## fix reference class to class 2
LTA.obj <- LTA(responses, L=L, ref.class=2, type="LCA",
covariates=covariates,
method.SE="Bootstrap", n.Bootstrap=10,
CEP.timeCross=FALSE, CEP.error=TRUE, covariates.timeCross=FALSE,
par.ini = "random", method="EM", vis = TRUE)
print(LTA.obj)
# }