This function estimates parameters of a Latent Profile Analysis (LPA) model for continuous observed variables using one of three methods: Expectation-Maximization (EM) algorithm, Neural Network Estimation (NNE), or external Mplus software. It supports flexible covariance structures and initialization strategies.
LPA(
response,
L = 2,
par.ini = "random",
constraint = "VV",
method = "EM",
is.sort = TRUE,
nrep = 20,
starts = 100,
maxiter.wa = 20,
vis = TRUE,
control.EM = NULL,
control.Mplus = NULL,
control.NNE = NULL
)
An object of class "LPA" containing:
params: List with estimated profile parameters:
means: \(L \times I\) matrix of estimated mean vectors for each profile.
covs: \(I \times I \times L\) array of estimated covariance matrices for each profile.
P.Z: Vector of length \(L\) with profile prior probabilities.
npar: Number of free parameters in the model (depends on constraint).
Log.Lik: Log-likelihood of the final model.
AIC: Akaike Information Criterion value.
BIC: Bayesian Information Criterion value.
best_BIC: Best BIC value across nrep runs (if applicable).
P.Z.Xn: \(N \times L\) matrix of posterior profile probabilities for each observation.
P.Z: Vector of length \(L\) containing the prior probabilities/structural parameters/proportions for each latent class.
Z: Vector of length \(N\) with MAP-classified profile memberships.
Log.Lik.history: Vector tracking the log-likelihood at each EM iteration (only for method = "EM").
Log.Lik.nrep: Vector of log-likelihoods from each replication run.
model: The optimal model object:
For method = "NNE": the trained neural network model.
For method = "Mplus": the estimated Mplus model.
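For example, assuming the returned object behaves like a named list with the components above (the fit object here is hypothetical), the estimates and classifications can be inspected as follows:
fit <- LPA(response, L = 2)   # hypothetical fitted object
fit$params$means              # L x I matrix of profile means
fit$params$covs               # I x I x L array of profile covariance matrices
fit$params$BIC                # BIC of the selected model
fit$P.Z                       # prior profile probabilities
head(fit$P.Z.Xn)              # posterior probabilities for the first observations
table(fit$Z)                  # MAP profile memberships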
response: A numeric matrix of dimension \(N \times I\), where \(N\) is the number of individuals/participants/observations and \(I\) is the number of continuous observed items/variables. Missing values are not allowed.
Note that response must be standardized using scale or normalize before input.
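A minimal example of this standardization step using base R's scale() (normalize refers to the helper used in the Examples below):
response <- scale(response)   # center each column to mean 0 and rescale to unit variance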
L: Integer specifying the number of latent profiles (default: 2).
par.ini: Specification for parameter initialization. Options include:
"random": Random initialization of means and covariances (default).
"kmeans": Initializes parameters via K-means clustering on observed data (McLachlan & Peel, 2004).
A list containing exactly three elements:
means: An \(L \times I\) matrix of initial mean vectors for each profile.
covs: An \(I \times I \times L\) array of initial covariance matrices for each profile.
P.Z: A numeric vector of length \(L\) specifying initial prior probabilities for profiles.
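A minimal sketch of a user-supplied par.ini list for \(L = 2\) profiles and \(I = 3\) variables (the numeric values are purely illustrative):
ini <- list(
  means = matrix(c(-1, -1, -1,    # profile 1 initial means
                    1,  1,  1),   # profile 2 initial means
                nrow = 2, byrow = TRUE),
  covs  = array(diag(3), dim = c(3, 3, 2)),  # identity start for both profiles
  P.Z   = c(0.5, 0.5)                        # equal initial prior probabilities
)
fit <- LPA(response, L = 2, par.ini = ini)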
constraint: Character string specifying covariance structure constraints:
"VV"Varying variances and varying covariances across profiles (heterogeneous full covariance; Default).
"VE"Varying variances but equal correlations across profiles.
"EV"Equal variances but varying covariances across profiles.
"EE"Equal variances and equal covariances across profiles (homogeneous full covariance).
"E0"Equal variances across profiles, zero covariances (diagonal with shared variances).
"V0"Varying variances across profiles, zero covariances (diagonal with free variances).
list: Custom constraints. Each element is a 2-element integer vector specifying variables whose covariance parameters are constrained equal across all classes. The constraint applies to:
Variances: When both indices are identical (e.g., c(3,3) forces variance of variable 3 to be equal across classes).
Covariances: When indices differ (e.g., c(1,2) forces covariance between variables 1 and 2 to be equal across classes).
Constraints are symmetric (e.g., c(1,2) automatically constrains c(2,1)). All unconstrained parameters
vary freely across classes while maintaining positive definiteness.
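For example, the following custom constraint list (indices illustrative) forces the variance of variable 3 and the covariance between variables 1 and 2 to be equal across all profiles, while everything else varies freely:
cst <- list(c(3, 3), c(1, 2))
fit <- LPA(response, L = 2, constraint = cst)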
method: Character string specifying the estimation algorithm:
"EM": Expectation-Maximization algorithm (Default).
"NNE": Neural Network Estimation with transformer architecture (experimental; uses transformer + simulated
annealing, more reliable than both "EM" and "Mplus"). See install_python_dependencies.
"Mplus": Calls external Mplus software for estimation.
Uses Mplus defaults for optimization unless overridden by control.Mplus.
is.sort: A logical value. If TRUE (default), the latent classes are ordered in descending order of P.Z, and all other parameters are adjusted accordingly to match the reordered classes.
nrep: Integer controlling replication behavior:
If par.ini = "random", number of random initializations.
If par.ini = "kmeans", number of K-means runs for initialization.
For method="Mplus", controls number of random starts in Mplus via STARTS option.
Best solution is selected by log-likelihood/BIC across replications.
Ignored for user-provided initial parameters.
starts: Number of random initializations to explore during the warm-up phase (default: 100).
maxiter.wa: Maximum number of training iterations allowed per warm-up run (default: 20). After the warm-up completes, the top nrep solutions (by log-likelihood) are promoted to the final training phase.
vis: Logical. If TRUE, displays progress information during estimation (default: TRUE).
control.EM: List of control parameters for the EM algorithm:
maxiter: Maximum iterations (default: 2000).
tol: Convergence tolerance for the log-likelihood difference (default: 1e-4).
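For instance, a stricter EM run could be requested as follows (values illustrative):
fit <- LPA(response, L = 3, control.EM = list(maxiter = 5000, tol = 1e-6))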
control.Mplus: List of control parameters for Mplus estimation:
maxiter: Maximum iterations for Mplus optimization (default: 2000).
tol: Convergence tolerance for the log-likelihood difference (default: 1e-4).
files.path: Character string specifying the directory path where Mplus will write its intermediate files (e.g., the .inp model input, .dat data file, .out output, and saved posterior probabilities). This argument is required: if NULL (default), the function throws an error. The specified directory must exist and be writable; if it does not exist, the function attempts to create it recursively. A unique timestamped subdirectory (e.g., "Mplus_LPA_YYYY-MM-DD_HH-MM-SS") is created within this path to store all run-specific files and avoid naming conflicts.
files.clean: Logical. If TRUE (default), all intermediate files and the temporary working directory created for the run are deleted upon successful completion or error exit (via on.exit()). If FALSE, all generated files are retained in files.path (or the auto-generated temporary directory) for inspection or debugging. Note: when files.path = NULL, even if files.clean = FALSE, the temporary directory may still be cleaned up by the system later; for guaranteed persistence, specify a custom files.path.
control.NNE: List of control parameters for the NNE algorithm:
hidden.layers: Integer vector specifying the layer sizes of the fully-connected network (default: c(12, 12)).
activation.function: Activation function (e.g., "tanh"; default: "tanh").
d.model: Dimensionality of the transformer encoder embeddings (default: 8).
nhead: Number of attention heads in the transformer (default: 2).
dim.feedforward: Dimensionality of the transformer feedforward network (default: 16).
eps: Small constant for numerical stability (default: 1e-8).
lambda: Factor for slight regularization of all parameters (default: 1e-5).
initial.temperature: Initial temperature for simulated annealing (default: 1000).
cooling.rate: Cooling rate per iteration in simulated annealing (default: 0.5).
maxiter.sa: Maximum iterations for simulated annealing (default: 1000).
threshold.sa: Minimum temperature threshold for annealing (default: 1e-10).
maxiter: Maximum training epochs (default: 1000).
maxiter.early: Patience parameter for early stopping (default: 50).
maxcycle: Maximum cycles for optimization (default: 10).
lr: Learning rate controlling the step size of neural network parameter updates (default: 0.025).
scheduler.patience: Patience for learning rate decay; if the loss does not improve for more than this many consecutive epochs, the learning rate is reduced (default: 10).
scheduler.factor: Learning rate decay factor; the new learning rate equals the current learning rate multiplied by scheduler.factor (default: 0.70).
plot.interval: Interval (in epochs) for plotting training diagnostics (default: 100).
device: Hardware device; either "CPU" (default) or "GPU". If a GPU is not available, the function automatically falls back to the CPU.
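A usage sketch combining several of these controls (values illustrative; method = "NNE" requires the Python dependencies, see install_python_dependencies):
fit_nne <- LPA(response, L = 2, method = "NNE", nrep = 2,
               control.NNE = list(lr = 0.01, maxiter = 500,
                                  initial.temperature = 500, device = "CPU"))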
When method = "EM", parameter estimation uses the Expectation-Maximization (EM) algorithm to maximize the observed-data log-likelihood:
$$ \log \mathcal{L} = \sum_{n=1}^N \log \left[ \sum_{l=1}^L \pi_l \cdot \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l) \right] $$
The algorithm iterates between the following two steps until convergence (change in log-likelihood < tol or the maximum number of iterations is reached); a minimal R sketch of one full iteration is given after the edge-handling notes below:
E-step: Compute posterior class probabilities (responsibilities) for observation \(n\) and class \(l\): $$\tau_{nl} = \frac{\pi_l \cdot \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_l, \boldsymbol{\Sigma}_l)}{\sum_{k=1}^L \pi_k \cdot \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}$$ where \(\mathcal{N}(\cdot)\) is the multivariate normal density, \(\pi_l\) is the prior class probability, and \(\boldsymbol{\mu}_l\), \(\boldsymbol{\Sigma}_l\) are the current parameters. Numerical stability is ensured via the log-sum-exp trick.
M-step: Update parameters using the responsibilities \(\tau_{nl}\):
Class probabilities: \(\pi_l^{\text{new}} = \frac{1}{N}\sum_{n=1}^N \tau_{nl}\)
Class means: \(\boldsymbol{\mu}_l^{\text{new}} = \frac{\sum_{n=1}^N \tau_{nl} \mathbf{x}_n}{\sum_{n=1}^N \tau_{nl}}\)
Class covariances: Updated under constraints:
"VV"\(\boldsymbol{\Sigma}_l^{\text{new}} = \frac{\sum_{n=1}^N \tau_{nl} (\mathbf{x}_n - \boldsymbol{\mu}_l^{\text{new}}) (\mathbf{x}_n - \boldsymbol{\mu}_l^{\text{new}})^\top}{\sum_{n=1}^N \tau_{nl}}\)
"EE"Shared covariance: \(\boldsymbol{\Sigma}^{\text{new}} = \frac{\sum_{l=1}^L \sum_{n=1}^N \tau_{nl} (\mathbf{x}_n - \boldsymbol{\mu}_l^{\text{new}}) (\mathbf{x}_n - \boldsymbol{\mu}_l^{\text{new}})^\top}{\sum_{l=1}^L \sum_{n=1}^N \tau_{nl}}\)
"VE" (default) / "EV"Hybrid constraints (e.g., "VE": varying variances, equal correlations).
Off-diagonal elements use weighted averages across classes; diagonals retain class-specific values.
User-specified variances/covariances (e.g., list(c(1,2), c(2,2)), meaning the covariance between observed variables 1 and 2 is equal across latent classes, and the variance of observed variable 2 is equal across classes) are forced equal across classes via weighted averaging.
Covariance matrices are regularized to ensure positive definiteness:
Eigenvalues < jitter (1e-10) are replaced with jitter
Failed Cholesky decompositions trigger diagonal jittering or perturbation of non-constrained elements
Edge Handling:
Empty classes (\(\sum_n \tau_{nl} < 10^{-5}\)) are reinitialized by redistributing responsibilities.
Non-finite likelihoods trigger fallback to previous valid parameters or covariance perturbation.
Univariate cases (\(I=1\)) bypass Cholesky decomposition for direct variance updates.
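To make the E- and M-steps concrete, here is a minimal R sketch of one EM iteration under the "VV" constraint. It is an illustration only (not the package's internal implementation) and assumes the mvtnorm package for the multivariate normal density:
library(mvtnorm)

em_step_vv <- function(X, means, covs, P.Z) {
  N <- nrow(X); L <- length(P.Z)
  # E-step: per-class log densities combined via the log-sum-exp trick
  log_dens <- sapply(seq_len(L), function(l)
    log(P.Z[l]) + dmvnorm(X, mean = means[l, ], sigma = covs[, , l], log = TRUE))
  m   <- apply(log_dens, 1, max)
  lse <- m + log(rowSums(exp(log_dens - m)))  # log mixture density per observation
  tau <- exp(log_dens - lse)                  # responsibilities (N x L)
  # M-step: update priors, means, and class-specific covariances ("VV")
  P.Z.new   <- colMeans(tau)
  means.new <- t(sapply(seq_len(L), function(l) colSums(tau[, l] * X) / sum(tau[, l])))
  covs.new  <- array(0, dim = dim(covs))
  for (l in seq_len(L)) {
    Xc <- sweep(X, 2, means.new[l, ])         # center on the updated class mean
    covs.new[, , l] <- crossprod(Xc * tau[, l], Xc) / sum(tau[, l])
  }
  list(means = means.new, covs = covs.new, P.Z = P.Z.new, Log.Lik = sum(lse))
}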
When method = "NNE", parameters are estimated using a hybrid neural network architecture
combining fully-connected layers with transformer-based attention mechanisms. This approach jointly
optimizes profile parameters and posterior probabilities through stochastic optimization with
simulated annealing. See install_python_dependencies. Key components include:
Architecture:
Continuous observed variables \(\mathbf{x}_n \in \mathbb{R}^I\) are standardized (mean-centered, unit-variance) internally during training to improve numerical stability. No encoding is needed — raw values are fed directly.
A multi-layer perceptron with architecture defined by hidden.layers and activation.function
maps the continuous input vector into a latent space of dimension d.model. This layer learns non-linear
feature combinations predictive of latent profile membership.
A transformer encoder with nhead attention heads that learns latent class prior probabilities
\(\boldsymbol{\pi} = (\pi_1, \pi_2, \dots, \pi_L)\) directly from observed responses.
Input: response matrix (\(N \times I\)), where \(N\) = observations, \(I\) = continuous variables.
Mechanism: Self-attention dynamically weighs variable importance during profile assignment, capturing complex multivariate interactions.
Output: Class prior vector \(\boldsymbol{\pi}\) computed as the mean of the posteriors: $$\pi_l = \frac{1}{N}\sum_{n=1}^N \mathrm{attention}(\mathbf{x}_n)_l$$ where \(\mathrm{attention}(\mathbf{x}_n)_l\) is the posterior probability of class \(l\) for observation \(n\) produced by the attention network. This ensures probabilistic consistency with the mixture model framework.
Two separate projection heads branch from the transformer output:
Means Head: Linear projection to \(L \times I\) matrix \(\boldsymbol{\mu}_l\).
Covariance Head: Outputs lower-triangular elements of Cholesky factors \(\mathbf{L}_l\)
for each profile. Diagonal elements are passed through softplus to ensure positivity;
off-diagonals use tanh scaled by 1.2 to bound magnitude and promote stability.
The full covariance is reconstructed via \(\boldsymbol{\Sigma}_l = \mathbf{L}_l \mathbf{L}_l^\top\).
After reconstruction, covariance constraints (e.g., "EE", "V0", or custom lists) are applied by
averaging constrained elements across profiles and re-symmetrizing.
Optimization Strategy:
Hybrid Training Protocol: Alternates between:
Gradient-based phase: AdamW optimizer minimizes negative log-likelihood with weight decay regularization:
$$-\log \mathcal{L} + \lambda \|\boldsymbol{\theta}\|_2^2$$
where \(\lambda\) is controlled by lambda (default: 1e-5). Learning rate decays adaptively when
loss plateaus (controlled by scheduler.patience and scheduler.factor).
Simulated annealing phase: After gradient-based early stopping (maxiter.early), parameters
are perturbed with noise scaled by temperature:
$$\theta_{\text{new}} = \theta_{\text{current}} + \mathcal{N}(0, \theta_{\text{current}} \times \frac{T}{T_0})$$
Temperature \(T\) decays geometrically (\(T \leftarrow T \times \text{cooling.rate}\)) from initial.temperature until threshold.sa is reached, which helps escape poor local minima (see the numeric sketch after this list).
Each full cycle (gradient descent + annealing) repeats up to maxcycle times.
Model Selection: Across nrep random restarts (using Dirichlet-distributed initializations
or K-means), the solution with lowest BIC is retained.
Diagnostics: Training loss, annealing points, and global best solution are plotted when vis=TRUE.
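A small numeric sketch of the geometric cooling schedule described above, using the control.NNE defaults (initial.temperature = 1000, cooling.rate = 0.5, threshold.sa = 1e-10):
Temp <- 1000; cooling.rate <- 0.5; threshold.sa <- 1e-10
n.steps <- 0
while (Temp > threshold.sa) {
  Temp <- Temp * cooling.rate   # T <- T * cooling.rate
  n.steps <- n.steps + 1
}
n.steps   # 44 cooling steps; well below maxiter.sa = 1000, so the temperature threshold binds first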
Constraint Handling:
Covariance constraints (constraint) are enforced after activation via:
Shared Parameters: Variances/covariances marked for equality are replaced by their average across profiles.
Positive Definiteness: Non-positive definite matrices are corrected via eigenvalue clamping, diagonal jittering, or adaptive Cholesky decomposition.
Custom constraints (e.g., list(c(1,2), c(3,3))) force equality of specific covariance elements across profiles, with symmetry (\(\sigma_{12} = \sigma_{21}\)) automatically enforced.
When method = "Mplus", estimation is delegated to external Mplus software.
The function automates the entire workflow:
Workflow:
Creates a working directory (a timestamped subdirectory under files.path; see control.Mplus) to store:
Mplus input syntax (.inp)
Data file in Mplus format (.dat)
Posterior probabilities output (.dat)
Files are automatically deleted after estimation unless control.Mplus$files.clean = FALSE.
Constructs Mplus syntax with:
CLASSES = c1(L) specification for \(L\) latent classes
ANALYSIS block with optimization controls:
TYPE = mixture: Standard mixture modeling setup.
STARTS = starts nrep: Number of random starts and final-stage optimizations.
STITERATIONS = maxiter.wa: Maximum iterations during the initial-stage starts.
MITERATIONS = maxiter: Maximum EM iterations.
CONVERGENCE = tol: Log-likelihood convergence tolerance.
MODEL block reflecting the specified constraint structure
Calls Mplus via MplusAutomation::mplusModeler()
Writes the data to disk in Mplus-compatible format, invokes the Mplus executable (requires a valid license), and captures the convergence status and output.
McLachlan, G. J., & Peel, D. (2004). Finite Mixture Models. Wiley. https://books.google.com.sg/books?id=c2_fAox0DQoC
# Simulate continuous data with 3 observed variables for 2 profiles
set.seed(123)
data.obj <- sim.LPA(N = 500, I = 3, L = 2, constraint = "VV")
response <- data.obj$response
## It is strongly recommended to perform the following
## standardization to obtain more stable results.
## Standardization is not performed here in order to
## compare estimated values with true values.
# response <- normalize(response)
# Fit 2-profile model with VV constraint (default)
fit_vv <- LPA(response, L = 2, constraint = "VV")
# Fit 2-profile model with E0 constraint using neural network estimation
# need Python
if (FALSE) {
fit_e0_nne <- LPA(response, L = 2, constraint = "E0", method = "NNE", nrep = 2)
}
# Fit 2-profile model using Mplus
# Requires Mplus to be installed and available in system PATH.
# NOTE: 'files.path' in control.Mplus is REQUIRED — the function will
# throw an error if not provided.
# This example creates a timestamped subdirectory
# (e.g., "Mplus_LPA_YYYY-MM-DD_HH-MM-SS") under './inst'
# to store all temporary Mplus files (.inp, .dat, .out, etc.).
# The 'inst' directory will be created if it does not exist.
# Setting files.clean=FALSE means temporary files will be preserved after execution.
if (FALSE) {
fit_mplus <- LPA(response, L = 2, method = "Mplus", constraint = list(c(1, 2), c(3, 3)),
control.Mplus = list(files.path = "inst", files.clean=FALSE))
}