sim.LPA: Simulate Data for Latent Profile Analysis

Description

Generates synthetic multivariate continuous data from a latent profile model with L latent classes. Supports predefined and custom equality constraints on the covariance structure, as well as different class size distributions. All generated covariance matrices are guaranteed to be positive definite.

Usage

sim.LPA(
  N = 1000,
  I = 5,
  L = 2,
  constraint = "VV",
  distribution = "random",
  mean.range = c(-2, 2),
  covs.range = c(0.01, 4),
  params = NULL,
  is.sort = TRUE
)

Value

A list containing:

response: Numeric matrix (\(N \times I\)) of simulated observations. Rows are observations, columns are variables named "V1", "V2", ..., or "UV" for univariate data.
means: Numeric matrix (\(L \times I\)) of true class-specific means. Row names: "Class1", "Class2", ...; column names match response.
covs: Array (\(I \times I \times L\)) of true class-specific covariance matrices. Dimensions: variables \(\times\) variables \(\times\) classes. Constrained parameters have identical values across class slices. Dimension names match response and class labels.
P.Z.Xn: Numeric matrix (\(N \times L\)) of true class membership probabilities (one-hot encoded). Row i, column l = 1 if observation i belongs to class l, else 0. Row names: "O1", "O2", ...; column names: "Class1", "Class2", ...
P.Z: Numeric vector (length \(L\)) of true class proportions. Named with class labels (e.g., "Class1").
Z: Integer vector (length \(N\)) of true class assignments (1 to L). Named with observation IDs (e.g., "O1").
constraint: Original constraint specification (character string or list) passed to the function.
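Because P.Z.Xn is a one-hot encoding of Z, either can be recovered from the other. The base-R sketch below illustrates that relationship; `one_hot_membership` is a hypothetical helper written for this example, not a function exported by the package.

```r
# Hypothetical helper: rebuild a one-hot membership matrix from class labels,
# using the documented layout of P.Z.Xn (rows "O1", ..., columns "Class1", ...).
one_hot_membership <- function(Z, L) {
  N <- length(Z)
  P <- matrix(0, nrow = N, ncol = L,
              dimnames = list(paste0("O", seq_len(N)), paste0("Class", seq_len(L))))
  P[cbind(seq_len(N), Z)] <- 1   # put a 1 in column Z[i] of row i
  P
}

Z <- c(2L, 1L, 3L, 2L)
P.Z.Xn <- one_hot_membership(Z, L = 3)
rowSums(P.Z.Xn)        # every row sums to 1
max.col(P.Z.Xn) == Z   # the column holding the 1 is the class label
```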

Arguments

N

Integer; total number of observations to simulate. Must be \(\geq L\) (Default = 1000).

I

Integer; number of continuous observed variables. Must be \(\geq 1\) (Default = 5).

L

Integer; number of latent profiles (classes). Must be \(\geq 1\) (Default = 2).

constraint

Character string or list specifying covariance constraints. See detailed description below. Default is "VV" (fully heterogeneous covariances).

distribution

Character; distribution of class sizes. Options: "random" (default) or "uniform".

mean.range

Numeric vector of length 2; range for sampling class-specific means. Each variable's means are sampled uniformly from mean.range[1] to mean.range[2]. Default: c(-2, 2).

covs.range

Numeric vector of length 2; range for sampling variance parameters (diagonal elements). Must satisfy covs.range[1] > 0 and covs.range[2] > covs.range[1]. Off-diagonal covariances are derived from correlations scaled by the corresponding standard deviations (the square roots of these variances). Default: c(0.01, 4).
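Scaling a correlation matrix \(R\) by standard deviations gives \(\Sigma = D R D\) with \(D = \mathrm{diag}(\sqrt{\sigma^2_1}, \dots, \sqrt{\sigma^2_I})\), so \(\Sigma_{ij} = r_{ij}\,\sigma_i\,\sigma_j\). The base-R sketch below shows that step. How the package draws the correlation matrix is not documented here; using `cov2cor(crossprod(A))` for a random Gaussian matrix `A` is an assumption made only for illustration.

```r
# Sketch (assumption): sample variances from covs.range, draw a random
# correlation matrix, then scale it into a covariance matrix.
set.seed(1)
I <- 3
covs.range <- c(0.01, 4)

variances <- runif(I, min = covs.range[1], max = covs.range[2])
A <- matrix(rnorm(I * I), I, I)
R <- cov2cor(crossprod(A))            # random positive definite correlation matrix
D <- diag(sqrt(variances), nrow = I)  # standard deviations on the diagonal
Sigma <- D %*% R %*% D                # Sigma[i, j] = R[i, j] * sd_i * sd_j

all.equal(diag(Sigma), variances)     # diagonal reproduces the sampled variances
all.equal(cov2cor(Sigma), R)          # correlation structure is preserved
```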

params

List with fixed parameters for simulation:

par: \(L \times I \times K_{\max}\) array of conditional response probabilities per latent class.
P.Z: Vector of length \(L\) with latent class prior probabilities.
Z: Vector of length \(N\) containing the latent class of each observation. A fixed vector of class assignments Z is used directly to simulate data only when P.Z is NULL and Z has length N.

is.sort

A logical value. If TRUE (Default), latent classes are sorted in descending order of P.Z, and all other class-specific parameters are reordered to match.
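The reordering performed when is.sort = TRUE can be pictured as one permutation applied to every class-indexed object. The base-R sketch below uses made-up toy values and is an assumption about the bookkeeping, not the package source.

```r
# Sketch (assumption): order classes by decreasing P.Z and relabel every
# class-indexed quantity with the same permutation.
P.Z   <- c(Class1 = 0.2, Class2 = 0.5, Class3 = 0.3)
means <- matrix(c(-1, 0, 1,
                   2, 3, 4,
                   5, 6, 7), nrow = 3, byrow = TRUE)
Z     <- c(1L, 2L, 2L, 3L, 2L)

ord   <- order(P.Z, decreasing = TRUE)  # old class indices, largest class first
P.Z   <- unname(P.Z[ord])
names(P.Z) <- paste0("Class", seq_along(P.Z))
means <- means[ord, , drop = FALSE]     # covs would follow the same rule: covs[, , ord]
Z     <- match(Z, ord)                  # map each old label to its new position
```

Here `ord` is `c(2, 3, 1)`, so old class 2 becomes the new Class1.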

Covariance Constraints

The constraint parameter controls equality constraints on covariance parameters across classes:

Predefined Constraints (Character Strings):

"UE" (Univariate only): Equal variance across all classes.
"UV" (Univariate only): Varying variances across classes.
"E0": Equal variances across classes, zero covariances (diagonal matrix with shared variances).
"V0": Varying variances across classes, zero covariances (diagonal matrix with free variances).
"EE": Equal full covariance matrix across all classes (homogeneous).
"EV": Equal variances but varying covariances (equal diagonal, free off-diagonal).
"VE": Varying variances but equal correlations (free diagonal, equal correlation structure).
"VV": Varying full covariance matrices across classes (heterogeneous; default).

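The difference between the two diagonal constraints, "E0" and "V0", is whether one set of variances is shared by every class slice. The base-R sketch below illustrates this; `diag_covs` is a hypothetical helper and the sampling details are assumptions.

```r
# Sketch (assumption): diagonal covariance arrays for "E0" (one shared set of
# variances) and "V0" (class-specific variances); off-diagonals are zero in both.
diag_covs <- function(I, L, equal, covs.range = c(0.01, 4)) {
  covs <- array(0, dim = c(I, I, L))
  shared <- runif(I, covs.range[1], covs.range[2])
  for (l in seq_len(L)) {
    v <- if (equal) shared else runif(I, covs.range[1], covs.range[2])
    covs[, , l] <- diag(v, nrow = I)
  }
  covs
}

set.seed(2)
e0 <- diag_covs(I = 3, L = 2, equal = TRUE)
v0 <- diag_covs(I = 3, L = 2, equal = FALSE)
identical(e0[, , 1], e0[, , 2])   # TRUE: "E0" repeats the same variances
identical(v0[, , 1], v0[, , 2])   # FALSE: "V0" draws new variances per class
```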
Custom Constraints (List of integer vectors):

Each element specifies a pair of variables whose covariance parameter is constrained to be equal across classes:

c(i,i): Constrains the variance of variable i to be equal across all classes.
c(i,j): Constrains the covariance between variables i and j to be equal across all classes (symmetric: automatically includes c(j,i)).
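A custom constraint list maps naturally onto a symmetric logical matrix in which TRUE marks a parameter held equal across classes. The base-R sketch below shows that mapping; `constraint_mask` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: convert a list such as list(c(1, 2), c(3, 3)) into a
# symmetric I x I logical mask (TRUE = constrained equal across classes).
constraint_mask <- function(constraint, I) {
  mask <- matrix(FALSE, I, I)
  for (pair in constraint) {
    i <- pair[1]
    j <- pair[2]
    mask[i, j] <- TRUE
    mask[j, i] <- TRUE   # c(i, j) also constrains c(j, i)
  }
  mask
}

m <- constraint_mask(list(c(1, 2), c(3, 3)), I = 3)
which(m, arr.ind = TRUE)   # positions (2,1), (1,2) and (3,3)
```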

Unconstrained parameters vary freely. The algorithm ensures positive definiteness by:

Generating a base positive definite matrix S0.
Applying constraints via a logical mask.
Adjusting unconstrained variances to maintain positive definiteness.
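The three steps above can be sketched for a single class in base R. This is an assumption about how the steps fit together, not the package source: constrained entries are copied from the base matrix `S0`, free entries come from a class-specific draw `S_l`, and free variances are increased in small steps until the smallest eigenvalue exceeds \(10^{-8}\). The function name, step size, and iteration limit are hypothetical.

```r
# Sketch (assumption): combine a shared base matrix S0 with a class-specific
# matrix S_l through a logical mask, then inflate only the unconstrained
# variances until the result is positive definite.
constrain_class_cov <- function(S0, S_l, mask, tol = 1e-8, step = 0.1, max_iter = 1000) {
  S <- S_l
  S[mask] <- S0[mask]                      # constrained parameters take S0's values
  free_diag <- which(!diag(mask))          # variances that are allowed to change
  for (iter in seq_len(max_iter)) {
    if (min(eigen(S, symmetric = TRUE, only.values = TRUE)$values) > tol) return(S)
    if (length(free_diag) == 0) break      # nothing left to adjust
    idx <- cbind(free_diag, free_diag)
    S[idx] <- S[idx] + step                # push free variances upward
  }
  stop("could not reach a positive definite matrix")
}

S0   <- matrix(c(1, 0.8, 0.8, 1), 2)
S_l  <- diag(c(0.5, 0.5))
mask <- matrix(c(FALSE, TRUE, TRUE, FALSE), 2)   # only the covariance is constrained
S <- constrain_class_cov(S0, S_l, mask)
```

The constrained covariance stays at 0.8 while both free variances grow past 0.8, which is exactly what positive definiteness requires here.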

Critical requirements for custom constraints:

At least one variance must be unconstrained if any off-diagonal covariance is unconstrained.
All indices must be between 1 and I.
For univariate data (I=1), only list(c(1,1)) is valid.
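The requirements above can be checked before any simulation is attempted. The base-R validator below is a hypothetical sketch of such checks; the package's own error messages and checking order may differ.

```r
# Hypothetical validator for the documented custom-constraint requirements.
validate_constraint <- function(constraint, I) {
  if (!all(lengths(constraint) == 2))
    stop("each constraint must be a pair c(i, j)")
  idx <- unlist(constraint)
  if (any(idx < 1 | idx > I))
    stop("all indices must be between 1 and I")
  if (I == 1 && !(length(constraint) == 1 && all(constraint[[1]] == c(1, 1))))
    stop("for I = 1 only list(c(1, 1)) is valid")
  mask <- matrix(FALSE, I, I)
  for (p in constraint) mask[p[1], p[2]] <- mask[p[2], p[1]] <- TRUE
  any_free_offdiag <- any(!mask[upper.tri(mask)])
  if (any_free_offdiag && all(diag(mask)))
    stop("at least one variance must be unconstrained")
  invisible(TRUE)
}

validate_constraint(list(c(1, 2), c(3, 3)), I = 3)   # passes
try(validate_constraint(list(c(1, 4)), I = 3))        # index out of range
```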

Class Size Distribution

"random": (Default) Class proportions are drawn from a Dirichlet distribution (\(\alpha = 3\) for all classes), ensuring no empty classes. Sizes are rounded to integers and adjusted to sum exactly to N.
"uniform": Equal probability of class membership (\(1/L\) per class), sampled with replacement.
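The two options can be sketched in base R. A Dirichlet draw is obtained by normalising independent Gamma(\(\alpha\)) draws. The rounding rule (largest remainder) and the step that moves single observations into empty classes are assumptions, since the exact adjustment is not documented here; both function names are hypothetical.

```r
# Sketch (assumption): "random" class sizes via a Dirichlet(3, ..., 3) draw,
# largest-remainder rounding to sum to N, and a fix-up so no class is empty.
class_sizes_random <- function(N, L, alpha = 3) {
  g <- rgamma(L, shape = alpha)
  p <- g / sum(g)                    # one Dirichlet draw
  raw <- N * p
  sizes <- floor(raw)
  short <- N - sum(sizes)            # observations lost to flooring
  if (short > 0) {
    top <- order(raw - sizes, decreasing = TRUE)[seq_len(short)]
    sizes[top] <- sizes[top] + 1
  }
  while (any(sizes == 0)) {          # assumes N >= L
    big <- which.max(sizes)
    sizes[big] <- sizes[big] - 1
    sizes[which(sizes == 0)[1]] <- 1
  }
  sizes
}

# "uniform": every observation picks a class with probability 1/L.
class_labels_uniform <- function(N, L) sample.int(L, N, replace = TRUE)

set.seed(3)
class_sizes_random(N = 1000, L = 4)
```

Unlike the "random" sketch, the "uniform" sampler can leave a class empty when N is small.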

Details

Mean Generation: For each variable, \(3L\) candidate means are sampled uniformly from mean.range, and \(L\) of them are selected without replacement so that each class receives a distinct mean, which promotes separation between classes.
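The mean-generation step can be sketched in base R as follows. `sim_means` is a hypothetical name, and the code follows the description above rather than the package source.

```r
# Sketch (assumption): per variable, draw 3L uniform candidates in mean.range
# and keep L of them, chosen without replacement, giving an L x I mean matrix.
sim_means <- function(L, I, mean.range = c(-2, 2)) {
  means <- matrix(NA_real_, nrow = L, ncol = I,
                  dimnames = list(paste0("Class", seq_len(L)), paste0("V", seq_len(I))))
  for (i in seq_len(I)) {
    candidates <- runif(3 * L, mean.range[1], mean.range[2])
    means[, i] <- sample(candidates, L, replace = FALSE)
  }
  means
}

set.seed(4)
sim_means(L = 3, I = 2)
```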

Covariance Generation:

Positive Definiteness: All covariance matrices are adjusted using Matrix::nearPD and eigenvalue thresholds (\(> 10^{-8}\)) to guarantee validity. Failed attempts trigger explicit errors.
Univariate Case (I=1): Constraints "UE" and "UV" are enforced automatically. Predefined constraints like "E0" map to "UE".
VE Constraint: Requires special handling. Base off-diagonal elements are fixed, and diagonal elements are sampled above a minimum threshold to maintain positive definiteness. This may fail if covs.range is too narrow.
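One simple way to satisfy "equal correlations, varying variances" is to share a single correlation matrix \(R\) and set \(\Sigma_l = D_l R D_l\) for each class. The base-R sketch below uses that construction together with the \(10^{-8}\) eigenvalue check. This is a swapped-in, simpler technique and not the fixed-off-diagonal procedure described above; `ve_covs` is hypothetical.

```r
# Sketch (alternative construction, assumption): share one correlation matrix R
# across classes and draw class-specific standard deviations.
ve_covs <- function(R, L, covs.range = c(0.01, 4), tol = 1e-8) {
  I <- nrow(R)
  covs <- array(0, dim = c(I, I, L))
  for (l in seq_len(L)) {
    sds <- sqrt(runif(I, covs.range[1], covs.range[2]))
    S <- R * outer(sds, sds)   # elementwise: r_ij * sd_i * sd_j
    if (min(eigen(S, symmetric = TRUE, only.values = TRUE)$values) <= tol)
      stop("class ", l, " covariance is not positive definite")
    covs[, , l] <- S
  }
  covs
}

set.seed(5)
R <- matrix(c(1, 0.5, 0.5, 1), 2)
covs <- ve_covs(R, L = 3)
```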

Class Assignment:

"random": Uses Dirichlet distribution (\(\alpha = 3\)) to avoid extremely small classes. Sizes are rounded and adjusted to sum exactly to N.
"uniform": Simple random sampling with equal probability. May produce empty classes if N is small.

Data Generation: Observations are simulated using mvtnorm::rmvnorm per class. Final data and class labels are shuffled to remove ordering artifacts.
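The generation and shuffling step can be sketched in base R. This sketch swaps in a Cholesky-based sampler for mvtnorm::rmvnorm so it has no dependencies: if \(\Sigma = U^\top U\) and the rows of `z` are standard normal, then the rows of `z %*% U` have covariance \(\Sigma\). `sim_response` is a hypothetical name.

```r
# Sketch (assumption): per-class multivariate normal draws via a Cholesky
# factor, followed by one joint shuffle of rows and labels.
sim_response <- function(sizes, means, covs) {
  I <- ncol(means)
  blocks <- lapply(seq_along(sizes), function(l) {
    n <- sizes[l]
    U <- chol(matrix(covs[, , l], I, I))       # Sigma_l = t(U) %*% U
    z <- matrix(rnorm(n * I), nrow = n, ncol = I)
    sweep(z %*% U, 2, means[l, ], "+")          # rows ~ N(means[l, ], Sigma_l)
  })
  Z <- rep(seq_along(sizes), times = sizes)
  X <- do.call(rbind, blocks)
  perm <- sample.int(length(Z))                 # same shuffle for data and labels
  list(response = X[perm, , drop = FALSE], Z = Z[perm])
}

set.seed(6)
means <- rbind(c(0, 0), c(10, 10))
covs  <- array(c(diag(2), diag(2)), dim = c(2, 2, 2))
out   <- sim_response(c(200, 300), means, covs)
```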

Examples


# Example 1: Bivariate data, 3 classes, heterogeneous covariances (default)
sim_data <- sim.LPA(N = 500, I = 2, L = 3, constraint = "VV")

# Example 2: Univariate data, equal variances
# 'E0' automatically maps to 'UE' when I = 1
sim_uni <- sim.LPA(N = 200, I = 1, L = 2, constraint = "E0")

# Example 3: Custom constraints
# - Equal covariance between V1 and V2 across classes
# - Equal variance for V3 across classes
sim_custom <- sim.LPA(
  N = 300,
  I = 3,
  L = 4,
  constraint = list(c(1, 2), c(3, 3))
)

# Example 4: VE constraint (varying variances, equal correlations)
sim_ve <- sim.LPA(N = 400, I = 3, L = 3, constraint = "VE")

# Example 5: Uniform class sizes
sim_uniform <- sim.LPA(N = 300, I = 4, L = 5, distribution = "uniform")
