Generates synthetic multivariate continuous data from a latent profile model with L latent classes.
Supports flexible covariance structure constraints (including custom equality constraints) and
class size distributions. All covariance matrices are ensured to be positive definite.
sim.LPA(
N = 1000,
I = 5,
L = 2,
constraint = "VV",
distribution = "random",
mean.range = c(-2, 2),
covs.range = c(0.01, 4),
params = NULL,
is.sort = TRUE
)
A list containing:
Numeric matrix (\(N \times I\)) of simulated observations. Rows are observations,
columns are variables named "V1", "V2", ..., or "UV" for univariate data.
Numeric matrix (\(L \times I\)) of true class-specific means.
Row names: "Class1", "Class2", ...; column names match response.
Array (\(I \times I \times L\)) of true class-specific covariance matrices.
Dimensions: variables x variables x classes. Constrained parameters have identical values across class slices.
Dimension names match response and class labels.
Numeric matrix (\(N \times L\)) of true class membership probabilities (one-hot encoded).
Row i, column l = 1 if observation i belongs to class l, else 0.
Row names: "O1", "O2", ...; column names: "Class1", "Class2", ...
Numeric vector (length \(L\)) of true class proportions.
Named with class labels (e.g., "Class1").
Integer vector (length \(N\)) of true class assignments (1 to L).
Named with observation IDs (e.g., "O1").
Original constraint specification (character string or list) passed to the function.
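The link between the integer class assignments and the one-hot membership matrix described above can be sketched in base R. This is only an illustration: the object names `Z`, `P_Z_Xn`, and `prop` are hypothetical and are not necessarily the element names of the returned list.

```r
# Hypothetical class assignments for N = 6 observations and L = 3 classes
Z <- c(2L, 1L, 3L, 1L, 2L, 2L)
names(Z) <- paste0("O", seq_along(Z))
L <- 3L

# One-hot membership matrix: row i, column l is 1 if Z[i] == l, else 0
P_Z_Xn <- diag(L)[Z, , drop = FALSE]
dimnames(P_Z_Xn) <- list(names(Z), paste0("Class", seq_len(L)))

# Class proportions recovered from the membership matrix
prop <- colMeans(P_Z_Xn)
```

Each row of the matrix sums to 1, and the column means give the empirical class proportions.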
N: Integer; total number of observations to simulate. Must be \(\geq\) L (Default = 1000).
I: Integer; number of continuous observed variables. Must be \(\geq 1\) (Default = 5).
L: Integer; number of latent profiles (classes). Must be \(\geq 1\) (Default = 2).
constraint: Character string or list specifying covariance constraints. See detailed description below.
Default is "VV" (fully heterogeneous covariances).
distribution: Character; distribution of class sizes. Options: "random" (default) or "uniform".
mean.range: Numeric vector of length 2; range for sampling class-specific means.
Each variable's means are sampled uniformly from mean.range[1] to mean.range[2].
Default: c(-2, 2).
covs.range: Numeric vector of length 2; range for sampling variance parameters (diagonal elements).
Must satisfy covs.range[1] > 0 and covs.range[2] > covs.range[1]. Off-diagonal covariances
are derived from correlations scaled by these variances. Default: c(0.01, 4).
params: List with fixed parameters for simulation:
par: \(L \times I \times K_{\max}\) array of conditional response probabilities per latent class.
P.Z: Vector of length \(L\) with latent class prior probabilities.
Z: Vector of length \(N\) containing the latent class of each observation. A fixed
Z is applied directly to simulate data only when P.Z is NULL and Z is a vector
of length N.
is.sort: Logical. If TRUE (default), the latent classes are sorted in descending
order of P.Z, and all other parameters are reordered to match the sorted
latent classes.
The constraint parameter controls equality constraints on covariance parameters across classes:
"UE" (univariate only): Equal variance across all classes.
"UV" (univariate only): Varying variances across classes.
"E0": Equal variances across classes, zero covariances (diagonal matrix with shared variances).
"V0": Varying variances across classes, zero covariances (diagonal matrix with free variances).
"EE": Equal full covariance matrix across all classes (homogeneous).
"EV": Equal variances but varying covariances (equal diagonal, free off-diagonal).
"VE": Varying variances but equal correlations (free diagonal, equal correlation structure).
"VV": Varying full covariance matrices across classes (heterogeneous; default).
When constraint is a list, each element is an integer vector naming a pair of variables whose covariance parameter is constrained to be equal across classes:
c(i,i): Constrains the variance of variable i to be equal across all classes.
c(i,j): Constrains the covariance between variables i and j to be equal across all classes
(symmetric: automatically includes c(j,i)).
Unconstrained parameters vary freely. The algorithm ensures positive definiteness by:
Generating a base positive definite matrix S0.
Applying constraints via a logical mask.
Adjusting unconstrained variances to maintain positive definiteness.
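The mask step above can be sketched as follows. This is a minimal base R illustration under assumed values, not the package's internal code: the matrices `S0` and `S1` and the names `mask` and `min_eig` are hypothetical.

```r
I <- 3L
constraint <- list(c(1, 2), c(3, 3))

# Build a symmetric I x I logical mask: TRUE marks a constrained entry
mask <- matrix(FALSE, I, I)
for (pair in constraint) {
  mask[pair[1], pair[2]] <- TRUE
  mask[pair[2], pair[1]] <- TRUE  # c(i, j) implies c(j, i)
}

# A base positive definite matrix S0 and one class-specific matrix S1
# (illustrative values only)
S0 <- matrix(c(2.0, 0.5, 0.2,
               0.5, 1.5, 0.1,
               0.2, 0.1, 1.0), I, I)
S1 <- diag(c(3, 2.5, 4))

# Copy the constrained entries from S0 into the class matrix
S1[mask] <- S0[mask]

# Check positive definiteness through the smallest eigenvalue
min_eig <- min(eigen(S1, symmetric = TRUE, only.values = TRUE)$values)
```

Here the covariance between V1 and V2 and the variance of V3 come from `S0`, so they would be identical in every class built this way, while the remaining entries stay class-specific.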
Critical requirements for custom constraints:
In the univariate case (I=1), only list(c(1,1)) is valid.
The distribution parameter controls how class sizes are generated:
"random" (default): Class proportions are drawn from a Dirichlet distribution (\(\alpha = 3\) for all classes),
ensuring no empty classes. Sizes are rounded to integers and adjusted to sum exactly to N.
"uniform": Equal probability of class membership (\(1/L\) per class), sampled with replacement.
Mean Generation: For each variable, \(3L\) candidate means are sampled uniformly from mean.range.
\(L\) distinct means are selected without replacement to ensure separation between classes.
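The mean-sampling step described above might look like the following in base R. It is a sketch under the stated description only; the variable names are illustrative and the package's own implementation may differ.

```r
set.seed(2)
L <- 3L
I <- 2L
mean.range <- c(-2, 2)

means <- matrix(NA_real_, nrow = L, ncol = I,
                dimnames = list(paste0("Class", seq_len(L)),
                                paste0("V", seq_len(I))))
for (i in seq_len(I)) {
  # Draw 3L candidate means uniformly from mean.range ...
  candidates <- runif(3 * L, min = mean.range[1], max = mean.range[2])
  # ... then keep L of them, sampled without replacement
  means[, i] <- sample(candidates, size = L, replace = FALSE)
}
```

The result is an \(L \times I\) matrix with one row per class and one column per variable.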
Covariance Generation:
Positive Definiteness: All covariance matrices are adjusted using Matrix::nearPD
and eigenvalue thresholds (\(> 10^{-8}\)) to guarantee validity. Failed attempts trigger explicit errors.
Univariate Case (I=1): Constraints "UE" and "UV" are enforced automatically.
Predefined constraints like "E0" map to "UE".
VE Constraint: Requires special handling: base off-diagonal elements are fixed, and diagonals
are sampled above a minimum threshold to maintain positive definiteness. May fail if covs.range is too narrow.
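To illustrate what "varying variances, equal correlations" means, the sketch below builds class covariance matrices as \(D_l R D_l\), where \(R\) is a shared correlation matrix and \(D_l\) holds class-specific standard deviations. This is a standard textbook construction used here for illustration; it is not the package's VE algorithm described above, and the values of `R` are assumed. The eigenvalue check mirrors the \(10^{-8}\) threshold mentioned in the text.

```r
set.seed(3)
I <- 3L
L <- 2L
covs.range <- c(0.01, 4)

# Shared correlation matrix R (illustrative values; positive definite)
R <- matrix(c(1.0, 0.3, 0.2,
              0.3, 1.0, 0.4,
              0.2, 0.4, 1.0), I, I)

covs <- array(NA_real_, dim = c(I, I, L))
for (l in seq_len(L)) {
  # Class-specific variances sampled uniformly from covs.range
  v <- runif(I, covs.range[1], covs.range[2])
  D <- diag(sqrt(v))
  # D R D has variances v and the shared correlations R
  covs[, , l] <- D %*% R %*% D
}

# Smallest eigenvalue of each class matrix, to compare with the 1e-8 threshold
min_eigs <- apply(covs, 3, function(S) {
  min(eigen(S, symmetric = TRUE, only.values = TRUE)$values)
})
```

Applying `cov2cor()` to each slice of `covs` recovers the same correlation matrix, while the diagonals differ between classes.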
Class Assignment:
"random": Uses Dirichlet distribution (\(\alpha = 3\)) to avoid extremely small classes.
Sizes are rounded and adjusted to sum exactly to N.
"uniform": Simple random sampling with equal probability. May produce empty classes if N is small.
Data Generation: Observations are simulated using mvtnorm::rmvnorm per class.
Final data and class labels are shuffled to remove ordering artifacts.
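The class-assignment and data-generation steps above can be sketched end to end in base R. Several parts are assumptions made for illustration: the Dirichlet draw is obtained by normalizing gamma variates, the rounding adjustment is applied to the largest class, and a Cholesky factor replaces mvtnorm::rmvnorm so that the example has no package dependencies. The class means and covariances are made-up values.

```r
set.seed(4)
N <- 100L
L <- 2L
I <- 2L

# Class proportions: a Dirichlet(3, ..., 3) draw as normalized Gamma(3, 1) draws
g <- rgamma(L, shape = 3, rate = 1)
p <- g / sum(g)

# Round to integer sizes (at least 1 per class), then fix the total to N
sizes <- pmax(1L, as.integer(round(p * N)))
k <- which.max(sizes)
sizes[k] <- sizes[k] + (N - sum(sizes))

# Illustrative class parameters: L x I means and an I x I x L covariance array
means <- rbind(c(-1, 0), c(2, 1))
covs <- array(c(1.0, 0.3, 0.3, 1.0,
                0.5, 0.0, 0.0, 2.0), dim = c(I, I, L))

# Simulate each class: x = mu + z %*% chol(Sigma), with z standard normal
blocks <- lapply(seq_len(L), function(l) {
  z <- matrix(rnorm(sizes[l] * I), nrow = sizes[l], ncol = I)
  sweep(z %*% chol(covs[, , l]), 2, means[l, ], "+")
})
response <- do.call(rbind, blocks)
Z <- rep(seq_len(L), times = sizes)

# Shuffle rows and labels with one shared permutation
perm <- sample.int(N)
response <- response[perm, , drop = FALSE]
Z <- Z[perm]
```

Using a single permutation for both `response` and `Z` keeps each observation paired with its true class after shuffling.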
# Example 1: Bivariate data, 3 classes, heterogeneous covariances (default)
sim_data <- sim.LPA(N = 500, I = 2, L = 3, constraint = "VV")
# Example 2: Univariate data, equal variances
# 'E0' automatically maps to 'UE' for I=1
sim_uni <- sim.LPA(N = 200, I = 1, L = 2, constraint = "E0")
# Example 3: Custom constraints
# - Equal covariance between V1 and V2 across classes
# - Equal variance for V3 across classes
sim_custom <- sim.LPA(
N = 300,
I = 3,
L = 4,
constraint = list(c(1, 2), c(3, 3))
)
# Example 4: VE constraint (varying variances, equal correlations)
sim_ve <- sim.LPA(N = 400, I = 3, L = 3, constraint = "VE")
# Example 5: Uniform class sizes
sim_uniform <- sim.LPA(N = 300, I = 4, L = 5, distribution = "uniform")