multiLCA: Estimates and plots single- and multilevel latent class models

Description

The multiLCA function in the multilevLCA package estimates single- and multilevel measurement and structural latent class models. The function also implements two strategies for model selection, a sequential and a simultaneous approach (see Details). Methodological details can be found in Bakk et al. (2022), Bakk and Kuha (2018), and Di Mari et al. (2023).

Different output visualization tools are available for all model specifications. See, e.g., plot.multiLCA.

Usage

multiLCA(
data,
Y,
iT,
id_high = NULL,
iM = NULL,
Z = NULL,
Zh = NULL,
incomplete = FALSE,
fixedslopes = FALSE,
startval = NULL,
kmea = TRUE,
extout = FALSE,
dataout = TRUE,
sequential = TRUE,
numFreeCores = 2,
maxIter = 1e3,
tol = 1e-8,
reord = TRUE,
fixedpars = 1,
NRmaxit = 100,
NRtol = 1e-6,
verbose = TRUE
)

Value

Single-level model estimation returns (if extout = FALSE, a subset):

vPi: Class proportions
mPhi: Response probabilities given the latent classes
mU: Matrix of posterior class assignment (proportional assignment)
mU_modal: Matrix of posterior class assignment (modal assignment)
vU_modal: Vector of posterior class assignment (modal assignment)
mClassErr: Expected number of classification errors
mClassErrProb: Expected proportion of classification errors
AvgClassErrProb: Average of mClassErrProb
R2entr: Entropy-based R\(^2\)
BIC: Bayesian Information Criterion (BIC)
AIC: Akaike Information Criterion (AIC)
vGamma: Intercepts in logistic parametrization for class proportions
mBeta: Intercepts in logistic parametrization for response probabilities
parvec: Vector of logistic parameters
SEs: Standard errors
Varmat: Variance-covariance matrix
iter: Number of iterations for EM algorithm
eps: Difference between last two elements of log-likelihood sequence for EM algorithm
LLKSeries: Full log-likelihood series for EM algorithm
mScore: Contributions to log-likelihood score
spec: Model specification
missing_values: Strategy used to handle missing values, if any
sample_size: Final sample size for model estimation
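
The relation between the proportional and modal assignments listed above can be illustrated in base R. This is a minimal sketch, not the package's internal code: the posterior matrix `post` is invented, and its layout (one row per unit, one column per class) is an assumption about how mU is arranged.

```r
# Hypothetical posterior class probabilities for 4 units and 3 classes
# (rows = units, columns = classes; this layout is an assumption).
post <- matrix(c(0.7, 0.2, 0.1,
                 0.1, 0.8, 0.1,
                 0.3, 0.3, 0.4,
                 0.5, 0.4, 0.1),
               nrow = 4, byrow = TRUE)

# Modal assignment as a vector: index of the most probable class per unit.
modal_vec <- apply(post, 1, which.max)

# Modal assignment as a 0/1 indicator matrix with the same shape as `post`.
modal_mat <- matrix(0, nrow(post), ncol(post))
modal_mat[cbind(seq_len(nrow(post)), modal_vec)] <- 1

# Average posterior probability per class across units.
prop <- colMeans(post)
```

Here `post` plays the role of mU, `modal_mat` of mU_modal, and `modal_vec` of vU_modal. Proportional assignment keeps the full posterior row for each unit, whereas modal assignment keeps only its largest entry.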

Single-level model estimation with covariates returns (if extout = FALSE, a subset):

mPi: Class proportions given the covariates
vPi_avg: Sample average of mPi
mPhi: Response probabilities given the latent classes
mU: Matrix of posterior class assignment (proportional assignment)
mClassErr: Expected number of classification errors
mClassErrProb: Expected proportion of classification errors
AvgClassErrProb: Average of mClassErrProb
R2entr: Entropy-based R\(^2\)
BIC: Bayesian Information Criterion (BIC)
AIC: Akaike Information Criterion (AIC)
mGamma: Intercept and slope parameters in logistic models for conditional class membership
mBeta: Intercepts in logistic parametrization for response probabilities
parvec: Vector of logistic parameters
SEs_unc: Uncorrected standard errors
SEs_cor: Corrected standard errors (see Bakk & Kuha, 2018; Di Mari et al., 2023)
SEs_cor_gamma: Corrected standard errors only for the gammas (see Bakk & Kuha, 2018; Di Mari et al., 2023)
mQ: Cross-derivatives for asymptotic standard error correction in two-step estimation (see Bakk & Kuha, 2018; Di Mari et al., 2023)
Varmat_unc: Uncorrected variance-covariance matrix
Varmat_cor: Corrected variance-covariance matrix (see Bakk & Kuha, 2018; Di Mari et al., 2023)
mV2: Inverse of information matrix for structural model
iter: Number of iterations for EM algorithm
eps: Difference between last two elements of log-likelihood sequence for EM algorithm
LLKSeries: Full log-likelihood series for EM algorithm
spec: Model specification
estimator: Estimation approach for structural model
missing_values: Strategy used to handle missing values, if any
sample_size: Final sample size for model estimation

Multilevel model estimation returns (if extout = FALSE, a subset):

vOmega: Higher-level class proportions
mPi: Lower-level class proportions given the higher-level latent classes
mPhi: Response probabilities given the lower-level latent classes
cPMX: Posterior joint class assignment (proportional assignment)
cLogPMX: Log of cPMX
cPX: Posterior lower-level class assignment given higher-level class membership (proportional assignment)
cLogPX: Log of cPX
mSumPX: Posterior higher-level class assignment for lower-level units after marginalization over the lower-level classes (proportional assignment)
mPW: Posterior higher-level class assignment for higher-level units (proportional assignment)
mlogPW: Log of mPW
mPW_N: Posterior higher-level class assignment for lower-level units (proportional assignment)
mPMsumX: Posterior lower-level class assignment for lower-level units after marginalization over the higher-level classes (proportional assignment)
R2entr_low: Lower-level entropy-based R\(^2\)
R2entr_high: Higher-level entropy-based R\(^2\)
BIClow: Lower-level Bayesian Information Criterion (BIC)
BIChigh: Higher-level Bayesian Information Criterion (BIC)
ICL_BIClow: Lower-level BIC-type approximation of the integrated complete likelihood
ICL_BIChigh: Higher-level BIC-type approximation of the integrated complete likelihood
AIC: Akaike Information Criterion (AIC)
vAlpha: Intercepts in logistic parametrization for higher-level class proportions
mGamma: Intercepts in logistic parametrization for conditional lower-level class proportions
mBeta: Intercepts in logistic parametrization for response probabilities
parvec: Vector of logistic parameters
SEs: Standard errors
Varmat: Variance-covariance matrix
Infomat: Expected information matrix
iter: Number of iterations for EM algorithm
eps: Difference between last two elements of log-likelihood sequence for EM algorithm
LLKSeries: Full log-likelihood series for EM algorithm
vLLK: Current log-likelihood for higher-level units
mScore: Contributions to log-likelihood score
spec: Model specification
missing_values: Strategy used to handle missing values, if any
sample_size: Final sample size for model estimation

Multilevel model estimation with lower-level covariates returns (if extout = FALSE, a subset):

vOmega: Higher-level class proportions
mPi: Lower-level class proportions given the higher-level latent classes and the covariates
mPi_avg: Sample average of mPi
mPhi: Response probabilities given the lower-level latent classes
cPMX: Posterior joint class assignment (proportional assignment)
cLogPMX: Log of cPMX
cPX: Posterior lower-level class assignment given higher-level class membership (proportional assignment)
cLogPX: Log of cPX
mSumPX: Posterior higher-level class assignment for lower-level units after marginalization over the lower-level classes (proportional assignment)
mPW: Posterior higher-level class assignment for higher-level units (proportional assignment)
mlogPW: Log of mPW
mPW_N: Posterior higher-level class assignment for lower-level units (proportional assignment)
mPMsumX: Posterior lower-level class assignment for lower-level units after marginalization over the higher-level classes (proportional assignment)
R2entr_low: Lower-level entropy-based R\(^2\)
R2entr_high: Higher-level entropy-based R\(^2\)
BIClow: Lower-level Bayesian Information Criterion (BIC)
BIChigh: Higher-level Bayesian Information Criterion (BIC)
ICL_BIClow: Lower-level BIC-type approximation of the integrated complete likelihood
ICL_BIChigh: Higher-level BIC-type approximation of the integrated complete likelihood
AIC: Akaike Information Criterion (AIC)
vAlpha: Intercepts in logistic parametrization for higher-level class proportions
cGamma: Intercept and slope parameters in logistic models for conditional lower-level class membership
mBeta: Intercepts in logistic parametrization for response probabilities
parvec: Vector of logistic parameters
SEs_unc: Uncorrected standard errors
SEs_cor: Corrected standard errors (see Bakk & Kuha, 2018; Di Mari et al., 2023)
SEs_cor_gamma: Corrected standard errors only for the gammas (see Bakk & Kuha, 2018; Di Mari et al., 2023)
mQ: Cross-derivatives for asymptotic standard error correction in two-step estimation (see Bakk & Kuha, 2018; Di Mari et al., 2023)
Varmat_unc: Uncorrected variance-covariance matrix
Varmat_cor: Corrected variance-covariance matrix (see Bakk & Kuha, 2018; Di Mari et al., 2023)
Infomat: Expected information matrix
cGamma_Info: Expected information matrix only for the gammas
mV2: Inverse of information matrix for structural model
iter: Number of iterations for EM algorithm
eps: Difference between last two elements of log-likelihood sequence for EM algorithm
LLKSeries: Full log-likelihood series for EM algorithm
vLLK: Current log-likelihood for higher-level units
mScore: Contributions to log-likelihood score
mGamma_Score: Contributions to log-likelihood score only for the gammas
spec: Model specification
estimator: Estimation approach for structural model
missing_values: Strategy used to handle missing values, if any
sample_size: Final sample size for model estimation

Multilevel model estimation with lower- and higher-level covariates returns (if extout = FALSE, a subset):

mOmega: Higher-level class proportions given the covariates
vOmega_avg: Higher-level class proportions averaged over higher-level units
mPi: Lower-level class proportions given the higher-level latent classes and the covariates
mPi_avg: Sample average of mPi
mPhi: Response probabilities given the lower-level latent classes
cPMX: Posterior joint class assignment (proportional assignment)
cLogPMX: Log of cPMX
cPX: Posterior lower-level class assignment given higher-level class membership (proportional assignment)
cLogPX: Log of cPX
mSumPX: Posterior higher-level class assignment for lower-level units after marginalization over the lower-level classes (proportional assignment)
mPW: Posterior higher-level class assignment for higher-level units (proportional assignment)
mlogPW: Log of mPW
mPW_N: Posterior higher-level class assignment for lower-level units (proportional assignment)
mPMsumX: Posterior lower-level class assignment for lower-level units after marginalization over the higher-level classes (proportional assignment)
R2entr_low: Lower-level entropy-based R\(^2\)
R2entr_high: Higher-level entropy-based R\(^2\)
BIClow: Lower-level Bayesian Information Criterion (BIC)
BIChigh: Higher-level Bayesian Information Criterion (BIC)
ICL_BIClow: Lower-level BIC-type approximation of the integrated complete likelihood
ICL_BIChigh: Higher-level BIC-type approximation of the integrated complete likelihood
AIC: Akaike Information Criterion (AIC)
mAlpha: Intercept and slope parameters in logistic models for conditional higher-level class membership
cGamma: Intercept and slope parameters in logistic models for conditional lower-level class membership
mBeta: Intercepts in logistic parametrization for response probabilities
parvec: Vector of logistic parameters
SEs_unc: Uncorrected standard errors
SEs_cor: Corrected standard errors (see Bakk & Kuha, 2018; Di Mari et al., 2023)
SEs_cor_alpha: Corrected standard errors only for the alphas (see Bakk & Kuha, 2018; Di Mari et al., 2023)
SEs_cor_gamma: Corrected standard errors only for the gammas (see Bakk & Kuha, 2018; Di Mari et al., 2023)
mQ: Cross-derivatives for asymptotic standard error correction in two-step estimation (see Bakk & Kuha, 2018; Di Mari et al., 2023)
Varmat_unc: Uncorrected variance-covariance matrix
Varmat_cor: Corrected variance-covariance matrix (see Bakk & Kuha, 2018; Di Mari et al., 2023)
Infomat: Expected information matrix
cAlpha_Info: Expected information matrix only for the alphas
cGamma_Info: Expected information matrix only for the gammas
mV2: Inverse of information matrix for structural model
iter: Number of iterations for EM algorithm
eps: Difference between last two elements of log-likelihood sequence for EM algorithm
LLKSeries: Full log-likelihood series for EM algorithm
vLLK: Current log-likelihood for higher-level units
mScore: Contributions to log-likelihood score
mAlpha_Score: Contributions to log-likelihood score only for the alphas
mGamma_Score: Contributions to log-likelihood score only for the gammas
spec: Model specification
estimator: Estimation approach for structural model
missing_values: Strategy used to handle missing values, if any
sample_size: Final sample size for model estimation

Arguments

data: Input matrix or dataframe.
Y: Names of data columns with indicators.
iT: Number of lower-level latent classes.
id_high: Name of data column with higher-level id. Default: NULL.
iM: Number of higher-level latent classes. Default: NULL.
Z: Names of data columns with lower-level covariates (non-numeric covariates are treated as nominal). Default: NULL.
Zh: Names of data columns with higher-level covariates (non-numeric covariates are treated as nominal). Default: NULL.
incomplete: Whether to estimate the model with missing values included by means of full-information maximum-likelihood estimation (TRUE) or perform row-wise deletion of missing values (FALSE). Default: FALSE.
fixedslopes: Whether to estimate multilevel models with covariates with fixed lower-level slope parameters across the higher-level classes by means of log-linear parametrization. Default: FALSE.
startval: Name of data column with starting values for lower-level latent classes. Default: NULL.
kmea: Whether to compute starting values for single-level model using \(K\)-means (TRUE), which is recommended for algorithmic stability, or \(K\)-modes (FALSE). Default: TRUE.
extout: Whether to output extensive model and estimation information. Default: FALSE.
dataout: Whether to match class predictions to the observed data. Default: TRUE.
sequential: Whether to perform sequential model selection (TRUE) or parallelized model selection (FALSE). Default: TRUE.
numFreeCores: If performing parallelized model selection, the number of CPU cores to keep free. Default: 2.
maxIter: Maximum number of iterations for EM algorithm. Default: 1e3.
tol: Tolerance for EM algorithm. Default: 1e-8.
reord: Whether to (re)order classes in decreasing order according to probability of scoring yes on all items. Default: TRUE.
fixedpars: One-step estimator (0), two-step estimator (1) or two-stage estimator (2). Default: 1.
NRmaxit: Maximum number of iterations for Newton-Raphson algorithm. Default: 100.
NRtol: Tolerance for Newton-Raphson algorithm. Default: 1e-6.
verbose: Whether to print estimation progress. Default: TRUE.
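
For intuition about the incomplete argument, the row-wise deletion applied when incomplete = FALSE can be mimicked in base R. This is only a sketch on invented data: the data frame `dat` and the vector `items` are hypothetical, and whether the function also screens covariate columns for missing values is not stated here.

```r
# Hypothetical data with two indicators and some missing values.
dat <- data.frame(Y1 = c(0, 1, NA, 1),
                  Y2 = c(1, NA, 0, 0))
items <- c("Y1", "Y2")

# Row-wise (listwise) deletion: keep only rows with no missing indicator.
complete_rows <- complete.cases(dat[, items, drop = FALSE])
dat_complete <- dat[complete_rows, ]

nrow(dat_complete)  # 2 rows remain; rows 2 and 3 are dropped
```

With incomplete = TRUE, rows such as 2 and 3 would instead be retained and handled by full-information maximum-likelihood estimation.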

Details

The indicator columns may be coded as a consecutive sequence of integers starting from 0, or as characters.
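
If an indicator uses other numeric codes, it can be recoded before calling the function. The following base R sketch shows one possible way to do so; the vector `x` is a hypothetical indicator, and this recoding is a suggestion rather than something the package performs or requires in this exact form.

```r
# Hypothetical indicator with arbitrary numeric codes (1, 2, 5).
x <- c(2, 5, 1, 5, 2)

# Map the sorted distinct values to 0, 1, 2, ...
x_recoded <- as.integer(factor(x)) - 1L

x_recoded  # 1 2 0 2 1
```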

To directly estimate a latent class model, iT and (optionally) iM should each be specified as a single positive integer. To perform model selection over a range of consecutive positive integers as the number of latent classes, iT and/or iM may be specified in the form iT_min:iT_max and/or iM_min:iM_max. It is possible to specify iT = iT_min:iT_max with either iM = NULL or iM equal to a single positive integer, iM = iM_min:iM_max with iT equal to a single positive integer, or iT = iT_min:iT_max with iM = iM_min:iM_max. All model selection procedures return the output of the optimal model based on the BIC.
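
The BIC-based choice among candidate models can be sketched in base R as follows. The log-likelihoods, parameter counts, and sample size are invented for illustration, and the textbook BIC formula is used; the exact sample size entering the package's lower- and higher-level BIC is not specified here.

```r
# Hypothetical fit summaries for candidate models with 1 to 3 classes.
# The log-likelihoods and parameter counts are invented for illustration.
fits <- data.frame(
  classes = 1:3,
  loglik  = c(-5200, -4900, -4890),
  npar    = c(10, 21, 32)
)
n <- 1000  # hypothetical sample size

# BIC = -2 * log-likelihood + (number of parameters) * log(sample size)
fits$BIC <- -2 * fits$loglik + fits$npar * log(n)

# The candidate with the smallest BIC is selected.
best <- fits$classes[which.min(fits$BIC)]
```

In this made-up example the third class improves the log-likelihood only slightly, so the penalty for its additional parameters outweighs the gain and the two-class model is selected.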

In the case where both iT and iM are defined as a range of consecutive positive integers, model selection can be performed using the sequential three-stage approach (Lukociene et al., 2010) or a simultaneous approach. The sequential approach involves (first step) estimating iT_min:iT_max single-level models and identifying the optimal alternative iT_opt1 based on the BIC, (second step) estimating iM_min:iM_max|iT = iT_opt1 multilevel models and identifying the optimal alternative iM_opt2 based on the higher-level BIC, and (third step) estimating iT_min:iT_max|iM = iM_opt2 multilevel models and identifying the optimal alternative iT_opt3 based on the lower-level BIC. The simultaneous approach involves devoting multiple CPU cores on the local machine to estimate all combinations in iT = iT_min:iT_max, iM = iM_min:iM_max and identifying the optimal alternative based on the lower-level BIC.
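
To compare the two approaches, the number of candidate models each one estimates can be counted in base R. This sketch follows the three steps as described above and assumes that no fit is reused between steps; the variable names are illustrative.

```r
# Candidate ranges for lower-level (iT) and higher-level (iM) classes.
iT_range <- 1:3
iM_range <- 1:2

# Simultaneous approach: every (iT, iM) combination is a candidate.
grid <- expand.grid(iT = iT_range, iM = iM_range)
n_simultaneous <- nrow(grid)

# Sequential approach, following the three steps described above:
# step 1 fits one single-level model per iT value,
# step 2 fits one multilevel model per iM value (iT fixed),
# step 3 fits one multilevel model per iT value (iM fixed).
n_sequential <- length(iT_range) + length(iM_range) + length(iT_range)
```

For these small ranges the simultaneous grid has 6 candidates and the sequential route 8, but the grid grows multiplicatively: with iT = 1:6 and iM = 1:4 it would contain 24 candidates against 16 sequential fits.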

References

Bakk, Z., & Kuha, J. (2018). Two-step estimation of models between latent classes and external variables. Psychometrika, 83, 871-892.

Bakk, Z., Di Mari, R., Oser, J., & Kuha, J. (2022). Two-stage multilevel latent class analysis with covariates in the presence of direct effects. Structural Equation Modeling: A Multidisciplinary Journal, 29(2), 267-277.

Di Mari, R., Bakk, Z., Oser, J., & Kuha, J. (2023). A two-step estimator for multilevel latent class analysis with covariates. Psychometrika.

Lukociene, O., Varriale, R., & Vermunt, J. K. (2010). The simultaneous decision(s) about the number of lower-and higher-level classes in multilevel latent class analysis. Sociological Methodology, 40(1), 247-283.

Examples


# \donttest{
# Use the artificial data set
data = dataTOY

# Define vector with names of columns with items
# (1+1:10 evaluates to 2:11, i.e. columns 2 to 11)
Y = colnames(data)[1+1:10]

# Define name of column with higher-level id
id_high = "id_high"

# Define vector with names of columns with lower-level covariates
Z = c("Z_low")

# Define vector with names of columns with higher-level covariates
Zh = c("Z_high")

# Single-level 3-class LC model with covariates
out = multiLCA(data, Y, 3, Z = Z, verbose = FALSE)
out

# Multilevel LC model
out = multiLCA(data, Y, 3, id_high, 2, verbose = FALSE)
out

# Multilevel LC model with lower-level covariates
out = multiLCA(data, Y, 3, id_high, 2, Z, verbose = FALSE)
out

# Multilevel LC model with lower- and higher-level covariates
out = multiLCA(data, Y, 3, id_high, 2, Z, Zh, verbose = FALSE)
out

# Model selection over single-level models with 1-3 classes
out = multiLCA(data, Y, 1:3, verbose = FALSE)
out

# Model selection over multilevel models with 1-3 lower-level classes and
# 2 higher-level classes
out = multiLCA(data, Y, 1:3, id_high, 2, verbose = FALSE)
out

# Model selection over multilevel models with 3 lower-level classes and 
# 1-2 higher-level classes
out = multiLCA(data, Y, 3, id_high, 1:2, verbose = FALSE)
out

# Model selection over multilevel models with 1-3 lower-level classes and 
# 1-2 higher-level classes using the default sequential approach
out = multiLCA(data, Y, 1:3, id_high, 1:2, verbose = FALSE)
out
# }
