LCA: Latent Class Analysis

Description

Performs Latent Class Analysis (LCA) on binary response data using the Expectation-Maximization (EM) algorithm. LCA identifies unobserved (latent) subgroups of examinees with similar response patterns and estimates both the characteristics of each class and each examinee's class membership probabilities.

Usage

LCA(U, ncls = 2, na = NULL, Z = NULL, w = NULL, maxiter = 100, verbose = TRUE)

Value

An object of class "exametrika" and "LCA" containing:

msg: A character string indicating the model type.
testlength: Length of the test (number of items).
nobs: Sample size (number of rows in the dataset).
Nclass: Number of latent classes specified.
N_Cycle: Number of EM algorithm iterations performed.
TRP: Test Reference Profile vector showing expected scores for each latent class. Calculated as the column sum of the estimated class reference matrix.
LCD: Latent Class Distribution vector showing the number of examinees assigned to each latent class.
CMD: Class Membership Distribution vector showing the sum of membership probabilities for each latent class.
Students: Class Membership Profile matrix showing the posterior probability of each examinee belonging to each latent class. The last column ("Estimate") indicates the most likely class assignment.
IRP: Item Reference Profile matrix where each row represents an item and each column represents a latent class. Values indicate the probability of a correct response for members of that class.
ItemFitIndices: Fit indices for each item. See also ItemFit.
TestFitIndices: Overall fit indices for the test. See also TestFit.
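
As an illustration of how the summary vectors above relate to the underlying matrices, the following base R sketch derives LCD, CMD, and TRP from small hand-made inputs. The matrices `post` and `irp` are hypothetical toy values, not output of LCA(); the tie-breaking rule and the use of unit item weights are assumptions made for this sketch and may differ from the package's internals.

```r
# Hypothetical posterior membership matrix: 4 examinees x 2 classes
post <- matrix(c(0.9, 0.1,
                 0.6, 0.4,
                 0.2, 0.8,
                 0.5, 0.5), ncol = 2, byrow = TRUE)

# Most likely class per examinee (ties go to the first class; an assumption)
estimate <- max.col(post, ties.method = "first")

# LCD: number of examinees assigned to each class
lcd <- tabulate(estimate, nbins = ncol(post))

# CMD: sum of membership probabilities for each class
cmd <- colSums(post)

# Hypothetical item reference matrix: 3 items x 2 classes
irp <- matrix(c(0.2, 0.9,
                0.3, 0.8,
                0.1, 0.7), ncol = 2, byrow = TRUE)

# TRP: expected score per class as the column sum (unit item weights assumed)
trp <- colSums(irp)
```

Both `lcd` and `cmd` sum to the number of examinees; they differ because LCD counts hard assignments while CMD accumulates the soft probabilities.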

Arguments

U: Either an object of class "exametrika" or raw data. When raw data is given, it is converted to the exametrika class with the dataFormat function.
ncls: Number of latent classes to identify (between 2 and 20). Default is 2.
na: Value(s) to be treated as missing.
Z: Missing indicator matrix of type matrix or data.frame. Values of 1 indicate observed responses, while 0 indicates missing data.
w: Item weight vector specifying the relative importance of each item.
maxiter: Maximum number of EM algorithm iterations. Default is 100.
verbose: Logical; if TRUE, displays progress during estimation. Default is TRUE.
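
To make the Z convention concrete (1 for observed, 0 for missing), here is a minimal base R sketch that builds such an indicator matrix from raw responses. The data `U_raw` and the missing-value code `na_code` are hypothetical and exist only for illustration.

```r
# Hypothetical raw responses: 2 examinees x 3 items,
# with one NA and one response coded 99 for "missing"
U_raw <- matrix(c(1, 0, NA,
                  0, 99, 1), nrow = 2, byrow = TRUE)
na_code <- 99  # hypothetical missing-value code

# Recode the missing-value code to NA (%in% never returns NA, so indexing is safe)
U_clean <- U_raw
U_clean[U_clean %in% na_code] <- NA

# Missing indicator matrix: 1 = observed response, 0 = missing
Z <- (!is.na(U_clean)) * 1
```

A matrix built this way follows the layout described for the Z argument; alternatively, passing `na = 99` lets the function treat that code as missing itself.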

Details

Latent Class Analysis is a statistical method for identifying unobserved subgroups within a population based on observed response patterns. It assumes that examinees belong to one of several distinct latent classes, and that the probability of a correct response to each item depends on class membership.

The algorithm proceeds by:

1. Initializing class reference probabilities
2. Computing posterior class membership probabilities for each examinee (E-step)
3. Re-estimating class reference probabilities based on these memberships (M-step)
4. Iterating until convergence or reaching the maximum number of iterations
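
The steps above can be sketched in base R as follows. This is a simplified illustration, not the package's implementation: the function name `lca_em_sketch`, the initial values, the convergence tolerance `tol`, and the clamping of probabilities away from 0 and 1 are all assumptions, and missing data and item weights are ignored.

```r
# Minimal EM sketch for a latent class model on complete binary data.
# U: n x J matrix of 0/1 responses; ncls: number of latent classes.
lca_em_sketch <- function(U, ncls = 2, maxiter = 100, tol = 1e-6) {
  J <- ncol(U)
  # Step 1: initialise class reference probabilities (ncls x J), increasing
  # by class to break symmetry, and equal class proportions (assumed start)
  ref <- matrix(rep(seq_len(ncls) / (ncls + 1), times = J),
                nrow = ncls, ncol = J)
  prior <- rep(1 / ncls, ncls)
  loglik_old <- -Inf
  for (iter in seq_len(maxiter)) {
    # Step 2 (E-step): joint log-probability of each response pattern
    # and each class, then normalise to posterior memberships (n x ncls)
    ll <- U %*% t(log(ref)) + (1 - U) %*% t(log(1 - ref))
    ll <- sweep(ll, 2, log(prior), "+")
    m <- apply(ll, 1, max)          # row maxima for numerical stability
    post <- exp(ll - m)
    rs <- rowSums(post)
    post <- post / rs
    loglik <- sum(m + log(rs))      # marginal log-likelihood at current values
    # Step 3 (M-step): update class proportions and reference probabilities
    prior <- colMeans(post)
    ref <- (t(post) %*% U) / colSums(post)
    ref <- pmin(pmax(ref, 1e-6), 1 - 1e-6)  # keep logs finite
    # Step 4: stop when the log-likelihood no longer improves
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(ref = ref, prior = prior, post = post, loglik = loglik, iter = iter)
}

# Usage on simulated data: two classes with low and high success rates
set.seed(1)
true_class <- rep(1:2, each = 100)
p <- rbind(rep(0.2, 6), rep(0.8, 6))
U <- matrix(rbinom(200 * 6, 1, p[true_class, ]), nrow = 200)
fit <- lca_em_sketch(U, ncls = 2)
```

In this sketch `fit$post` plays the role of the membership matrix and `t(fit$ref)` has the items-by-classes layout of an item reference matrix.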

Unlike Item Response Theory (IRT), LCA treats latent variables as categorical rather than continuous, identifying distinct profiles rather than positions on a continuum.

References

Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61(2), 215-231.

Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston: Houghton Mifflin.

Examples

# \donttest{
# Fit a Latent Class Analysis model with 5 classes to the sample dataset
result.LCA <- LCA(J15S500, ncls = 5)

# Display the first few rows of student class membership probabilities
head(result.LCA$Students)

# Plot Item Reference Profiles (IRP) for items 1-6 in a 2x3 grid
# Shows probability of correct response for each item across classes
plot(result.LCA, type = "IRP", items = 1:6, nc = 2, nr = 3)

# Plot Class Membership Profiles (CMP) for students 1-9 in a 3x3 grid
# Shows probability distribution of class membership for each student
plot(result.LCA, type = "CMP", students = 1:9, nc = 3, nr = 3)

# Plot Test Reference Profile (TRP) showing expected scores for each class
plot(result.LCA, type = "TRP")

# Plot Latent Class Distribution (LCD) showing class sizes
plot(result.LCA, type = "LCD")

# Compare models with different numbers of classes
# (In practice, you might try more class counts)
lca2 <- LCA(J15S500, ncls = 2)
lca3 <- LCA(J15S500, ncls = 3)
lca4 <- LCA(J15S500, ncls = 4)
lca5 <- LCA(J15S500, ncls = 5)

# Compare BIC values to select the number of classes
# (Lower BIC indicates a better balance of fit and model complexity)
data.frame(
  Classes = 2:5,
  BIC = c(
    lca2$TestFitIndices$BIC,
    lca3$TestFitIndices$BIC,
    lca4$TestFitIndices$BIC,
    lca5$TestFitIndices$BIC
  )
)
# }