poLCA: Latent class analysis of polytomous outcome variables

Description

Estimates latent class and latent class regression models for polytomous outcome variables.

Usage

poLCA(formula, data, nclass = 2, maxiter = 1000, graphs = FALSE, 
      tol = 1e-10, na.rm = TRUE, probs.start = NULL)

Arguments

formula

A formula expression of the form response ~ predictors. The details of model specification are given below.

data

A data frame containing variables in formula. Manifest variables must contain only integer values, and must be coded with consecutive values from 1 to the maximum number of outcomes for each variable. All missing values should be ent
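The integer coding requirement can be met with a small recoding step before calling poLCA. The following is a minimal sketch in base R. The data frame `raw` and its columns `q1` (coded 0/1) and `q2` (character labels) are hypothetical names used only for illustration. This is one possible approach, not the package's own preprocessing.

```r
# Hypothetical raw survey data; column names and values are illustrative only.
raw <- data.frame(
  q1 = c(0, 1, 1, 0, NA),
  q2 = c("low", "high", "mid", "low", "high"),
  stringsAsFactors = FALSE
)

# Recode each manifest variable to consecutive integers starting at 1.
# Listing the levels explicitly fixes the mapping and keeps a slot for
# categories that happen to be unobserved in this sample.
coded <- data.frame(
  q1 = as.integer(factor(raw$q1, levels = c(0, 1))),
  q2 = as.integer(factor(raw$q2, levels = c("low", "mid", "high")))
)

print(coded)
```

Missing entries in `raw` remain NA after recoding, so they can still be handled through the na.rm argument.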

nclass

The number of latent classes to assume in the model. Setting nclass=1 results in poLCA estimating the loglinear independence model. The default is two.

maxiter

The maximum number of iterations through which the estimation algorithm will cycle.

graphs

Logical, for whether poLCA should graphically display the parameter estimates at each stage of the updating algorithm. The default is FALSE.

tol

A tolerance value for judging when convergence has been reached. When the one-iteration change in the estimated log-likelihood is less than tol, the estimation algorithm stops updating and considers the maximum log-likelihood to have been fo

na.rm

Logical, for how poLCA handles cases with missing values on the manifest variables. If TRUE, those cases are removed (listwise deleted) before estimating the model. If FALSE, cases with missing values are retained.
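To make the effect of na.rm=TRUE concrete, the sketch below mimics listwise deletion with base R's complete.cases() on a hypothetical data frame `dat`. It illustrates the idea only and is not taken from poLCA's internals; the data are made up.

```r
# Hypothetical manifest variables with some missing entries.
dat <- data.frame(
  Y1 = c(1, 2, NA, 1),
  Y2 = c(2, NA, 1, 1),
  Y3 = c(1, 1, 2, 2)
)

# Listwise deletion: keep only rows observed on every manifest variable.
fully_observed <- complete.cases(dat)
kept <- dat[fully_observed, ]

cat("cases in data:       ", nrow(dat), "\n")
cat("fully observed cases:", nrow(kept), "\n")
```

With na.rm=FALSE the partially observed rows would stay in the estimation sample, which is why the Value section distinguishes N from Nobs.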

probs.start

A list of matrices of class-conditional response probabilities to be used as the starting values for the estimation algorithm. Each matrix in the list corresponds to one manifest variable, with one row for each latent class, and one column for each outco
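A starting-value list of this shape can be built by hand. The sketch below assumes a hypothetical model with two latent classes and three manifest variables named A, B and C with 2, 2 and 3 outcome categories. It draws random values and rescales each row to sum to one, since each row is a probability distribution over the outcomes of one variable within one class.

```r
set.seed(1)  # reproducible random starting values

nclass <- 2
# Hypothetical number of outcome categories per manifest variable.
n_outcomes <- c(A = 2, B = 2, C = 3)

# One matrix per manifest variable: rows are latent classes,
# columns are outcome categories, and every row sums to 1.
probs.start <- lapply(n_outcomes, function(k) {
  m <- matrix(runif(nclass * k), nrow = nclass, ncol = k)
  m / rowSums(m)
})

print(lapply(probs.start, rowSums))
```

Such a list could then be supplied through the probs.start argument shown in Usage, for example poLCA(f, dat, nclass=2, probs.start=probs.start), where f and dat stand for the user's own formula and data.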

Value

poLCA returns a list containing the following components:
y: data frame of manifest variables.
x: data frame of covariates, if specified.
N: number of cases used in model.
Nobs: number of fully observed cases (less than or equal to N).
probs: estimated class-conditional response probabilities.
probs.se: standard errors of estimated class-conditional response probabilities, in the same format as probs.
P: sizes of each latent class; equal to the mixing proportions in the basic latent class model, or the mean of the priors in the latent class regression model.
P.se: the standard errors of the estimated P.
posterior: matrix of posterior class membership probabilities.
predclass: vector of predicted class memberships, by modal assignment.
predcell: table of observed vs. predicted cell counts.
llik: maximum value of the log-likelihood.
numiter: number of iterations until reaching convergence.
coeff: multinomial logit coefficient estimates on covariates (when estimated). coeff is a matrix with nclass-1 columns, and one row for each covariate. All logit coefficients are calculated for classes with respect to class 1.
coeff.se: standard errors of coefficient estimates on covariates (when estimated), in the same format as coeff.
coeff.V: covariance matrix of coefficient estimates on covariates (when estimated).
aic: Akaike Information Criterion.
bic: Bayesian Information Criterion.
Gsq: Likelihood ratio/deviance statistic.
Chisq: Pearson Chi-square goodness of fit statistic for fitted vs. observed multiway tables.
time: length of time it took to run the model.
npar: number of degrees of freedom used by the model (estimated parameters).
resid.df: number of residual degrees of freedom.
eflag: Logical, error flag. True if estimation algorithm needed to automatically restart with new initial parameters.
probs.start: A list of matrices containing the class-conditional response probabilities used as starting values in the estimation algorithm. If the algorithm needed to restart (see eflag), then this contains the starting values used for the final, successful, run.
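The relationship between posterior and predclass can be illustrated without fitting a model. In the sketch below, the matrix `posterior` is a hypothetical stand-in for the component of the same name (cases in rows, latent classes in columns), and modal assignment is taken to mean choosing, for each case, the class with the largest posterior probability.

```r
# Hypothetical posterior matrix: 4 cases (rows) by 3 latent classes (columns).
posterior <- matrix(c(0.70, 0.20, 0.10,
                      0.10, 0.60, 0.30,
                      0.05, 0.15, 0.80,
                      0.45, 0.50, 0.05),
                    ncol = 3, byrow = TRUE)

# Modal assignment: index of the most probable class for each case.
predclass <- apply(posterior, 1, which.max)

print(predclass)
print(table(predclass))
```

The fourth case shows why modal assignment discards information: it is assigned to class 2 even though class 1 is almost as likely.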

Details

Latent class analysis, also known as latent structure analysis, is a technique for the analysis of clustering among observations in multi-way tables of qualitative/categorical variables. The central idea is to fit a model in which any confounding between the manifest variables can be explained by a single unobserved "latent" categorical variable. poLCA uses the assumption of local independence to estimate a mixture model of latent multi-way tables, the number of which (nclass) is specified by the user. Estimated parameters include the class-conditional response probabilities for each manifest variable, the "mixing" proportions denoting population share of observations corresponding to each latent multi-way table, and coefficients on any class-predictor covariates, if specified in the model.

Model specification: Latent class models have more than one manifest variable, so the response variables are cbind(dv1,dv2,dv3...) where dv# refer to variable names in the data frame. For models with no covariates, the formula is cbind(dv1,dv2,dv3)~1. For models with covariates, replace the ~1 with the desired function of predictors iv1,iv2,iv3... as, for example, cbind(dv1,dv2,dv3)~iv1+iv2*iv3.

poLCA treats all manifest variables as qualitative/categorical/nominal -- NOT as ordinal.
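The mixture model under local independence can be worked through by hand for a single response pattern. The sketch below uses hypothetical parameter values for a two-class model with three dichotomous manifest variables. The objects `P` and `probs` mirror the layout of the returned components of the same names, but the numbers are invented for illustration. Within a class, the probability of a response pattern is the product of the class-conditional response probabilities; the overall probability is the mixture of these products weighted by the mixing proportions, and Bayes' rule gives the posterior class membership.

```r
# Hypothetical mixing proportions for two latent classes.
P <- c(0.6, 0.4)

# Hypothetical class-conditional response probabilities:
# one matrix per manifest variable, rows = classes, columns = outcomes.
probs <- list(
  dv1 = matrix(c(0.9, 0.1,
                 0.2, 0.8), nrow = 2, byrow = TRUE),
  dv2 = matrix(c(0.7, 0.3,
                 0.3, 0.7), nrow = 2, byrow = TRUE),
  dv3 = matrix(c(0.8, 0.2,
                 0.4, 0.6), nrow = 2, byrow = TRUE)
)

# One observed response pattern, coded 1..K as poLCA requires.
y <- c(1, 2, 1)

# Local independence: within class r, multiply across manifest variables.
class_lik <- sapply(seq_along(P), function(r) {
  prod(mapply(function(m, yj) m[r, yj], probs, y))
})

# Mixture probability of the pattern, and posterior class membership.
marginal  <- sum(P * class_lik)
posterior <- P * class_lik / marginal

print(class_lik)
print(marginal)
print(round(posterior, 4))
```

Summing log(marginal) over all cases would give the log-likelihood that the estimation algorithm maximizes in the model without covariates.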

References

Agresti, Alan. 2002. Categorical Data Analysis, second edition. Hoboken: John Wiley & Sons.

Bandeen-Roche, Karen, Diana L. Miglioretti, Scott L. Zeger, and Paul J. Rathouz. 1997. "Latent Variable Regression for Multiple Discrete Outcomes." Journal of the American Statistical Association. 92(440): 1375-1386.

Hagenaars, Jacques A. and Allan L. McCutcheon, eds. 2002. Applied Latent Class Analysis. Cambridge: Cambridge University Press.

McLachlan, Geoffrey J. and Thriyambakam Krishnan. 1997. The EM Algorithm and Extensions. New York: John Wiley & Sons.

Examples

##
## Three models without covariates:
## M0: Loglinear independence model.
## M1: Two-class latent class model.
## M2: Three-class latent class model.
##
data(values)
f <- cbind(A,B,C,D)~1
M0 <- poLCA(f,values,nclass=1)			# log-likelihood: -543.6498
M1 <- poLCA(f,values,nclass=2)			# log-likelihood: -504.4677
M2 <- poLCA(f,values,nclass=3,maxiter=8000)	# log-likelihood: -503.3011

##
## Three-class model with a single covariate.
##
data(election)
f2a <- cbind(MORALG,CARESG,KNOWG,LEADG,DISHONG,INTELG,MORALB,CARESB,KNOWB,LEADB,DISHONB,INTELB)~PARTY
nes2a <- poLCA(f2a,election,nclass=3)		# log-likelihood: -16222.32 
pidmat <- cbind(1,c(1:7))
exb <- exp(pidmat %*% nes2a$coeff)
matplot(c(1:7),(cbind(1,exb)/(1+rowSums(exb))),ylim=c(0,1),type="l",
	main="Party ID as a predictor of candidate affinity class",
	xlab="Party ID: strong Democratic (1) to strong Republican (7)",
	ylab="Probability of latent class membership")
text(5.9,0.35,"Other")
text(5.4,0.7,"Bush affinity")
text(1.8,0.6,"Gore affinity")

##
## Create a sample dataset with missing values and estimate the
## latent class model including and excluding the missing values.
## Then plot the estimated class-conditional outcome response 
## probabilities against each other for the two methods.
##
simdat3 <- poLCA.simdata(N=5000,niv=1,ndv=5,nclass=2,b=c(-1,2),missval=TRUE,pctmiss=0.2)
f3 <- cbind(Y1,Y2,Y3,Y4,Y5)~X1
lc3.miss <- poLCA(f3,simdat3$dat,nclass=2)
lc3.nomiss <- poLCA(f3,simdat3$dat,nclass=2,na.rm=FALSE)
## Align the class labels of the two fits; latent class order is arbitrary.
o <- if (all(order(lc3.miss$P)==order(lc3.nomiss$P))) c(1,2) else c(2,1)
plot(lc3.miss$probs[[1]],lc3.nomiss$probs[[1]][o,],xlim=c(0,1),ylim=c(0,1),
     xlab="Conditional response probabilities (missing values dropped)",
     ylab="Conditional response probabilities (missing values included)")
for (i in 2:5) { points(lc3.miss$probs[[i]],lc3.nomiss$probs[[i]][o,]) }
abline(0,1,lty=3)
