poolSat: Fit a Saturated `lavaan` Model to Multiple Imputed Data Sets

Description

This function fits a saturated model to a list of imputed data sets, and returns a list of pooled summary statistics to treat as data.

Usage

poolSat(
  data,
  ...,
  return.fit = FALSE,
  scale.W = TRUE,
  omit.imps = c("no.conv", "no.se")
)

Value

If return.fit=TRUE, a lavaan.mi object. Otherwise, an object of class lavMoments, which is a list

that contains at least $sample.cov and $sample.nobs, potentially also $sample.mean, $sample.th, $NACOV, and $WLS.V. Also contains $lavOptions that will be passed to lavaan(...).

Arguments

data: A list of imputed data sets, or an object class from which imputed data can be extracted. Recognized classes are lavaan.mi (list of imputations stored in the @DataList slot), amelia (created by the Amelia package), or mids (created by the mice package).
...: Additional arguments passed to lavaan::lavCor() or to lavaan.mi().
return.fit: logical indicating whether to return a lavaan.mi object containing the results of fitting the saturated model to multiple imputed data. Could be useful for diagnostic purposes.
scale.W: logical. If TRUE (default), the within- and between-imputation components will be pooled by scaling the within-imputation component by the ARIV (see Enders, 2010, p. 235, for definition and formula). Otherwise, the pooled matrix is calculated as the weighted sum of the within-imputation and between-imputation components (see Enders, 2010, ch. 8, for details).
omit.imps: character vector specifying criteria for omitting imputations from pooled results of saturated model. Can include any of c("no.conv", "no.se", "no.npd"), the first 2 of which are the default setting, which excludes any imputations that did not converge or for which standard errors could not be computed. The last option ("no.npd") would exclude any imputations which yielded a nonpositive definite covariance matrix for observed or latent variables, which would include any "improper solutions" such as Heywood cases. NPD solutions are not excluded by default because they are likely to occur due to sampling error, especially in small samples. However, gross model misspecification could also cause NPD solutions, users can compare pooled results with and without this setting as a sensitivity analysis to see whether some imputations warrant further investigation. Specific imputation numbers can also be included in this argument, in case users want to apply their own custom omission criteria (or simulation studies can use different numbers of imputations without redundantly refitting the model).

Author

Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)

References

Lee, T., & Cai, L. (2012). Alternative multiple imputation inference for mean and covariance structure modeling. Journal of Educational and Behavioral Statistics, 37(6), 675--702. tools:::Rd_expr_doi("10.3102/1076998612458320")

Chung, S., & Cai, L. (2019). Alternative multiple imputation inference for categorical structural equation modeling, Multivariate Behavioral Research, 54(3), 323--337. tools:::Rd_expr_doi("10.1080/00273171.2018.1523000")

Examples

Run this code


data(HS20imps) # import a list of 20 imputed data sets

## fit saturated model to imputations, pool those results
impSubset1 <- lapply(HS20imps, "[", i = paste0("x", 1:9)) # only modeled variables
(prePooledData <- poolSat(impSubset1))

## Note: no means were returned (default lavOption() is meanstructure=FALSE)
(prePooledData <- poolSat(impSubset1, meanstructure = TRUE))

## specify CFA model from lavaan's ?cfa help page
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

## fit model to summary statistics in "prePooledData"
fit <- cfa(HS.model, data = prePooledData, std.lv = TRUE)
## By default, the "Scaled" column provides a "scaled.shifted" test
## statistic that maintains an approximately nominal Type I error rate.
summary(fit, fit.measures = TRUE, standardized = "std.all")
## Note that this scaled statistic does NOT account for deviations from
## normality, because the default normal-theory standard errors were
## requested when running poolSat().  See below about non-normality.

## Alternatively, "Browne's residual-based (ADF) test" is also available:
lavTest(fit, test = "browne.residual.adf", output = "text")

## Optionally, save the saturated-model lavaan.mi object, which
## could be helpful for diagnosing convergence problems per imputation.
satFit <- poolSat(impSubset1, return.fit = TRUE)


## FITTING MODELS TO DIFFERENT (SUBSETS OF) VARIABLES

## If you only want to analyze a subset of these variables,
mod.vis <- 'visual  =~ x1 + x2 + x3'
## you will get an error:
try(
  fit.vis <- cfa(mod.vis, data = prePooledData) # error
)

## As explained in the "Note" section, you must use poolSat() again for
## this subset of variables
impSubset3 <- lapply(HS20imps, "[", i = paste0("x", 1:3)) # only modeled variables
visData <- poolSat(impSubset3)
fit.vis <- cfa(mod.vis, data = visData) # no problem


## OTHER lavaan OPTIONS

# \donttest{
## fit saturated MULIPLE-GROUP model to imputations
impSubset2 <- lapply(HS20imps, "[", i = c(paste0("x", 1:9), "school"))
(prePooledData2 <- poolSat(impSubset2, group = "school",
                           ## request standard errors that are ROBUST
                           ## to violations of the normality assumption:
                           se = "robust.sem"))
## Nonnormality-robust standard errors are implicitly incorporated into the
## pooled weight matrix (NACOV= argument), so they are
## AUTOMATICALLY applied when fitting the model:
fit.config <- cfa(HS.model, data = prePooledData2, group = "school",
                  std.lv = TRUE)
## standard errors and chi-squared test of fit both robust to nonnormality
summary(fit.config)


## CATEGORICAL OUTCOMES

## discretize the imputed data, for an example of 3-category data
HS3cat <- lapply(impSubset1, function(x) {
  as.data.frame( lapply(x, cut, breaks = 3, labels = FALSE) )
})
## pool polychoric correlations and thresholds
(prePooledData3 <- poolSat(HS3cat, ordered = paste0("x", 1:9)))

fitc <- cfa(HS.model, data = prePooledData3, std.lv = TRUE)
summary(fitc)

## Optionally, use unweighted least-squares estimation.  However,
## you must first REMOVE the pooled weight matrix (WLS.V= argument)
## or replace it with an identity matrix of the same dimensions:
prePooledData4 <- prePooledData3
prePooledData4$WLS.V <- NULL
## or prePooledData4$WLS.V <- diag(nrow(prePooledData3$WLS.V))
fitcu <- cfa(HS.model, data = prePooledData4, std.lv = TRUE, estimator = "ULS")
## Note that the SEs and test were still appropriately corrected:
summary(fitcu)
# }

Run the code above in your browser using DataLab