fmi: Fraction of Missing Information.

Description

This function estimates the Fraction of Missing Information (FMI) for summary statistics of each variable, using either an incomplete data set or a list of imputed data sets.
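For orientation, here is one common per-parameter definition of FMI. It is a standard textbook formulation and not necessarily the exact computation fmi() performs. It compares the sampling variance of an estimate under complete data with its sampling variance under the observed, incomplete data:

```latex
\mathrm{FMI}_j \;=\; 1 - \frac{\operatorname{Var}_{\mathrm{com}}(\hat{\theta}_j)}{\operatorname{Var}_{\mathrm{obs}}(\hat{\theta}_j)}
\;=\; 1 - \frac{\bigl[\mathcal{I}_{\mathrm{com}}^{-1}\bigr]_{jj}}{\bigl[\mathcal{I}_{\mathrm{obs}}^{-1}\bigr]_{jj}}
```

Here the two information matrices are those of the complete and the observed data. FMI is 0 when the missing values cost no precision for theta_j, and it approaches 1 as the observed data carry less and less information about theta_j.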

Usage

fmi(data, method = "saturated", group = NULL, ords = NULL,
  varnames = NULL, exclude = NULL, fewImps = FALSE)

Arguments

data

Either a single data.frame with incomplete observations, or a list of imputed data sets.

method

character. If "saturated" or "sat" (default), the model used to estimate FMI is a freely estimated covariance matrix and mean vector for numeric variables, and/or polychoric correlations and thresholds for ordered categorical variables, for each group (if applicable). If "null", only means and variances are estimated for numeric variables, and/or thresholds for ordered categorical variables (i.e., covariances and/or polychoric correlations are constrained to zero). See Details for more information.
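The sketch below writes out lavaan model syntax for each setting over a few numeric variables, which shows how the two models differ. It uses a hypothetical helper, build_syntax(), written only for illustration. It ignores ordered-categorical variables (thresholds, polychoric correlations) and is not the syntax fmi() generates internally.

```r
# Hypothetical helper: build lavaan syntax for a "saturated" or "null" model
# over numeric variables only. Illustration, not fmi()'s internal syntax.
build_syntax <- function(vars, method = c("saturated", "null")) {
  method <- match.arg(method)             # partial matching also accepts "sat"
  means <- paste0(vars, " ~ 1")           # freely estimated mean (intercept)
  variances <- paste0(vars, " ~~ ", vars) # freely estimated variance
  covs <- character(0)
  if (length(vars) > 1) {
    pairs <- combn(vars, 2)               # 2 x choose(p, 2) matrix of pairs
    # saturated: covariances free; null: covariances fixed to zero
    fix <- if (method == "saturated") "" else "0*"
    covs <- paste0(pairs[1, ], " ~~ ", fix, pairs[2, ])
  }
  paste(c(means, variances, covs), collapse = "\n")
}

cat(build_syntax(c("x1", "x2", "x3"), "sat"), "\n")
cat(build_syntax(c("x1", "x2", "x3"), "null"), "\n")
```

With p numeric variables, the saturated model estimates p + p(p + 1)/2 means, variances, and covariances. The null model has only 2p free means and variances. For p = 9 that is 54 versus 18, which is why the saturated option is slower on wide data sets.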

group

character. The optional name of a grouping variable, to request FMI in each group.

ords

character. Optional vector of names of ordered-categorical variables that are not already stored as class "ordered" in data.

varnames

character. Optional vector of variable names, to calculate FMI for a subset of variables in data. By default, all numeric and ordered variables are included. The exception is when data is a single incomplete data.frame: then only numeric variables are used, because FIML estimation requires them. Variables of any other type are removed.

exclude

character. Optional vector of variable names to exclude from the analysis.

fewImps

logical. If TRUE, use the estimate of FMI that applies a correction to the estimated between-imputation variance. Recommended when there are few imputations; makes little difference when there are many imputations. Ignored when data is not a list of imputed data sets.
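With m imputed data sets, FMI is usually derived from Rubin's (1987) combining rules. Write Q_i for the estimate from imputation i and U_i for its squared standard error. The block shows the large-m form alongside the form that multiplies the between-imputation variance B by (1 + 1/m). That fewImps = TRUE selects the second form is an assumption based on the description above.

```latex
\begin{aligned}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i - \bar{Q}\bigr)^2 \\
T &= \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B \\
\mathrm{FMI}_{\infty} &= \frac{B}{\bar{U} + B}, \qquad
\mathrm{FMI}_{m} = \frac{\bigl(1 + \frac{1}{m}\bigr) B}{T}
\end{aligned}
```

As m grows, (1 + 1/m) approaches 1 and the two forms agree. This is why the correction matters mainly when there are few imputations.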

Value

fmi returns a list with at least 2 of the following:

Covariances

A list of symmetric matrices: (1) the estimated/pooled covariance matrix, or a list of group-specific matrices (if applicable) and (2) a matrix of FMI, or a list of group-specific matrices (if applicable). Only available if method = "saturated".

Variances

The estimated/pooled variance for each numeric variable. Only available if method = "null" (otherwise, it is on the diagonal of Covariances).

Means

The estimated/pooled mean for each numeric variable.

Thresholds

The estimated/pooled threshold(s) for each ordered-categorical variable.

message

A cautionary message, returned when the null model is used.

Details

The function fits the requested model with lavaan, using FIML for a single incomplete data set, or with lavaan.mi for a list of imputed data sets. If method = "saturated", FMI is estimated for all summary statistics, which can take a long time with large data sets. If method = "null", FMI is estimated only for univariate statistics (e.g., means, variances, thresholds). The saturated model gives more reliable estimates. With a large data set, it may therefore be better to request a subset of variables (using varnames) than to switch to the null model.
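For a list of imputed data sets, the FMI of each pooled statistic follows from Rubin's (1987) combining rules. The base-R sketch below pools a single statistic from its m point estimates and standard errors. pool_fmi() is a hypothetical helper, not part of semTools. It also assumes that fewImps corresponds to multiplying the between-imputation variance by (1 + 1/m).

```r
# Hypothetical helper: pool one statistic across m imputations with
# Rubin's rules and return its fraction of missing information.
# est: m point estimates; se: their m standard errors.
# fewImps: if TRUE, multiply the between-imputation variance by (1 + 1/m)
#   (assumed to mirror the correction described for fmi()'s fewImps).
pool_fmi <- function(est, se, fewImps = FALSE) {
  m <- length(est)
  stopifnot(m >= 2, length(se) == m)
  W <- mean(se^2)                      # within-imputation variance
  B <- var(est)                        # between-imputation variance (divisor m - 1)
  Badj <- if (fewImps) (1 + 1/m) * B else B
  list(estimate  = mean(est),          # pooled point estimate
       total_var = W + (1 + 1/m) * B,  # Rubin's total variance T
       fmi       = Badj / (W + Badj))  # fraction of missing information
}

pool_fmi(c(1, 2, 3), c(1, 1, 1))$fmi                  # 0.5
pool_fmi(c(1, 2, 3), c(1, 1, 1), fewImps = TRUE)$fmi  # 0.5714286 (= 4/7)
```

With only three imputations the correction raises FMI from 0.5 to 4/7 here. With many imputations the two values converge.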

References

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.

Savalei, V., & Rhemtulla, M. (2012). On obtaining estimates of the fraction of missing information from full information maximum likelihood. Structural Equation Modeling, 19(3), 477-494. doi:10.1080/10705511.2012.687669

Wagner, J. (2010). The fraction of missing information as a tool for monitoring the quality of survey data. Public Opinion Quarterly, 74(2), 223-243. doi:10.1093/poq/nfq007

Examples


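library(lavaan)    # provides the HolzingerSwineford1939 data
library(semTools)  # provides fmi() and the datCat data

HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
                                      "ageyr","agemo","school")]
set.seed(12345)
## impose missing values: x5 is missing for its lowest 30% of scores
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
## x9 is missing for the youngest 30% of students (missingness depends on age)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)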

## calculate FMI with FIML, passing the partially observed data set
(out1 <- fmi(HSMiss, exclude = "school"))                   # saturated model
(out2 <- fmi(HSMiss, exclude = "school", method = "null"))  # means and variances only
(out3 <- fmi(HSMiss, varnames = c("x5","x6","x7","x8","x9")))  # subset of variables
(out4 <- fmi(HSMiss, group = "school"))                     # FMI within each school

## ordered-categorical data
data(datCat)
lapply(datCat, class)
## impose missing values
set.seed(123)
for (i in 1:8) datCat[sample(1:nrow(datCat), size = .1*nrow(datCat)), i] <- NA
## impute data m = 3 times
library(Amelia)
set.seed(456)
impout <- amelia(datCat, m = 3, noms = "g", ords = paste0("u", 1:8), p2s = FALSE)
imps <- impout$imputations
## calculate FMI, using list of imputed data sets
fmi(imps, group = "g")
