
This function estimates the Fraction of Missing Information (FMI) for summary statistics of each variable, using either an incomplete data set or a list of imputed data sets.
fmi(data, method = "saturated", group = NULL, ords = NULL,
varnames = NULL, exclude = NULL, fewImps = FALSE)
data: Either a single data.frame with incomplete observations, or a list of imputed data sets.
method: character. If "saturated" or "sat" (default), the model used to estimate FMI is a freely estimated covariance matrix and mean vector for numeric variables, and/or polychoric correlations and thresholds for ordered categorical variables, for each group (if applicable). If "null", only means and variances are estimated for numeric variables, and/or thresholds for ordered categorical variables (i.e., covariances and/or polychoric correlations are constrained to zero). See Details for more information.
group: character. The optional name of a grouping variable, to request FMI in each group.
ords: character. Optional vector of names of ordered-categorical variables that are not already stored as class ordered in data.
varnames: character. Optional vector of variable names, to calculate FMI for a subset of variables in data. By default, all numeric and ordered variables will be included, unless data is a single incomplete data.frame, in which case only numeric variables can be used with FIML estimation. Other variable types will be removed.
exclude: character. Optional vector of variable names to exclude from the analysis.
fewImps: logical. If TRUE, use the estimate of FMI that applies a correction to the estimated between-imputation variance. Recommended when there are few imputations; makes little difference when there are many imputations. Ignored when data is not a list of imputed data sets.
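The pooling behind FMI for imputed data can be sketched for a single parameter with Rubin's (1987) rules. This is a minimal base-R illustration, not the code that fmi() runs: the helper name pooled_fmi and its arguments are hypothetical, and the adjustment shown for few_imps = TRUE is the textbook degrees-of-freedom version, which is assumed (not confirmed) to resemble the correction requested by fewImps = TRUE.

```r
# Hypothetical helper (not part of semTools): FMI for ONE parameter,
# given its point estimate and standard error in each of m imputations.
pooled_fmi <- function(est, se, few_imps = FALSE) {
  m <- length(est)
  stopifnot(m > 1, length(se) == m)
  W <- mean(se^2)                  # within-imputation variance
  B <- var(est)                    # between-imputation variance
  total <- W + (1 + 1 / m) * B     # total variance of the pooled estimate
  lambda <- (1 + 1 / m) * B / total
  if (!few_imps) return(lambda)
  # Textbook adjustment for a small number of imputations (an assumption here)
  r  <- (1 + 1 / m) * B / W        # relative increase in variance
  nu <- (m - 1) * (1 + 1 / r)^2    # Rubin's degrees of freedom
  (r + 2 / (nu + 3)) / (r + 1)
}

# Made-up estimates and standard errors from m = 3 imputations
est <- c(0.52, 0.47, 0.58)
se  <- c(0.10, 0.11, 0.10)
pooled_fmi(est, se)
pooled_fmi(est, se, few_imps = TRUE)
```

With only a handful of imputations the two values can differ noticeably; as m grows, the degrees of freedom grow and the adjusted value approaches the unadjusted one.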
fmi returns a list with at least 2 of the following:
A list of symmetric matrices: (1) the estimated/pooled covariance matrix, or a list of group-specific matrices (if applicable) and (2) a matrix of FMI, or a list of group-specific matrices (if applicable). Only available if method = "saturated".
The estimated/pooled variance for each numeric variable. Only available if method = "null" (otherwise, it is on the diagonal of Covariances).
The estimated/pooled mean for each numeric variable.
The estimated/pooled threshold(s) for each ordered-categorical variable.
A message indicating caution when the null model is used.
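Because the saturated model reports variances only on the diagonal of the covariance matrix, they can be pulled out with diag(). The sketch below uses a made-up 3 x 3 matrix in place of real fmi() output; the object name toy_cov and its values are illustrative assumptions, not results.

```r
# Toy stand-in for an estimated/pooled covariance matrix (values are invented)
vars <- c("x1", "x2", "x3")
toy_cov <- matrix(c(1.36, 0.41, 0.58,
                    0.41, 1.39, 0.45,
                    0.58, 0.45, 1.28),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(vars, vars))
stopifnot(isSymmetric(toy_cov))

# Variances are the diagonal elements, named by variable
toy_var <- diag(toy_cov)
toy_var
```

The same diag() call would apply to the matrix of FMI values, whose diagonal holds the FMI of each variance.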
The function estimates a saturated model with lavaan for a single incomplete data set using FIML, or with lavaan.mi for a list of imputed data sets. If method = "saturated", FMI will be estimated for all summary statistics, which could take a lot of time with big data sets. If method = "null", FMI will only be estimated for univariate statistics (e.g., means, variances, thresholds). The saturated model gives more reliable estimates, so with a large data set it can help to keep method = "saturated" and request only a subset of variables.
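To see why the saturated model becomes slow as variables are added, it helps to count the summary statistics each method estimates. The sketch below is a rough illustration for numeric variables in a single group only; the helper name n_stats is hypothetical, and ordered-categorical variables (thresholds, polychoric correlations) are deliberately left out.

```r
# Hypothetical helper: number of summary statistics estimated per group
# for p numeric variables (ordered variables are not covered here).
n_stats <- function(p, method = c("saturated", "null")) {
  method <- match.arg(method)
  n_means <- p
  # saturated: all variances and covariances; null: variances only
  n_cov <- if (method == "saturated") p * (p + 1) / 2 else p
  n_means + n_cov
}

n_stats(9, "saturated")  # 9 means + 45 variances/covariances = 54
n_stats(9, "null")       # 9 means + 9 variances = 18
n_stats(5, "saturated")  # 5 means + 15 variances/covariances = 20
```

The saturated count grows quadratically with the number of variables, which is why passing a shorter varnames vector reduces the work substantially.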
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.
Savalei, V. & Rhemtulla, M. (2012). On obtaining estimates of the fraction of missing information from full information maximum likelihood. Structural Equation Modeling, 19(3), 477-494. doi:10.1080/10705511.2012.687669
Wagner, J. (2010). The fraction of missing information as a tool for monitoring the quality of survey data. Public Opinion Quarterly, 74(2), 223-243. doi:10.1093/poq/nfq007
# NOT RUN {
library(lavaan)    # provides the HolzingerSwineford1939 data
library(semTools)  # provides fmi()
HSMiss <- HolzingerSwineford1939[ , c(paste0("x", 1:9),
                                      "ageyr", "agemo", "school")]
set.seed(12345)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)
## calculate FMI (using FIML, provide partially observed data set)
(out1 <- fmi(HSMiss, exclude = "school"))
(out2 <- fmi(HSMiss, exclude = "school", method = "null"))
(out3 <- fmi(HSMiss, varnames = c("x5","x6","x7","x8","x9")))
(out4 <- fmi(HSMiss, group = "school"))
# }
# NOT RUN {
## ordered-categorical data
data(datCat)
lapply(datCat, class)
## impose missing values
set.seed(123)
for (i in 1:8) datCat[sample(1:nrow(datCat), size = .1*nrow(datCat)), i] <- NA
## impute data m = 3 times
library(Amelia)
set.seed(456)
impout <- amelia(datCat, m = 3, noms = "g", ords = paste0("u", 1:8), p2s = FALSE)
imps <- impout$imputations
## calculate FMI, using list of imputed data sets
fmi(imps, group = "g")
# }