The pool() function combines the estimates from m repeated complete-data
analyses. The typical sequence of steps in a multiple imputation analysis is:
1. Impute the missing data by the mice() function, resulting in a multiply
   imputed data set (class mids);
2. Fit the model of interest (scientific model) on each imputed data set by
   the with() function, resulting in an object of class mira;
3. Pool the estimates from each model into a single set of estimates and
   standard errors by the pool() function, resulting in an object of class
   mipo;
4. Optionally, compare pooled estimates from different scientific models by
   the D1() or D3() functions.
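The four steps can be sketched as follows, assuming the mice package is installed and using its bundled nhanes data. The seed, the number of imputations, and the choice of the smaller nested model are illustrative assumptions, not recommendations.

```r
library(mice)

# Step 1: impute the missing data (result has class "mids")
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Step 2: fit the scientific model on each imputed data set (class "mira")
fit1 <- with(imp, lm(bmi ~ hyp + chl))
fit0 <- with(imp, lm(bmi ~ hyp))      # smaller nested model, for step 4

# Step 3: pool the estimates, not the data (class "mipo")
pooled <- pool(fit1)
summary(pooled)

# Step 4 (optional): compare the two models with the D1 statistic
D1(fit1, fit0)
```

Checking class(imp), class(fit1) and class(pooled) after each step is a quick way to confirm that the steps were run in the intended order.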
A common error is to reverse steps 2 and 3, i.e., to pool the
multiply-imputed data instead of the estimates. Doing so may severely bias
the estimates of scientific interest and yield incorrect statistical
intervals and p-values. The pool() function will detect this case.
pool(object, dfcom = NULL)
object: An object of class mira (produced by with.mids() or as.mira()), or a
list with model fits.
dfcom: A positive number representing the degrees of freedom in the
complete-data analysis. Normally, this would be the number of independent
observations minus the number of fitted parameters. The default
(dfcom = NULL) extracts this information in the following order: 1) the
component residual.df returned by glance() if a glance() function is found,
2) the result of df.residual() applied to the first fitted model, and 3) the
value 999999. In the last case, the warning "Large sample assumed" is
printed. If the degrees of freedom are incorrect, specify the appropriate
value manually.
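A minimal sketch of supplying the degrees of freedom by hand, assuming the mice package is installed. The value used here follows the rule of thumb stated above (number of observations minus number of fitted parameters) for a model with an intercept and two predictors; the seed and the number of imputations are illustrative assumptions.

```r
library(mice)

imp <- mice(nhanes, m = 5, seed = 1, printFlag = FALSE)
fit <- with(imp, lm(bmi ~ hyp + chl))

# Complete-data degrees of freedom: observations minus fitted parameters
# (intercept, hyp, chl)
dfcom_manual <- nrow(nhanes) - 3

pooled <- pool(fit, dfcom = dfcom_manual)
summary(pooled)
```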
An object of class mipo, which stands for 'multiple imputation pooled
outcome'.
The pool() function averages the estimates of the complete data model,
computes the total variance over the repeated analyses by Rubin's rules
(Rubin, 1987, p. 76), and computes the following diagnostic statistics per
estimate:
1. Relative increase in variance due to nonresponse r;
2. Residual degrees of freedom for hypothesis testing df;
3. Proportion of total variance due to missingness lambda;
4. Fraction of missing information fmi.
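The arithmetic behind these quantities can be sketched in base R for a single parameter. The helper below is hypothetical (pool_scalar_sketch is not part of mice) and is written from the standard formulas for Rubin's rules with the Barnard-Rubin small-sample degrees of freedom; the lower bound on lambda is an assumption added only to avoid division by zero when all estimates coincide.

```r
# Hypothetical helper, not part of mice: pool one scalar parameter.
# q:     estimates from the m complete-data analyses
# u:     their variances (squared standard errors)
# dfcom: complete-data residual degrees of freedom
pool_scalar_sketch <- function(q, u, dfcom = Inf) {
  m <- length(q)
  stopifnot(m > 1, length(u) == m)

  qbar   <- mean(q)                      # pooled estimate
  ubar   <- mean(u)                      # within-imputation variance
  b      <- var(q)                       # between-imputation variance
  total  <- ubar + (1 + 1 / m) * b       # total variance
  riv    <- (1 + 1 / m) * b / ubar       # relative increase in variance (r)
  lambda <- (1 + 1 / m) * b / total      # share of variance due to missingness

  lam    <- max(lambda, 1e-4)            # guard (assumption): avoid 1 / 0
  df_old <- (m - 1) / lam^2              # large-sample degrees of freedom
  if (is.finite(dfcom)) {
    df_obs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lam)
    df <- df_old * df_obs / (df_old + df_obs)   # small-sample adjustment
  } else {
    df <- df_old
  }

  fmi <- (riv + 2 / (df + 3)) / (riv + 1)       # fraction of missing information

  c(estimate = qbar, ubar = ubar, b = b, t = total,
    df = df, riv = riv, lambda = lambda, fmi = fmi)
}

# Usage with made-up numbers: three estimates with equal variances
pool_scalar_sketch(q = c(1, 2, 3), u = c(0.5, 0.5, 0.5), dfcom = 22)
```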
The function requires the following input from each fitted model:
1. the estimates of the model, usually obtainable by coef();
2. the standard error of each estimate;
3. the residual degrees of freedom of the model.
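For a model class that supports the usual extractor functions, these three ingredients can be inspected with base R alone. The sketch below uses lm() on the built-in cars data purely as an illustration; it shows what the inputs look like and is not a description of how pool() obtains them internally.

```r
# Illustrative model on a built-in data set (no missing data involved)
fit <- lm(dist ~ speed, data = cars)

est   <- coef(fit)               # 1. the estimates
se    <- sqrt(diag(vcov(fit)))   # 2. the standard error of each estimate
dfres <- df.residual(fit)        # 3. the residual degrees of freedom

data.frame(term = names(est), estimate = est, std.error = se,
           row.names = NULL)
dfres
```

A model class for which one of these calls fails is a likely candidate for needing dfcom to be specified manually.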
The pool() function relies on broom::tidy() for extracting the parameters.
Versions before mice 3.8.5 failed when no broom::glance() function was found
for extracting the residual degrees of freedom. The pool() function is now
more forgiving.
The degrees of freedom calculation for the pooled estimates uses the Barnard-Rubin adjustment for small samples (Barnard and Rubin, 1999).
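Written out, the adjustment takes the following form. The symbols are assumptions chosen for this sketch: m is the number of imputations, b the between-imputation variance, t the total variance, and \nu_{\mathrm{com}} the complete-data degrees of freedom (the dfcom argument).

```latex
\lambda = \frac{(1 + 1/m)\, b}{t}, \qquad
\nu_{\mathrm{old}} = \frac{m - 1}{\lambda^{2}}, \qquad
\nu_{\mathrm{obs}} = \frac{\nu_{\mathrm{com}} + 1}{\nu_{\mathrm{com}} + 3}\,
                     \nu_{\mathrm{com}}\, (1 - \lambda), \qquad
\nu = \frac{\nu_{\mathrm{old}}\, \nu_{\mathrm{obs}}}
           {\nu_{\mathrm{old}} + \nu_{\mathrm{obs}}}
```

As \nu_{\mathrm{com}} grows, \nu_{\mathrm{obs}} grows with it and \nu approaches the large-sample value \nu_{\mathrm{old}}; the adjusted \nu never exceeds \nu_{\mathrm{com}}.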
Barnard, J. and Rubin, D.B. (1999). Small sample degrees of freedom with multiple imputation. Biometrika, 86, 948-955.
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.
van Buuren, S. and Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. https://www.jstatsoft.org/v45/i03/
# pool using the classic MICE workflow
library(mice)
imp <- mice(nhanes, maxit = 2, m = 2)
fit <- with(data = imp, expr = lm(bmi ~ hyp + chl))
summary(pool(fit))