mice.impute.rf: Imputation by random forests

Description

Imputes univariate missing data using random forests.

Usage

mice.impute.rf(y, ry, x, ntree = 10, ...)

Arguments

y

Numeric vector with incomplete data

ry

Response pattern of y (TRUE = observed, FALSE = missing)

x

Design matrix with length(y) rows and p columns containing complete covariates.

ntree

The number of trees to grow. The default is 10.

...

Other named arguments passed down to randomForest() and randomForest:::randomForest.default().

Value

Numeric vector of length sum(!ry) with imputations
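The function is normally invoked through mice(), but a direct call can make the roles of y, ry and x concrete. The following is a minimal sketch, assuming the mice package and the forest backend it relies on (randomForest, or ranger in later versions) are installed. It uses the nhanes data shipped with mice, and the choice of bmi as target and age as sole covariate is only an illustration.

```r
library(mice)

# nhanes ships with mice: age is complete, bmi contains missing values
y  <- nhanes$bmi
ry <- !is.na(y)                                 # TRUE = observed, FALSE = missing
x  <- as.matrix(nhanes[, "age", drop = FALSE])  # complete covariates only

set.seed(123)
imputed <- mice.impute.rf(y, ry, x, ntree = 10)

# One imputed value per missing entry of y
length(imputed) == sum(!ry)

# Fill the imputations back into a completed copy of y
y_completed <- y
y_completed[!ry] <- imputed
```

The returned vector carries no positions of its own, so it has to be assigned back through !ry as shown.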

Details

Imputation of y by random forests. The method calls randomForest(), which implements Breiman's random forest algorithm (based on Breiman and Cutler's original Fortran code) for classification and regression. See Appendix A.1 of Doove et al. (2014) for the definition of the algorithm used.

An alternative implementation was independently developed by Shah et al. (2014) and is available in the package CALIBERrfimpute. Simulations by Shah (Feb 13, 2014) suggested that the quality of the imputation for 10 and 100 trees was identical, so mice 2.22 changed the default number of trees from ntree = 100 to ntree = 10.
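The general idea of drawing imputations from a forest can be illustrated as follows. This is a rough sketch for a numeric y, not the mice source and not a verbatim transcription of Appendix A.1. It assumes that each tree is grown with randomForest(ntree = 1), that donors for a missing case are the observed cases falling in the same terminal node, and that one donor is drawn at random from the candidates pooled over all trees. The function name rf_impute_sketch and the simulated data are hypothetical.

```r
library(randomForest)

# Hypothetical helper (not part of mice): donor-based imputation from single trees
rf_impute_sketch <- function(y, ry, x, ntree = 10, ...) {
  xobs <- x[ry, , drop = FALSE]
  xmis <- x[!ry, , drop = FALSE]
  yobs <- y[ry]
  nmis <- sum(!ry)

  # donors[[i]] collects candidate values for the i-th missing entry
  donors <- vector("list", nmis)
  for (k in seq_len(ntree)) {
    # One tree, grown by randomForest() on a bootstrap sample of observed cases
    fit <- randomForest(x = xobs, y = yobs, ntree = 1, ...)
    # Terminal node reached by each observed and each missing case
    leaf_obs <- attr(predict(fit, newdata = xobs, nodes = TRUE), "nodes")[, 1]
    leaf_mis <- attr(predict(fit, newdata = xmis, nodes = TRUE), "nodes")[, 1]
    for (i in seq_len(nmis)) {
      donors[[i]] <- c(donors[[i]], yobs[leaf_obs == leaf_mis[i]])
    }
  }

  # Draw one donor at random from the pooled candidates of each missing case
  vapply(donors, function(d) d[sample.int(length(d), 1)], numeric(1))
}

# Simulated data, for illustration only
set.seed(1)
n  <- 60
x  <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y  <- x[, "x1"] + rnorm(n)
ry <- rep(c(TRUE, TRUE, FALSE), length.out = n)  # every third value set to missing
y[!ry] <- NA

imp <- rf_impute_sketch(y, ry, x, ntree = 5)
```

Because every imputation is an observed value of y, the imputed values stay within the range of the observed data.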

References

Doove, L.L., van Buuren, S., Dusseldorp, E. (2014), Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics & Data Analysis, 72, 92-104.

Shah, A.D., Bartlett, J.W., Carpenter, J., Nicholas, O., Hemingway, H. (2014), Comparison of random forest and parametric imputation models for imputing missing data using MICE: A CALIBER study. American Journal of Epidemiology, doi: 10.1093/aje/kwt312.

Van Buuren, S. (2012), Flexible imputation of missing data. Boca Raton, FL: Chapman & Hall/CRC.

Examples

library("mice")
library("lattice")

# Impute nhanes2 with random forests, growing 3 trees per imputation model
imp <- mice(nhanes2, meth = "rf", ntree = 3)
plot(imp)