
missRanger (version 1.0.0)

missRanger: Missing Values Imputation by Chained Random Forests

Description

Uses the "ranger" package [1] to do fast missing value imputation by chained random forests, see [2] and [3]. Between the iterative model fitting steps, we offer the option of using predictive mean matching. First, this avoids imputing values that do not occur in the original data (such as 0.3334 in a 0-1 coded variable). Second, predictive mean matching tries to raise the variance of the resulting conditional distributions to a realistic level. This makes it possible, for example, to do multiple imputation by calling "missRanger" repeatedly.
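The chaining idea can be sketched in base R. This is a simplified illustration under stated assumptions, not missRanger's implementation. lm() stands in for the random forest that missRanger fits with ranger, only numeric columns with syntactic names are handled, missing cells are first filled with column means, and the loop stops after maxiter rounds or once the imputations stop changing. The names chained_impute_sketch and num are hypothetical.

```r
# Minimal sketch of chained imputation, NOT missRanger's code.
# Assumptions: numeric columns only; lm() replaces ranger's random forest;
# a simple "values stopped changing" rule replaces missRanger's stopping rule.
chained_impute_sketch <- function(data, maxiter = 10L) {
  stopifnot(is.data.frame(data), all(vapply(data, is.numeric, logical(1))))
  miss <- is.na(data)                          # logical matrix of missing cells
  to_impute <- names(data)[colSums(miss) > 0]
  # Step 0: initialize every missing cell with its column mean
  for (v in to_impute) data[[v]][miss[, v]] <- mean(data[[v]], na.rm = TRUE)
  for (iter in seq_len(maxiter)) {
    old <- data
    for (v in to_impute) {
      obs <- !miss[, v]
      # Fit v on all other columns, using only rows where v was observed;
      # the other columns already contain the current imputations
      fit <- lm(reformulate(setdiff(names(data), v), response = v),
                data = data[obs, , drop = FALSE])
      # Overwrite only the originally missing cells of v
      data[[v]][!obs] <- predict(fit, newdata = data[!obs, , drop = FALSE])
    }
    # Stop early once the imputed values no longer change noticeably
    if (max(abs(as.matrix(data) - as.matrix(old))) < 1e-8) break
  }
  data
}

# Usage on the numeric part of iris with some values removed
set.seed(42)
num <- iris[, 1:4]
num[sample(nrow(num), 15), "Petal.Length"] <- NA
num[sample(nrow(num), 15), "Sepal.Width"] <- NA
completed <- chained_impute_sketch(num, maxiter = 5L)
anyNA(completed)
```

Each pass through the inner loop is one "chaining" step: every variable with missing values is modelled in turn from the others, so later models see the imputations produced by earlier ones.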

Usage

missRanger(data, maxiter = 10L, pmm.k = 0L, seed = NULL, ...)

Arguments

data

A data.frame with missing values to impute.

maxiter

Maximum number of chaining iterations.

pmm.k

Number of candidate non-missing values to sample from in the predictive mean matching step. Set to 0 to skip this step.

seed

Integer seed to initialize the random number generator.

...

Arguments passed to ranger. Do not pass formula, data, or seed; these are already handled by the algorithm. Not all ranger options make sense (e.g. write.forest = FALSE will cause the algorithm to crash). For large data sets, it is better to use fewer trees (e.g. num.trees = 100) and/or a low value of sample.fraction.
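To illustrate what pmm.k controls, here is a minimal sketch of one common form of predictive mean matching: for each missing value, take the k observed rows whose predictions are closest to its prediction and draw the observed value of one of them. The helper pmm_sketch is hypothetical and only assumes numeric values; missRanger's own implementation may differ in details.

```r
# Hypothetical helper, NOT missRanger's internal code: predictive mean matching.
#   xtrain: model predictions for rows where y is observed
#   xtest:  model predictions for rows where y is missing
#   ytrain: observed values of y, in the same order as xtrain (numeric assumed)
#   k:      number of closest donors to sample from (the role of pmm.k)
pmm_sketch <- function(xtrain, xtest, ytrain, k = 3L) {
  stopifnot(length(xtrain) == length(ytrain), k >= 1L)
  k <- min(k, length(ytrain))
  vapply(xtest, function(p) {
    # Indices of the k observed rows whose predictions are closest to p
    donors <- order(abs(xtrain - p))[seq_len(k)]
    # Draw one donor at random and return its observed value
    ytrain[donors[sample.int(k, 1L)]]
  }, FUN.VALUE = numeric(1))
}

# Usage: a 0-1 coded variable
set.seed(1)
y_obs    <- c(0, 1, 1, 0, 1)            # observed values
pred_obs <- c(0.1, 0.8, 0.7, 0.2, 0.9)  # predictions for the observed rows
pred_mis <- c(0.3334, 0.75)             # predictions for rows with missing y
imputed  <- pmm_sketch(pred_obs, pred_mis, y_obs, k = 2L)
imputed  # c(0, 1): only values that occur in y_obs
```

Without the matching step (pmm.k = 0), the raw prediction 0.3334 would be imputed; with it, only values present in the data can appear, and the random donor draw adds variability between repeated runs.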

Value

A data.frame with the same structure as data, but with the missing values imputed.
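The return value can be checked against this description with a small helper. The function is_valid_imputation is hypothetical, and it assumes that observed (non-missing) cells are left unchanged and that column classes are preserved; the "imputation" in the usage lines is a naive mean fill used only as a stand-in.

```r
# Hypothetical helper (not part of missRanger): does `imputed` look like a
# valid result for `original`? Checks same shape, names and column classes,
# no remaining NAs, and (assumed) unchanged observed cells.
is_valid_imputation <- function(original, imputed) {
  stopifnot(is.data.frame(original), is.data.frame(imputed))
  if (!identical(dim(original), dim(imputed)) ||
      !identical(names(original), names(imputed))) return(FALSE)
  same_classes <- identical(lapply(original, class), lapply(imputed, class))
  no_missing   <- !anyNA(imputed)
  observed_kept <- all(mapply(function(o, i) {
    keep <- !is.na(o)                 # cells that were observed originally
    identical(o[keep], i[keep])
  }, original, imputed))
  same_classes && no_missing && observed_kept
}

# Usage with a naive mean fill standing in for missRanger()
x <- iris
x$Sepal.Length[c(3, 10)] <- NA
filled <- x
filled$Sepal.Length[c(3, 10)] <- mean(x$Sepal.Length, na.rm = TRUE)
is_valid_imputation(x, filled)  # TRUE
is_valid_imputation(x, x)       # FALSE: NAs remain
```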

References

[1] Wright, M. N. and Ziegler, A. (2016). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, in press. http://arxiv.org/abs/1508.04409

[2] Stekhoven, D. J. and Buehlmann, P. (2012). MissForest - nonparametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. doi: 10.1093/bioinformatics/btr597

[3] Van Buuren, S. and Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. http://www.jstatsoft.org/v45/i03/

Examples

# NOT RUN {
library(missRanger)

# Insert missing values into iris, then impute them
set.seed(1)
irisWithNA <- generateNA(iris)
irisImputed <- missRanger(irisWithNA, pmm.k = 3, num.trees = 100)
head(irisImputed)
head(irisWithNA)

# With the extra trees algorithm (splitrule is passed on to ranger)
irisImputed_et <- missRanger(irisWithNA, pmm.k = 3, num.trees = 100, splitrule = "extratrees")
head(irisImputed_et)
# }
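As noted in the description, repeating the call allows multiple imputation. The following is a minimal sketch, assuming missRanger and ranger are installed; m = 5 and the analysed statistic (the mean of Sepal.Length) are arbitrary illustrative choices, and different seeds are used so that the m completed data sets differ.

```r
# Sketch: multiple imputation by repeating missRanger() with different seeds.
# Runs only if missRanger is installed; m = 5 is an arbitrary choice.
if (requireNamespace("missRanger", quietly = TRUE)) {
  set.seed(123)
  irisWithNA <- missRanger::generateNA(iris)
  m <- 5L
  # pmm.k > 0 so that the completed data sets vary in a realistic way
  completed <- lapply(seq_len(m), function(i)
    missRanger::missRanger(irisWithNA, pmm.k = 3, num.trees = 100, seed = i))
  # Analyse each completed data set separately
  est <- vapply(completed, function(d) mean(d$Sepal.Length), numeric(1))
  print(est)        # spread reflects imputation uncertainty
  print(mean(est))  # combined point estimate
}
```

A full analysis would combine both the estimates and their variances across the m data sets (Rubin's rules) instead of only averaging point estimates.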
