
## Default S3 method:
rfImpute(x, y, iter=5, ntree=300, ...)
## S3 method for class 'formula'
rfImpute(x, data, ..., subset)
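x: a data frame or matrix of predictors, some containing NAs, or a formula.
y: response vector (NAs not allowed).
...: other arguments to be passed to randomForest.

The completed data are returned, with NAs imputed using proximity from randomForest. The first column contains the response.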
The algorithm starts by imputing NAs using na.roughfix. Then randomForest is called with the completed data. The proximity matrix from the randomForest is used to update the imputation of the NAs. For continuous predictors, the imputed value is the weighted average of the non-missing observations, where the weights are the proximities. For categorical predictors, the imputed value is the category with the largest average proximity. This process is iterated iter times.

Note: Imputation has not (yet) been implemented for the unsupervised case. Also, Breiman (2003) notes that the OOB estimate of error from randomForest tends to be optimistic when run on the data matrix with imputed values.
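The proximity-based update described above can be illustrated with a small base-R sketch. This is not the package's implementation: the helper names impute_numeric and impute_factor and the hand-written proximity matrix prox are hypothetical, chosen only for illustration. In practice the proximities would come from a fitted forest, and the starting values from na.roughfix.

```r
# Toy data: five observations; observation 3 is missing both predictors.
x_num <- c(1.0, 2.0, NA, 4.0, 5.0)
x_cat <- factor(c("a", "a", NA, "b", "b"))

# Hypothetical symmetric proximity matrix (assumed, not produced by a forest).
prox <- matrix(c(
  1.0, 0.8, 0.6, 0.1, 0.0,
  0.8, 1.0, 0.7, 0.2, 0.1,
  0.6, 0.7, 1.0, 0.3, 0.2,
  0.1, 0.2, 0.3, 1.0, 0.9,
  0.0, 0.1, 0.2, 0.9, 1.0), nrow = 5, byrow = TRUE)

# Continuous predictor: proximity-weighted average of the non-missing values.
# Assumes each missing row has at least one nonzero proximity to an observed row.
impute_numeric <- function(x, prox) {
  miss <- which(is.na(x))
  obs <- which(!is.na(x))
  for (i in miss) {
    w <- prox[i, obs]
    x[i] <- sum(w * x[obs]) / sum(w)
  }
  x
}

# Categorical predictor: the category with the largest average proximity.
impute_factor <- function(x, prox) {
  miss <- which(is.na(x))
  obs <- which(!is.na(x))
  for (i in miss) {
    avg <- tapply(prox[i, obs], x[obs], mean)
    x[i] <- names(avg)[which.max(avg)]
  }
  x
}

num_filled <- impute_numeric(x_num, prox)
cat_filled <- impute_factor(x_cat, prox)
print(num_filled)
print(cat_filled)
```

Here observation 3 is closest to observations 1 and 2, so its numeric value is pulled toward theirs and its category is set to "a". In the full procedure this step would be repeated iter times, refitting the forest on the updated data each time.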
See also: na.roughfix.
library(randomForest)
data(iris)
iris.na <- iris
set.seed(111)
## artificially drop some data values.
for (i in 1:4) iris.na[sample(150, sample(20)), i] <- NA
set.seed(222)
iris.imputed <- rfImpute(Species ~ ., iris.na)
set.seed(333)
iris.rf <- randomForest(Species ~ ., iris.imputed)
print(iris.rf)
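The example above uses the formula interface. As a minimal sketch of the default interface from the usage section, the same kind of call can pass the predictors and the response separately. The object names iris.na2, x, y, and imputed.xy are illustrative, and the number of dropped values (10 per column) is an arbitrary choice for this sketch.

```r
library(randomForest)
data(iris)

# Build a copy of iris with some predictor values removed (illustrative choice:
# 10 values per predictor column).
iris.na2 <- iris
set.seed(111)
for (i in 1:4) iris.na2[sample(150, 10), i] <- NA

x <- iris.na2[, 1:4]   # predictors, some containing NAs
y <- iris.na2$Species  # response, no NAs allowed

set.seed(222)
imputed.xy <- rfImpute(x, y, iter = 5, ntree = 300)

# The result holds the response in the first column, followed by the
# completed predictors.
str(imputed.xy)
```

Passing iter and ntree explicitly here just repeats the defaults shown in the usage; they can be omitted.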