```
missForest(xmis, maxiter = 10, ntree = 100, variablewise = FALSE,
           decreasing = FALSE, verbose = FALSE,
           mtry = floor(sqrt(ncol(xmis))), replace = TRUE,
           classwt = NULL, cutoff = NULL, strata = NULL,
           sampsize = NULL, nodesize = NULL, maxnodes = NULL,
           xtrue = NA, parallelize = c('no', 'variables', 'forests'))
```

xmis

a data matrix with missing values. The columns correspond to the
variables and the rows to the observations.

maxiter

maximum number of iterations to be performed if the stopping criterion
is not met beforehand.
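The stopping rule that `maxiter` bounds can be sketched in base R for continuous variables. This sketch assumes the difference measure is the squared change between successive imputed matrices divided by the squared entries of the newer one, and that iteration stops the first time this difference increases, keeping the previous imputation. The criterion is assumed to cover categorical variables as well, which this sketch omits. Both helpers, `delta_continuous` and `stop_iteration`, are hypothetical and not part of missForest.

```r
# Hypothetical sketch, not missForest's internal code; continuous variables only.

# Assumed difference measure: squared change between successive imputed
# matrices, normalized by the squared entries of the newer imputation.
delta_continuous <- function(x_new, x_old) {
  sum((x_new - x_old)^2) / sum(x_new^2)
}

# Return the iteration whose imputation is kept. Stop the first time the
# difference increases and keep the previous iteration; otherwise keep the
# last iteration run, of which there are at most 'maxiter'.
stop_iteration <- function(deltas, maxiter = 10) {
  deltas <- deltas[seq_len(min(length(deltas), maxiter))]
  for (i in seq_along(deltas)[-1]) {
    if (deltas[i] > deltas[i - 1]) return(i - 1)
  }
  length(deltas)
}

old <- matrix(c(1, 2, 3, 4), nrow = 2)
new <- matrix(c(1, 2, 3, 5), nrow = 2)
delta_continuous(new, old)               # 1/39
stop_iteration(c(0.5, 0.1, 0.05, 0.07))  # 3
```

With these toy differences the increase at iteration 4 means the imputation from iteration 3 is kept.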

ntree

number of trees to grow in each forest.

variablewise

logical. If 'TRUE' the OOB error is returned for each variable
separately. This can be useful as a reliability check for the
imputed variables with respect to a subsequent data analysis.

decreasing

logical. If 'FALSE' the variables are processed in order of increasing
number of missing entries; if 'TRUE', in decreasing order.
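The ordering controlled by `decreasing` can be illustrated in base R. `visit_order` is a hypothetical helper that sorts column names by their number of missing entries; it shows the idea, not missForest's internal code.

```r
# Hypothetical helper: order the columns of xmis by their number of missing
# entries (increasing by default, as with decreasing = FALSE).
visit_order <- function(xmis, decreasing = FALSE) {
  n_missing <- colSums(is.na(xmis))
  colnames(xmis)[order(n_missing, decreasing = decreasing)]
}

df <- data.frame(a = c(1, NA, NA), b = c(NA, 2, 3), c = c(1, 2, 3))
visit_order(df)                     # "c" "b" "a"
visit_order(df, decreasing = TRUE)  # "a" "b" "c"
```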

verbose

logical. If 'TRUE' the user is supplied with additional output between
iterations, i.e., the estimated imputation error, the runtime and, if the
complete data matrix is supplied, the true imputation error. See 'xtrue'.

mtry

number of variables randomly sampled at each split. This argument is
directly supplied to the 'randomForest' function. Note that the
default value is sqrt(p), rounded down, for both categorical and
continuous variables, where p is the number of variables in 'xmis'.
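The default can be computed directly. This sketch evaluates the usage-line default for iris and, for contrast, randomForest's own default for regression forests, `max(floor(p / 3), 1)`, which differs from the sqrt(p) rule that missForest applies to both variable types. The helper names are hypothetical.

```r
# Default used by missForest according to the usage line above.
default_mtry <- function(xmis) floor(sqrt(ncol(xmis)))

# For comparison only: randomForest's own regression default.
rf_regression_mtry <- function(p) max(floor(p / 3), 1)

default_mtry(iris)     # p = 5, so floor(sqrt(5)) = 2
rf_regression_mtry(5)  # floor(5 / 3) = 1
```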

replace

logical. If 'TRUE' bootstrap sampling (with replacement) is
performed, otherwise subsampling (without replacement).
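In base R terms, the two sampling schemes look as follows. The subsample size assumes randomForest's default `sampsize` of `ceiling(.632 * nrow(x))` when `replace = FALSE`; with `replace = TRUE` its default draws `nrow(x)` cases.

```r
set.seed(1)
n <- 10
# replace = TRUE: bootstrap sample of size n; indices may repeat.
boot <- sample(n, n, replace = TRUE)
# replace = FALSE: subsample without repeats; 0.632 * n mirrors
# randomForest's default sampsize when replace = FALSE.
sub <- sample(n, ceiling(0.632 * n), replace = FALSE)
length(boot)        # 10
length(sub)         # 7
anyDuplicated(sub)  # 0, no index drawn twice
```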

classwt

list of priors of the classes in the categorical variables. This is
equivalent to the randomForest argument; however, the user has to
set the priors for all categorical variables in the data set (for
continuous variables set it to 'NULL').

cutoff

list of class cutoffs for each categorical variable. Same as with
'classwt' (for continuous variables set it to '1').
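Because `classwt` and `cutoff` need one entry per variable, it can help to build them programmatically. The helpers below are hypothetical and assume `xmis` is a data frame whose categorical columns are factors. They place equal class priors and randomForest's default cutoff of 1/k (for k classes) on each factor, and `NULL` or `1` on continuous columns as described above.

```r
# Hypothetical helper: one classwt entry per column, NULL for continuous ones.
make_classwt <- function(xmis) {
  out <- vector("list", ncol(xmis))      # every entry starts as NULL
  for (j in seq_len(ncol(xmis))) {
    if (is.factor(xmis[[j]])) {
      out[[j]] <- rep(1, nlevels(xmis[[j]]))  # equal priors (need not sum to 1)
    }
  }
  out
}

# Hypothetical helper: one cutoff entry per column, 1 for continuous ones.
make_cutoff <- function(xmis) {
  out <- as.list(rep(1, ncol(xmis)))
  for (j in seq_len(ncol(xmis))) {
    if (is.factor(xmis[[j]])) {
      k <- nlevels(xmis[[j]])
      out[[j]] <- rep(1 / k, k)          # randomForest's default, 1/k per class
    }
  }
  out
}

cw <- make_classwt(iris)
co <- make_cutoff(iris)
# then, e.g.: missForest(xmis, classwt = cw, cutoff = co)
```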

strata

list of (factor) variables used for stratified sampling. Same as
with 'classwt' (for continuous variables set it to 'NULL').

sampsize

list of size(s) of sample to draw. This is equivalent to the
randomForest argument; however, the user has to set the sizes for
all variables.

nodesize

minimum size of terminal nodes. Must be a vector of length 2, with
the first entry being the number for continuous variables and the
second entry the number for categorical variables. Default is 1 for
continuous and 5 for categorical variables.
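The length-2 convention can be illustrated by picking the entry that applies to each column. `node_size_for` is a hypothetical helper using the defaults stated above; it is not missForest's internal code.

```r
# Hypothetical helper: select the terminal-node size for one column from a
# length-2 vector (first entry continuous, second entry categorical).
node_size_for <- function(x, nodesize = c(1, 5)) {
  stopifnot(length(nodesize) == 2)
  if (is.factor(x)) nodesize[2] else nodesize[1]
}

sapply(iris, node_size_for)  # 1 1 1 1 5 (Species is the only factor)
```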

maxnodes

maximum number of terminal nodes for trees in the forest.

xtrue

optional. Complete data matrix. This can be supplied to test the
performance. If the complete data matrix is provided, 'verbose' will
show the true imputation error after each iteration, and the output
will also contain the final true imputation error.

parallelize

should 'missForest' be run in parallel? Default is 'no'. If 'variables',
the data is split into as many pieces as there are cores registered in
the parallel backend. If 'forests', the total number of trees in each
random forest is split in the same way.
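The two splitting strategies can be sketched in base R. Both helpers are hypothetical: `split_trees` divides `ntree` as evenly as possible across workers (the 'forests' idea), and `split_variables` assigns column indices to workers in contiguous chunks (the 'variables' idea). A parallel backend must be registered before calling missForest, for example with `doParallel::registerDoParallel()`; how the package actually partitions the work may differ from this sketch.

```r
# Hypothetical: per-worker tree counts for parallelize = 'forests'.
split_trees <- function(ntree, ncores) {
  base <- ntree %/% ncores
  extra <- ntree %% ncores
  base + c(rep(1, extra), rep(0, ncores - extra))
}

# Hypothetical: column indices per worker for parallelize = 'variables'.
split_variables <- function(p, ncores) {
  split(seq_len(p), sort(rep_len(seq_len(ncores), p)))
}

split_trees(100, 4)    # 25 25 25 25
split_trees(100, 3)    # 34 33 33
split_variables(5, 2)  # worker 1: columns 1 2 3; worker 2: columns 4 5

# Registering a backend (requires the doParallel package), e.g.:
# doParallel::registerDoParallel(cores = 2)
# missForest(xmis, parallelize = "forests")
```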

ximp

imputed data matrix of the same type as 'xmis'.

OOBerror

estimated OOB imputation error. For the continuous variables in 'xmis'
the NRMSE is returned, and for the categorical variables the proportion
of falsely classified entries (PFC). See Details for the exact definition
of these error measures. If 'variablewise' is set to 'TRUE', this will be
a vector of length 'p', where 'p' is the number of variables, and the
entries will be the OOB error for each variable separately.

error

true imputation error. This is only available if 'xtrue' was supplied.
The error measures are the same as for 'OOBerror'.
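The two error measures can be written out in a few lines of base R. This is a sketch assuming NRMSE is `sqrt(mean((imputed - true)^2) / var(true))` and PFC is the share of wrongly imputed classes, both taken over the entries that were missing in `xmis`. The functions `nrmse` and `pfc` here are hypothetical stand-ins; the package itself provides `mixError` for this purpose.

```r
# Hypothetical stand-ins for illustration; errors use only the entries that
# were missing in xmis.
nrmse <- function(ximp, xmis, xtrue) {
  mis <- is.na(xmis)
  sqrt(mean((ximp[mis] - xtrue[mis])^2) / var(xtrue[mis]))
}

pfc <- function(ximp, xmis, xtrue) {
  mis <- is.na(xmis)
  mean(ximp[mis] != xtrue[mis])  # share of falsely classified entries
}

# Continuous toy data: entries 2, 4 and 5 were missing.
xtrue <- c(1, 2, 3, 4, 5)
xmis  <- c(1, NA, 3, NA, NA)
ximp  <- c(1, 2.5, 3, 3.5, 5)
nrmse(ximp, xmis, xtrue)  # sqrt((1/6) / (7/3)) = sqrt(1/14), about 0.267

# Categorical toy data: entries 2 and 3 were missing; one of them is wrong.
ftrue <- c("a", "b", "b", "c")
fmis  <- c("a", NA, NA, "c")
fimp  <- c("a", "b", "c", "c")
pfc(fimp, fmis, ftrue)    # 0.5
```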

See also: `mixError`, `prodNA`, `randomForest`

```r
## Nonparametric missing value imputation on mixed-type data:
library(missForest)
data(iris)
summary(iris)

## The data contains four continuous and one categorical variable.
## Artificially produce missing values using the 'prodNA' function:
set.seed(81)
iris.mis <- prodNA(iris, noNA = 0.2)
summary(iris.mis)

## Impute missing values providing the complete matrix for
## illustration. Use 'verbose' to see what happens between iterations:
iris.imp <- missForest(iris.mis, xtrue = iris, verbose = TRUE)

## The imputation is finished after five iterations with a final
## true NRMSE of 0.143 and a PFC of 0.036. The estimated final NRMSE
## is 0.157 and the PFC is 0.025 (see Details for the reason for taking
## iteration 4 instead of iteration 5 as the final value).

## The final results can be accessed directly. The estimated error:
iris.imp$OOBerror

## The true imputation error (if available):
iris.imp$error

## And of course the imputed data matrix (do not run this):
## iris.imp$ximp
```