missForest(xmis, maxiter = 10, ntree = 100, variablewise = FALSE,
           decreasing = FALSE, verbose = FALSE,
           mtry = floor(sqrt(ncol(xmis))), replace = TRUE,
           classwt = NULL, cutoff = NULL, strata = NULL,
           sampsize = NULL, nodesize = NULL, maxnodes = NULL,
           xtrue = NA, parallelize = c('no', 'variables', 'forests'))
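A minimal sketch of a call with the defaults written out, for orientation only. The data set (the built-in 'iris' data with about 10% of its cells set to 'NA'), the seed and the variable names are assumptions made for illustration; the call itself only runs if the 'missForest' package is installed.

```r
# Illustrative data (an assumption, not part of this page):
# the built-in iris data with 10% of its cells set to NA.
set.seed(81)
xmis <- iris
mask <- matrix(FALSE, nrow = nrow(xmis), ncol = ncol(xmis))
mask[sample(length(mask), size = length(mask) %/% 10)] <- TRUE
xmis[mask] <- NA

# Default mtry from the usage line: floor(sqrt(p)), p = number of columns.
mtry_default <- floor(sqrt(ncol(xmis)))

# Run the imputation only if the missForest package is available.
if (requireNamespace("missForest", quietly = TRUE)) {
  imp <- missForest::missForest(xmis, maxiter = 10, ntree = 100,
                                mtry = mtry_default)
  print(head(imp$ximp))  # imputed data
  print(imp$OOBerror)    # estimated (out-of-bag) imputation error
}
```

Passing 'mtry' explicitly here only makes the default visible; omitting it should give the same setting.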
xmis: a data matrix with missing values. The columns correspond to the
variables and the rows to the observations.
maxiter: maximum number of iterations to be performed, given that the
stopping criterion is not met beforehand.
ntree: number of trees to grow in each forest.
variablewise: logical. If 'TRUE' the OOB error is returned for each variable
separately. This can be useful as a reliability check for the
imputed variables w.r.t. a subsequent data analysis.
decreasing: logical. If 'FALSE' then the variables are sorted w.r.t. increasing
amount of missing entries during computation.
verbose: logical. If 'TRUE' the user is supplied with additional output between
iterations, i.e., estimated imputation error, runtime and, if a complete
data matrix is supplied, the true imputation error. See 'xtrue'.
mtry: number of variables randomly sampled at each split. This argument is
directly supplied to the 'randomForest' function. Note that the
default value is sqrt(p) for both categorical and continuous
variables where p is the number of variables in 'xmis'.
replace: logical. If 'TRUE' bootstrap sampling (with replacement) is
performed, else subsampling (without replacement).
classwt: list of priors of the classes in the categorical variables. This is
equivalent to the randomForest argument; however, the user has to
set the priors for all categorical variables in the data set (for
continuous variables set it to 'NULL').
cutoff: list of class cutoffs for each categorical variable. Same as with
'classwt' (for continuous variables set it to '1').
strata: list of (factor) variables used for stratified sampling. Same as
with 'classwt' (for continuous variables set it to 'NULL').
sampsize: list of size(s) of sample to draw. This is equivalent to the
randomForest argument; however, the user has to set the sizes for
all variables.
nodesize: minimum size of terminal nodes. Has to be a vector of length 2, with
the first entry being the number for continuous variables and the
second entry the number for categorical variables. Default is 1 for
continuous and 5 for categorical variables.
maxnodes: maximum number of terminal nodes for trees in the forest.
xtrue: optional. Complete data matrix. This can be supplied to test the
performance. Upon providing the complete data matrix 'verbose' will
show the true imputation error after each iteration and the output
will also contain the final true imputation error.
parallelize: should 'missForest' be run in parallel. Default is 'no'. If 'variables'
the data is split into pieces of the size equal to the number of cores
registered in the parallel backend. If 'forests' the total number of trees in
each random forests i