predict.missRanger: Predict Method

Description

Impute missing values in newdata based on an object of class "missRanger".

Multivariate imputation needs the fitted random forests, so fit the model with missRanger(..., keep_forests = TRUE). Univariate imputation needs no forests; request it with predict(..., iter = 0) or by fitting missRanger(. ~ 1, ...).

Out-of-sample imputation works best for rows in newdata with only one missing value, counting only missing values in variables that are used as covariates in the random forests. We call this the "easy case". In the "hard case" (rows with several such missing values), even multiple iterations (set by iter) can lead to unsatisfactory results.
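To make the easy/hard distinction concrete, the following base R sketch counts the missing values per row among the covariate columns and labels each row. The helper classify_rows() and the toy data are hypothetical illustrations, not part of missRanger; rows with zero or one missing covariate value are treated as "easy" here.

```r
# Hypothetical helper (not part of missRanger): label rows of `newdata` as
# "easy" (at most one missing covariate value) or "hard" (two or more).
classify_rows <- function(newdata, covariates) {
  n_miss <- rowSums(is.na(newdata[, covariates, drop = FALSE]))
  ifelse(n_miss <= 1L, "easy", "hard")
}

# Toy data: row 3 misses all three covariates, the others miss at most one.
newdata <- data.frame(
  x1 = c(1, NA, NA, 4),
  x2 = c(NA, 2, NA, 5),
  x3 = c(3, 3, NA, 6)
)
labels <- classify_rows(newdata, covariates = c("x1", "x2", "x3"))
print(labels)
```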

Usage

# S3 method for missRanger
predict(
  object,
  newdata,
  pmm.k = object$pmm.k,
  iter = 4L,
  num.threads = NULL,
  seed = NULL,
  verbose = 1L,
  ...
)

Arguments

object: An object of class "missRanger".
newdata: A data.frame with missing values to impute.
pmm.k: Number of candidate predictions of the original dataset for predictive mean matching (PMM). By default, the same value as used during fitting.
iter: Number of iterations for "hard case" rows. Use 0 for univariate imputation.
num.threads: Number of threads used by ranger's predict function. The default NULL uses all threads.
seed: Integer seed used for the initial univariate imputation and for PMM.
verbose: Should information be printed? 1 = yes (default), 0 = no.
...: Passed to the predict function of ranger.
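The role of pmm.k can be illustrated with a small base R sketch. The helper pmm_sketch() is hypothetical and is not missRanger's implementation; it assumes numeric predictions and observed values. For each prediction on newdata, it takes the k closest predictions made on the original data and returns the observed value of one randomly chosen donor, so every imputed value is a value that actually occurs in the original data.

```r
# Hypothetical PMM sketch (not missRanger's internal code). Assumes numeric
# predictions and observed values.
#   xtrain: predictions on the original data
#   ytrain: observed values on the original data (same order as xtrain)
#   xtest:  predictions for the missing cells of newdata
pmm_sketch <- function(xtrain, xtest, ytrain, k = 1L) {
  stopifnot(length(xtrain) == length(ytrain), k >= 1L)
  k <- min(k, length(xtrain))
  vapply(xtest, function(p) {
    donors <- order(abs(xtrain - p))[seq_len(k)]  # k nearest training predictions
    ytrain[donors[sample.int(k, 1L)]]             # observed value of a random donor
  }, FUN.VALUE = numeric(1))
}

set.seed(3)
xtrain <- c(1.0, 2.0, 3.0, 4.0, 5.0)
ytrain <- c(1.1, 1.9, 3.2, 3.8, 5.1)
xtest  <- c(2.1, 4.9)
pmm_sketch(xtrain, xtest, ytrain, k = 1L)  # returns c(1.9, 5.1)
pmm_sketch(xtrain, xtest, ytrain, k = 2L)  # one of the two nearest donors each
```

With k = 1 the nearest donor is chosen deterministically; a larger k adds randomness among close donors, which is one reason a seed is useful when PMM is active.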

Details

The out-of-sample algorithm works as follows:

1. Univariately impute all relevant columns by randomly drawing values from the original, unimputed data. This step only affects "hard case" rows.
2. Replace the univariate imputations by predictions of the random forests. This is done sequentially over variables, which are sorted to minimize the impact of the univariate imputations. Optionally, this is followed by predictive mean matching (PMM).
3. Repeat Step 2 for "hard case" rows multiple times.
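The three steps can be sketched in base R under simplifying assumptions that do not come from missRanger: all columns are numeric, train is the complete (already imputed) original data, linear models from lm() stand in for the stored random forests, variables are visited in ascending order of their missing count, PMM is omitted, and iter counts the total number of model passes (the first over all rows, later ones over "hard case" rows only, with 0 meaning univariate imputation only). The function name impute_new_sketch() is hypothetical.

```r
# Hypothetical sketch of the out-of-sample steps (not missRanger's code).
# Assumptions: all columns numeric, `train` complete (already imputed),
# lm() stands in for the stored random forests, no PMM.
impute_new_sketch <- function(train, newdata, iter = 4L) {
  na_mask <- is.na(newdata)
  to_impute <- colnames(na_mask)[colSums(na_mask) > 0]
  if (length(to_impute) == 0L) return(newdata)
  hard <- rowSums(na_mask) > 1L  # rows with two or more missing values

  # Step 1: univariate imputation by random draws from the original values.
  out <- newdata
  for (v in to_impute) {
    m <- na_mask[, v]
    out[m, v] <- train[[v]][sample.int(nrow(train), sum(m), replace = TRUE)]
  }

  # One model per variable to impute, fitted on the original data.
  fits <- lapply(setNames(to_impute, to_impute), function(v)
    lm(reformulate(setdiff(names(train), v), response = v), data = train))
  # Visit variables with few missing values first (a simple heuristic).
  visit <- to_impute[order(colSums(na_mask[, to_impute, drop = FALSE]))]

  # Step 2 on all rows in pass 1; Step 3 repeats it on hard rows only.
  for (i in seq_len(iter)) {
    rows_i <- if (i == 1L) rep(TRUE, nrow(out)) else hard
    for (v in visit) {
      m <- na_mask[, v] & rows_i
      if (any(m)) out[m, v] <- predict(fits[[v]], newdata = out[m, , drop = FALSE])
    }
    if (!any(hard)) break
  }
  out
}

# Usage on toy data: rows 1 and 2 are easy cases, row 3 is a hard case.
set.seed(42)
train <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
train$x3 <- train$x1 + train$x2 + rnorm(50, sd = 0.1)
newdata <- data.frame(x1 = c(0.5, NA, NA), x2 = c(1, 1, NA), x3 = c(NA, 0.2, 1))
res  <- impute_new_sketch(train, newdata, iter = 4L)
res0 <- impute_new_sketch(train, newdata, iter = 0L)  # univariate only
print(res)
```

Fitting one model per variable once and reusing it across passes mirrors the idea of keeping the forests from the original fit; only the covariate values fed into the models change between passes.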

Examples


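library(missRanger)

# Add missing values to iris, then fit and keep the forests for prediction
iris2 <- generateNA(iris, seed = 20, p = c(Sepal.Length = 0.2, Species = 0.1))
imp <- missRanger(iris2, pmm.k = 5, num.trees = 100, keep_forests = TRUE, seed = 2)
# Impute the first rows of iris2 out of sample
predict(imp, head(iris2), seed = 3)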
