
randomForestSRC (version 1.3)

impute.rfsrc: Impute Only Mode

Description

Fast imputation mode. A random forest is grown and used to impute missing data. No ensemble estimates or error rates are calculated.

Usage

## S3 method for class 'rfsrc':
impute(formula, data, ntree = 1000, mtry = NULL,
  nodesize = NULL, splitrule = NULL, nsplit = 0, nimpute = 1,
  xvar.wt = NULL, seed = NULL, do.trace = FALSE, ...)

Arguments

formula
A symbolic description of the model to be fit. Can be left unspecified if there are no outcomes, or if there is no need to distinguish between y-outcomes and x-variables in the imputation.
data
Data frame containing the data to be imputed.
ntree
Number of trees to grow.
mtry
Number of variables randomly sampled at each split.
nodesize
Minimum terminal node size.
splitrule
Splitting rule used to grow trees.
nsplit
Non-negative integer value used to specify random splitting.
nimpute
Number of iterations of missing data algorithm.
xvar.wt
Weights for selecting variables for splitting on.
seed
Seed for random number generator.
do.trace
Should trace output be enabled?
...
Further arguments passed to or from other methods.

Value

  • Invisibly, the data frame containing the original data with the imputed values overlaid. The first column(s) contain the y-outcome values.
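
Because the result is returned invisibly, it has to be assigned to be kept. The sketch below illustrates that return convention and the "overlay" idea (observed values stay as they are, only missing cells are filled) using base R only. Both fill_with_mean and observed_unchanged are hypothetical helpers introduced for illustration; fill_with_mean is a simple column-mean stand-in and is not the random forest imputation performed by impute.rfsrc.

```r
# Hypothetical stand-in imputer (NOT the random forest algorithm):
# fills NAs in numeric columns with the column mean and returns the
# data frame invisibly, mimicking the return convention described above.
fill_with_mean <- function(data) {
  for (col in names(data)) {
    if (is.numeric(data[[col]])) {
      missing <- is.na(data[[col]])
      data[[col]][missing] <- mean(data[[col]], na.rm = TRUE)
    }
  }
  invisible(data)
}

# Hypothetical helper: TRUE if every cell that was observed in `original`
# has the same value in `imputed` (assumes all columns are numeric).
observed_unchanged <- function(original, imputed) {
  observed <- !is.na(as.matrix(original))
  all(as.matrix(original)[observed] == as.matrix(imputed)[observed])
}

fill_with_mean(airquality)                 # nothing is printed: value is invisible
air.filled <- fill_with_mean(airquality)   # assign to keep the result

sum(is.na(air.filled))                     # number of cells still missing
observed_unchanged(airquality, air.filled) # were observed cells left alone?
```

The same pattern (assign the result, then compare it with the input) should apply to the data frame returned by impute.rfsrc, although the filled-in values will of course differ from a mean fill.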

Details

Grows a forest and uses it to impute the data. All other calculations, such as ensemble estimates and error rates, are turned off. Use this function if your only interest is imputing the data.

All options are the same as for rfsrc; consult the rfsrc help file for details.
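
Before imputing, it can help to check how much data is actually missing. The following is a minimal sketch using the built-in airquality data and base R; the variable names are illustrative. The final block calls impute.rfsrc with the formula used in the regression example on this page, and only runs if the randomForestSRC package happens to be installed.

```r
# Base R: count missing values per column in the built-in airquality data.
na_per_column <- colSums(is.na(airquality))
print(na_per_column)

# Number of rows with no missing values at all.
n_complete <- sum(complete.cases(airquality))
print(n_complete)

# Optional: impute and re-check, only if the package is available.
# Assumption: impute.rfsrc is exported by the installed version.
if (requireNamespace("randomForestSRC", quietly = TRUE)) {
  air.d <- randomForestSRC::impute.rfsrc(Ozone ~ ., data = airquality)
  print(colSums(is.na(air.d)))  # missing counts after imputation
}
```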

See Also

rfsrc

Examples

## ------------------------------------------------------------
## survival example
## ------------------------------------------------------------

library(randomForestSRC)

# default split rule
data(pbc, package = "randomForestSRC")
pbc.d <- impute.rfsrc(Surv(days, status) ~ ., data = pbc, nsplit = 3)

# random splitting is fast and works well
pbc2.d <- impute.rfsrc(Surv(days, status) ~ ., data = pbc, splitrule = "random")
summary(pbc.d - pbc2.d)

## ------------------------------------------------------------
## regression example
## ------------------------------------------------------------

air.d <- impute.rfsrc(Ozone ~ ., data = airquality, nimpute = 5)
air2.d <- impute.rfsrc(Ozone ~ ., data = airquality, nimpute = 5, splitrule = "random")

## ------------------------------------------------------------
## unsupervised example
## we impute without distinction between y-outcomes and x-variables
## we are invoking random splitting
## all variables are imputed
## ------------------------------------------------------------

data(pbc, package = "randomForestSRC")
pbcR.d <- impute.rfsrc(data = pbc, nimpute = 5)
airR.d <- impute.rfsrc(data = airquality, nimpute = 5)
