
step_modeimpute creates a specification of a recipe step that will replace missing values of nominal variables with the training set mode of those variables.
step_modeimpute(recipe, ..., role = NA, trained = FALSE, modes = NULL)
recipe: A recipe object. The step will be added to the sequence of operations for this recipe.
...: One or more selector functions to choose which variables are affected by the step. See selections for more details.
role: Not used by this step since no new variables are created.
trained: A logical to indicate if the quantities for preprocessing have been estimated.
modes: A named character vector of modes. This is NULL until computed by prep.recipe.
An updated version of recipe with the new step added to the sequence of existing steps (if any).
step_modeimpute estimates the variable modes from the data used in the training argument of prep.recipe. bake.recipe then applies these values to new data sets. If a variable has more than one mode in the training set, one is selected at random.
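The estimate-then-apply behavior described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: train_mode and impute_mode are hypothetical helper names, and the toy vectors are made up for the example.

```r
# Hypothetical helper (not part of recipes): most frequent non-missing
# value of a vector, returned as a character string.
# Ties are broken at random, as the details above describe.
train_mode <- function(x) {
  counts <- table(x, useNA = "no")
  if (length(counts) == 0) return(NA_character_)  # nothing observed
  top <- names(counts)[counts == max(counts)]
  if (length(top) > 1) top <- sample(top, size = 1)
  top
}

# Hypothetical helper (not part of recipes): replace NA with a mode
# that was estimated earlier, typically on the training set.
impute_mode <- function(x, mode_value) {
  if (is.factor(x)) {
    # Make sure the mode is a valid level before assigning it.
    levels(x) <- union(levels(x), mode_value)
  }
  x[is.na(x)] <- mode_value
  x
}

# Toy data, invented for this sketch.
train_home <- factor(c("owner", "rent", "owner", NA, "parents", "owner"))
test_home  <- factor(c("rent", NA, "owner", NA))

home_mode <- train_mode(train_home)            # estimated on training data only
imputed   <- impute_mode(test_home, home_mode) # applied to new data
```

The point of keeping the two helpers separate is that the mode is computed once from the training data and then reused unchanged on any new data set, which is the same split that prep.recipe and bake.recipe provide.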
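# NOT RUN {
library(recipes)
data("credit_data")
## proportion of missing data per column
vapply(credit_data, function(x) mean(is.na(x)), c(num = 0))
## split the rows into a training set of 2000 and a test set
set.seed(342)
in_training <- sample(1:nrow(credit_data), 2000)
credit_tr <- credit_data[ in_training, ]
credit_te <- credit_data[-in_training, ]
missing_examples <- c(14, 394, 565)
## define the recipe and add the mode imputation step
rec <- recipe(Price ~ ., data = credit_tr)
impute_rec <- rec %>%
  step_modeimpute(Status, Home, Marital)
## estimate the modes from the training set
imp_models <- prep(impute_rec, training = credit_tr)
## apply the estimated modes to the test set
imputed_te <- bake(imp_models, newdata = credit_te, everything())
## compare original and imputed values of Home, including NA
table(credit_te$Home, imputed_te$Home, useNA = "always")
# }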