healthcareai (version 2.3.0)

impute: Impute data and return a reusable recipe

Description

impute imputes missing values in both nominal and numeric data using one of several methods. It currently supports mean imputation (numeric only), new_category (nominal only), bagged trees, and k-nearest neighbors (knn).
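To illustrate the two default methods, here is a minimal base-R sketch, not healthcareai's implementation: numeric columns get the mean of their observed values, and nominal columns get a new factor level for missingness. The function name impute_sketch and the level name "missing" are hypothetical choices for this example. It also mimics the ... argument by capturing unquoted column names that are skipped and returned unaltered.

```r
# Hypothetical sketch, not healthcareai's implementation.
impute_sketch <- function(d, ..., new_level = "missing") {
  # Capture unquoted column names passed through ... (e.g. patient_id)
  skip <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  for (col in setdiff(names(d), skip)) {
    x <- d[[col]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      # Mean imputation: fill NAs with the mean of the observed values
      x[is.na(x)] <- mean(x, na.rm = TRUE)
    } else {
      # New-category imputation: make missingness its own factor level
      x <- as.character(x)
      x[is.na(x)] <- new_level
      x <- factor(x)
    }
    d[[col]] <- x
  }
  d
}

df <- data.frame(patient_id = c(1, NA, 3),
                 age = c(30, NA, 50),
                 color = c("red", NA, "blue"),
                 stringsAsFactors = FALSE)
out <- impute_sketch(df, patient_id)
out$patient_id     # 1 NA 3 (skipped, returned unaltered)
out$age            # 30 40 50
levels(out$color)  # "blue" "missing" "red"
```

Keeping missingness as its own level, rather than guessing a value, preserves the information that the value was absent, which can itself be predictive.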

Usage

impute(d = NULL, ..., recipe = NULL, numeric_method = "mean",
  nominal_method = "new_category", numeric_params = NULL,
  nominal_params = NULL, verbose = FALSE)

Arguments

d

A dataframe or tibble containing data to impute.

...

Optional. Unquoted names of variables that should not be imputed. These columns are returned unaltered.

recipe

Optional. A recipe object, or an imputed data frame that carries a recipe object as an attribute. If provided, the recipe is applied to impute the new data in d using the values saved in the recipe. Use this parameter to apply the same imputation values learned on a training dataset to data in production.

numeric_method

Defaults to "mean". Other choices are "bagimpute" or "knnimpute".

nominal_method

Defaults to "new_category". Other choices are "bagimpute" or "knnimpute".

numeric_params

A named list of parameters to use with the chosen imputation method on numeric data. Options are bag_model (bagimpute only), bag_trees (bagimpute only), bag_options (bagimpute only), knn_K (knnimpute only), impute_with (knnimpute only), and seed_val (bag or knn). See step_bagimpute or step_knnimpute in the recipes package for details.

nominal_params

A named list of parameters to use with the chosen imputation method on nominal data. Options are bag_model (bagimpute only), bag_trees (bagimpute only), bag_options (bagimpute only), knn_K (knnimpute only), impute_with (knnimpute only), and seed_val (bag or knn). See step_bagimpute or step_knnimpute in the recipes package for details.

verbose

If TRUE, prints which variables will be imputed and which method will be used for each.
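The knn_K option in numeric_params and nominal_params presumably sets how many neighbors the knnimpute method averages over. The base-R sketch below shows the idea for one numeric column; knn_impute_col, target, predictors and k are hypothetical names. It uses plain Euclidean distance on complete numeric predictors as a simplification, whereas the recipes kNN step uses Gower's distance, so results will differ from impute().

```r
# Hypothetical sketch: k-nearest-neighbor imputation of one numeric column.
# Assumes the predictor columns are numeric and have no missing values.
knn_impute_col <- function(d, target, predictors, k = 5) {
  X <- as.matrix(d[predictors])
  donors <- which(!is.na(d[[target]]))  # rows with an observed target
  for (i in which(is.na(d[[target]]))) {
    # Euclidean distance from row i to each donor row
    diffs <- sweep(X[donors, , drop = FALSE], 2, X[i, ])
    dists <- sqrt(rowSums(diffs^2))
    # Average the target over the k closest donors
    nearest <- donors[order(dists)[seq_len(min(k, length(donors)))]]
    d[[target]][i] <- mean(d[[target]][nearest])
  }
  d
}

d <- data.frame(x = c(1, 2, 3, 10, 11),
                y = c(10, 20, NA, 100, 110))
out <- knn_impute_col(d, target = "y", predictors = "x", k = 2)
out$y  # 10 20 15 100 110
```

With k = 2, the row at x = 3 borrows from its two nearest neighbors (x = 2 and x = 1), so its y becomes the mean of 20 and 10. A larger K smooths the imputed value over more donors.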

Value

The imputed data frame, with a reusable recipe object for future imputation stored in its "recipe" attribute.
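The train-then-reuse pattern behind the recipe argument and the "recipe" attribute can be sketched in base R. In this simplified, hypothetical version, the "recipe" is just a named vector of training means (healthcareai stores an actual recipes object). learn_means and apply_means are illustrative names, not part of the package. apply_means accepts either the stored values or an imputed data frame carrying them, mirroring the two forms the recipe argument accepts.

```r
# Hypothetical sketch: store training means in a "recipe" attribute and reuse them.
learn_means <- function(d) {
  num <- names(d)[vapply(d, is.numeric, logical(1))]
  vapply(d[num], mean, numeric(1), na.rm = TRUE)
}

apply_means <- function(d, recipe) {
  # Accept either the stored values or a data frame that carries them
  if (is.data.frame(recipe)) recipe <- attr(recipe, "recipe")
  for (col in intersect(names(recipe), names(d))) {
    x <- d[[col]]
    x[is.na(x)] <- recipe[[col]]
    d[[col]] <- x
  }
  d
}

train <- data.frame(glucose = c(100, NA, 140))
test  <- data.frame(glucose = c(NA, 90))

rec <- learn_means(train)
train_imputed <- apply_means(train, rec)
attr(train_imputed, "recipe") <- rec

# New data is filled with the training mean (120), not its own mean
test_imputed <- apply_means(test, train_imputed)
test_imputed$glucose  # 120 90
```

Reusing the training values keeps production data on the same footing as the data the model was trained on, instead of recomputing statistics from each new batch.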

Examples

library(healthcareai)
d <- pima_diabetes
d_train <- d[1:700, ]
d_test <- d[701:768, ]
# Train the imputer on the training data
train_imputed <- impute(d = d_train, patient_id, diabetes)
# Apply the training recipe to new data
impute(d = d_test, patient_id, diabetes, recipe = train_imputed)
# Specify methods:
impute(d = d_train, patient_id, diabetes, numeric_method = "bagimpute",
       nominal_method = "new_category")
# Specify method and param:
impute(d = d_train, patient_id, diabetes, nominal_method = "knnimpute",
       nominal_params = list(knn_K = 4))
