healthcareai (version 2.3.0)

step_missing: Clean NA values from categorical/nominal variables

Description

step_missing creates a specification of a recipe step that will replace NA values with a new factor level, missing.

Usage

step_missing(recipe, ..., role = NA, trained = FALSE,
  na_percentage = NULL, skip = FALSE, id = rand_id("bagimpute"))

# S3 method for step_missing
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose which variables are affected by the step. See ?recipes::selections for more details.

role

Not used by this step since no new variables are created.

trained

A logical indicating whether the NA values have been counted during preprocessing.

na_percentage

A named numeric vector of NA percentages. This is NULL until computed by prep.recipe().

skip

A logical. Should the step be skipped when the recipe is baked?

id

A unique id for the step, used to identify it when unprepping.

x

A `step_missing` object.

Value

An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with columns terms (the selectors or variables selected) and value (the NA counts).

Details

NA values are counted when the recipe is trained with prep.recipe. bake.recipe then replaces the NA values in the new data with the factor level missing.
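
The train-then-apply behavior described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: the helper replace_na_level and the variable na_pct are hypothetical names introduced here, and the sketch assumes a single character vector rather than a recipe.

```r
# Hypothetical helper (not part of healthcareai): replace NA values in a
# nominal variable with an explicit level, then return a factor.
replace_na_level <- function(x, level = "missing") {
  x <- as.character(x)
  x[is.na(x)] <- level
  factor(x)
}

hemoglobin_category <- c("Low", NA, "High", "Normal", NA)

# "Training": summarize how much of the variable is NA.
na_pct <- 100 * mean(is.na(hemoglobin_category))

# "Baking": fill the NA values with the new level.
cleaned <- replace_na_level(hemoglobin_category)

print(na_pct)
print(table(cleaned))
```

Making missingness an explicit level keeps every row available to models that cannot handle NA in categorical predictors, and lets a model treat "not recorded" as information in its own right.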

Examples

# NOT RUN {
library(recipes)
n <- 100
d <- tibble::tibble(encounter_id = 1:n,
                    patient_id = sample(1:20, size = n, replace = TRUE),
                    hemoglobin_count = rnorm(n, mean = 15, sd = 1),
                    hemoglobin_category = sample(c("Low", "Normal", "High", NA),
                                                 size = n, replace = TRUE),
                    disease = ifelse(hemoglobin_count < 15, "Yes", "No")
)

# Initialize
my_recipe <- recipe(disease ~ ., data = d)

# Create recipe
my_recipe <- my_recipe %>%
  step_missing(all_nominal())
my_recipe

# Train recipe
trained_recipe <- prep(my_recipe, training = d)

# Apply recipe
data_modified <- bake(trained_recipe, new_data = d)
# }
