
workflowsets (version 1.1.0)

workflow_set: Generate a set of workflow objects from preprocessing and model objects

Description

Often a data practitioner needs to consider a large number of possible modeling approaches for a task at hand, especially for new data sets and/or when there is little knowledge about what modeling strategy will work best. Workflow sets provide an expressive interface for investigating multiple models or feature engineering strategies in such a situation.

Usage

workflow_set(preproc, models, cross = TRUE, case_weights = NULL)

Value

A tibble with extra class 'workflow_set'. A new set includes four columns (but others can be added):

  • wflow_id contains character strings that identify each preprocessor/model combination. These can be changed but must be unique.

  • info is a list column with tibbles containing more specific information, including any comments added using comment_add(). This tibble also contains the workflow object (which can be easily retrieved using extract_workflow()).

  • option is a list column that will include a list of optional arguments passed to the functions from the tune package. They can be added manually via option_add() or automatically when options are passed to workflow_map().

  • result is a list column that will contain any objects produced when workflow_map() is used.
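A minimal sketch of the columns listed above follows. It assumes mtcars and two formula preprocessors chosen only for illustration. It also shows comment_add() and option_add() writing into the info and option columns, and extract_workflow() reading a workflow back out. The result column stays empty until workflow_map() is run.

```r
library(workflowsets)
library(parsnip)

# Illustrative set: two formula preprocessors crossed with one linear model
wf_set <- workflow_set(
  preproc = list(all = mpg ~ ., wt = mpg ~ wt),
  models  = list(lm = linear_reg())
)

# The four standard columns: wflow_id, info, option, result
names(wf_set)

# A comment is stored in the row's `info` tibble
wf_set <- comment_add(wf_set, "all_lm", "Baseline model using every predictor.")
comment_get(wf_set, "all_lm")

# Options are stored in the `option` column for later use by tune functions
wf_set <- option_add(wf_set, grid = 10)

# Retrieve the underlying workflow object from the `info` column
wf <- hardhat::extract_workflow(wf_set, id = "all_lm")
class(wf)
```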

Arguments

preproc

A list (preferably named) of preprocessing objects: formulas, recipes, or selectors created by workflows::workflow_variables().

models

A list (preferably named) of parsnip model specifications.

cross

A logical: should all combinations of the preprocessors and models be used to create the workflows? If FALSE, the lengths of preproc and models should be equal, since the i-th preprocessor is paired with the i-th model.
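The sketch below contrasts the two settings of cross. The formulas and the two linear model specifications (an "lm" fit and a glmnet ridge fit) are illustrative choices, not part of the original example. Creating a glmnet specification does not fit anything, so the glmnet package is not needed just to build the set.

```r
library(workflowsets)
library(parsnip)

preproc <- list(all = mpg ~ ., wt = mpg ~ wt)
models <- list(
  lm    = linear_reg(),
  ridge = set_engine(linear_reg(penalty = 0.1, mixture = 0), "glmnet")
)

# cross = TRUE: every preprocessor with every model (2 x 2 = 4 workflows)
crossed <- workflow_set(preproc, models, cross = TRUE)
crossed$wflow_id

# cross = FALSE: element-wise pairing ("all" with "lm", "wt" with "ridge")
paired <- workflow_set(preproc, models, cross = FALSE)
paired$wflow_id
```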

case_weights

A single unquoted column name specifying the case weights for the models. This must be a classed case weights column, as determined by hardhat::is_case_weights(). See the "Case weights" section below for more information.

Case weights

The case_weights argument can be passed as a single unquoted column name identifying the data column that gives the model case weights. For each workflow in the set whose engine supports case weights, the weights are added with workflows::add_case_weights(). If any workflow specifies an engine that does not support case weights, workflow_set() warns and ignores the case_weights argument for that workflow, but it does not fail.

Read more about case weights in the tidymodels framework at ?parsnip::case_weights.
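Here is a minimal sketch of passing case weights. The wts column is hypothetical: it holds made-up weights wrapped with hardhat::importance_weights() so that it is a classed case weights column. The "lm" engine supports case weights, so the weights should reach the fitted model. The test compares the result against a weighted stats::lm() fit.

```r
library(workflowsets)
library(parsnip)

# Hypothetical weights: later rows of mtcars get more influence
cars <- tibble::as_tibble(mtcars)
cars$wts <- hardhat::importance_weights(seq(0.5, 1.5, length.out = nrow(cars)))

wts_set <- workflow_set(
  preproc = list(wt_hp = mpg ~ wt + hp),
  models  = list(lm = linear_reg()),  # the "lm" engine supports case weights
  case_weights = wts                  # unquoted column name
)

# Fitting the extracted workflow uses the weights column from the data
wts_fit <- fit(hardhat::extract_workflow(wts_set, "wt_hp_lm"), data = cars)
coef(hardhat::extract_fit_engine(wts_fit))
```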

Details

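The preprocessors that can be combined with the model objects can be one or more of:

  • A traditional R formula.

  • A recipe definition (un-prepared) via recipes::recipe().

  • A selectors object created by workflows::workflow_variables().

Since preproc is a list, any combination of these can be used in that argument (i.e., preproc can contain mixed types).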
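The following sketch shows all three preprocessor types mixed in a single preproc list. It uses mtcars, and the element names are illustrative. Each workflow keeps its own preprocessor type, which hardhat::extract_preprocessor() can retrieve.

```r
library(workflowsets)
library(workflows)
library(recipes)
library(parsnip)

preproc <- list(
  formula = mpg ~ wt + hp,                                         # R formula
  recipe  = step_normalize(recipe(mpg ~ ., data = mtcars),         # recipe
                           all_predictors()),
  vars    = workflow_variables(outcomes = mpg,                     # selectors
                               predictors = c(wt, hp, cyl))
)

mixed_set <- workflow_set(preproc, list(lm = linear_reg()))
mixed_set$wflow_id

# Each workflow retains its own kind of preprocessor
class(hardhat::extract_preprocessor(mixed_set, "recipe_lm"))
```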

See Also

workflow_map(), comment_add(), option_add(), as_workflow_set()

Examples

if (rlang::is_installed(c("kknn", "modeldata", "recipes", "yardstick"))) {
library(workflowsets)
library(workflows)
library(modeldata)
library(recipes)
library(parsnip)
library(dplyr)
library(rsample)
library(tune)
library(yardstick)

# ------------------------------------------------------------------------------

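# The 'case' column records the original train/test assignment; drop it
data(cells)
cells <- cells %>% dplyr::select(-case)

set.seed(1)
# A single validation split (not used further in this example).
# validation_split() is deprecated in rsample >= 1.2.0 in favor of
# initial_validation_split() followed by validation_set().
val_set <- validation_split(cells)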

# ------------------------------------------------------------------------------

# Yeo-Johnson transform, then center and scale all predictors
basic_recipe <-
  recipe(class ~ ., data = cells) %>%
  step_YeoJohnson(all_predictors()) %>%
  step_normalize(all_predictors())

# Add PCA, with the number of components marked for tuning
pca_recipe <-
  basic_recipe %>%
  step_pca(all_predictors(), num_comp = tune())

# Alternatively, project the predictors onto the unit sphere (spatial sign)
ss_recipe <-
  basic_recipe %>%
  step_spatialsign(all_predictors())

# ------------------------------------------------------------------------------

# K-nearest neighbors with two tuning parameters
knn_mod <-
  nearest_neighbor(neighbors = tune(), weight_func = tune()) %>%
  set_engine("kknn") %>%
  set_mode("classification")

# Logistic regression (no tuning parameters)
lr_mod <-
  logistic_reg() %>%
  set_engine("glm")

# ------------------------------------------------------------------------------

preproc <- list(none = basic_recipe, pca = pca_recipe, sp_sign = ss_recipe)
models <- list(knn = knn_mod, logistic = lr_mod)

# Three recipes crossed with two models gives 3 x 2 = 6 workflows
cell_set <- workflow_set(preproc, models, cross = TRUE)
cell_set

# ------------------------------------------------------------------------------
# Using variables and formulas

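# Select predictors by their names: one workflow_variables() preprocessor per
# channel, keeping the columns whose names contain "ch_1", ..., "ch_4"
channels <- paste0("ch_", 1:4)
preproc <- purrr::map(channels, ~ workflow_variables(class, c(contains(!!.x))))
names(preproc) <- channels
# Add a formula that uses every predictor
preproc$everything <- class ~ .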
preproc

cell_set_by_group <- workflow_set(preproc, models["logistic"])
cell_set_by_group
}
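The example above builds val_set but never evaluates the workflow sets, so their result columns stay empty. The following is a minimal, self-contained sketch of how workflow_map() fills that column. It uses mtcars, 5-fold cross-validation and untuned linear models in place of the cells data, so fn = "fit_resamples" is chosen instead of the default "tune_grid".

```r
library(workflowsets)
library(parsnip)
library(rsample)
library(tune)

set.seed(1)
folds <- vfold_cv(mtcars, v = 5)

wf_set <- workflow_set(
  preproc = list(all = mpg ~ ., wt_hp = mpg ~ wt + hp),
  models  = list(lm = linear_reg())
)

# Nothing to tune here, so resample each workflow with fit_resamples()
res <- workflow_map(wf_set, fn = "fit_resamples", resamples = folds, seed = 2)

# Each `result` entry now holds resampling results; compare workflows by RMSE
rank_results(res, rank_metric = "rmse", select_best = TRUE)
```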
