Using Dials with Parsnip

knitr::opts_chunk$set( message = FALSE, digits = 3, collapse = TRUE, comment = "#>" ) options(digits = 3) library(dials) library(parsnip)

parsnip is a package in development that provides more unified interfaces to model functions. It has functions to create a model specification that can be used to fit a particular model using different R packages (or by other means). these model specifications have main arguments for important tuning parameters. For example, a minimal model specification is:

library(parsnip) boost_tree(mode = "regression")

This particular model has a number of different arguments for tuning parameters:

str(boost_tree)

If we know exactly what specific value of a parameter should be, it can be specified:

boost_tree(mode = "regression", trees = 50, min_n = 5, sample_size = 3/4)

Note that :

  • These parameter names have identically named parameter objects in dials.
  • Like other parsnip functions, boost_tree can use different R packages to fit this model, in this case, xgboost and C50. Not all parameters to boost_tree are relevant to each of these specific models.
  • Any parameters not specified in this call will use their model-specific defaults.

What happens if you know that you want to optimize the value of a parameter but don't know what the value will be? In this case, the parsnip function varying() can be used as a placeholder. For example, min_n is conditional on the sample size of the training set, so we may not know a feasible value until be have the exact training or analysis set:

mod_obj <- boost_tree( mode = "regression", trees = varying(), min_n = varying(), sample_size = varying() ) mod_obj

If some type of grid search is used, there is a simple function in dials can be used to update this parameter specification with candidate values. Let's create a small, random grid for these parameters as if we were going to model the mtcars data set. We will set the ranges for these parameters in-line when creating the grid:

library(tidymodels) library(dials) mtcars_pred <- mtcars %>% select(-mpg) set.seed(1263) bst_grid <- grid_random( trees, # Has complete default ranges min_n %>% finalize(mtcars_pred), sample_size %>% finalize(mtcars_pred), size = 10 ) bst_grid

We can use the merge function to combine these parameters with the model specification:

bst_grid <- bst_grid %>% mutate(specs = merge(mod_obj, .)) bst_grid[1, ] bst_grid %>% slice(1) %>% pull(specs)