
parsnip

Introduction

The goal of parsnip is to provide a tidy, unified interface to models, so that you can try a range of model types without getting bogged down in the syntactical minutiae of the underlying packages.

Installation

# The easiest way to get parsnip is to install all of tidymodels:
install.packages("tidymodels")

# Alternatively, install just parsnip:
install.packages("parsnip")

# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidymodels/parsnip")

Getting started

One challenge with R's many modeling functions is that functions that do the same thing can have very different interfaces and arguments. For example, to fit a random forest regression model, we might use any of the following:

# From randomForest
rf_1 <- randomForest(
  y ~ ., 
  data = dat, 
  mtry = 10, 
  ntree = 2000, 
  importance = TRUE
)

# From ranger
rf_2 <- ranger(
  y ~ ., 
  data = dat, 
  mtry = 10, 
  num.trees = 2000, 
  importance = "impurity"
)

# From sparklyr
rf_3 <- ml_random_forest(
  dat, 
  intercept = FALSE, 
  response = "y", 
  features = names(dat)[names(dat) != "y"], 
  col.sample.rate = 10,
  num.trees = 2000
)

Note that the model syntax can be very different and that the argument names (and formats) also differ. This makes switching between implementations a pain.

In this example:

  • the type of model is “random forest”,
  • the mode of the model is “regression” (as opposed to classification, etc), and
  • the computational engine is the name of the R package.

The goals of parsnip are to:

  • Separate the definition of a model from its evaluation.
  • Decouple the model specification from the implementation (whether the implementation is in R, spark, or something else). For example, the user would call rand_forest instead of ranger::ranger or other specific packages.
  • Harmonize argument names (e.g. n.trees, ntrees, trees) so that users only need to remember a single name. This will help across model types too so that trees will be the same argument across random forest as well as boosting or bagging.

Using the example above, the parsnip approach would be:

library(parsnip)

rand_forest(mtry = 10, trees = 2000) %>%
  set_engine("ranger", importance = "impurity") %>%
  set_mode("regression")
#> Random Forest Model Specification (regression)
#> 
#> Main Arguments:
#>   mtry = 10
#>   trees = 2000
#> 
#> Engine-Specific Arguments:
#>   importance = impurity
#> 
#> Computational engine: ranger

The engine can easily be changed. For example, to use Spark instead:

rand_forest(mtry = 10, trees = 2000) %>%
  set_engine("spark") %>%
  set_mode("regression")
#> Random Forest Model Specification (regression)
#> 
#> Main Arguments:
#>   mtry = 10
#>   trees = 2000
#> 
#> Computational engine: spark

Either one of these model specifications can be fit in the same way:

set.seed(192)
rand_forest(mtry = 10, trees = 2000) %>%
  set_engine("ranger", importance = "impurity") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)
#> parsnip model object
#> 
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, mtry = min_cols(~10,      x), num.trees = ~2000, importance = ~"impurity", num.threads = 1,      verbose = FALSE, seed = sample.int(10^5, 1)) 
#> 
#> Type:                             Regression 
#> Number of trees:                  2000 
#> Sample size:                      32 
#> Number of independent variables:  10 
#> Mtry:                             10 
#> Target node size:                 5 
#> Variable importance mode:         impurity 
#> Splitrule:                        variance 
#> OOB prediction error (MSE):       5.976917 
#> R squared (OOB):                  0.8354559
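Predictions also use a consistent interface: `predict()` on a parsnip model fit returns a tibble with standardized column names (`.pred` for numeric predictions), regardless of the engine. A brief sketch, continuing the example above:

```r
library(parsnip)

# Fit the ranger-backed random forest as before
rf_fit <- rand_forest(mtry = 10, trees = 2000) %>%
  set_engine("ranger") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)

# Returns a tibble with a standardized .pred column,
# no matter which engine produced the fit
predict(rf_fit, new_data = head(mtcars))
```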

A list of all parsnip models across different CRAN packages can be found at https://www.tidymodels.org/find/parsnip.
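Within an R session, `show_engines()` reports which engines are currently registered for a given model type:

```r
library(parsnip)

# List the available engines (and their modes) for random forests;
# the result is a tibble with `engine` and `mode` columns
show_engines("rand_forest")
```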

Contributing

This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Package details

  • Version: 1.1.0
  • License: MIT + file LICENSE
  • Maintainer: Max Kuhn
  • Last published: April 12th, 2023
  • Monthly downloads: 49,201

Functions in parsnip (1.1.0)

null_value

Functions required for parsnip-adjacent packages
C5_rules

C5.0 rule-based classification models
bag_mars

Ensembles of MARS models
bag_mlp

Ensembles of neural networks
auto_ml

Automatic Machine Learning
bag_tree

Ensembles of decision trees
add_rowindex

Add a column of row numbers to a data frame
C5.0_train

Boosted trees via C5.0
bart

Bayesian additive regression trees (BART)
bart-internal

Developer functions for predictions via BART models
boost_tree

Boosted trees
case_weights

Using case weights with parsnip
control_parsnip

Control the fit function
details_bag_mlp_nnet

Bagged neural networks via nnet
descriptors

Data Set Characteristics Available when Fitting Models
details_bag_tree_C5.0

Bagged trees via C5.0
details_auto_ml_h2o

Automatic machine learning via h2o
details_bag_mars_earth

Bagged MARS via earth
details_C5_rules_C5.0

C5.0 rule-based classification models
censoring_weights

Calculations for inverse probability of censoring weights (IPCW)
.convert_form_to_xy_fit

Helper functions to convert between formula and matrix interface
check_empty_ellipse

Check to ensure that ellipses are empty
cubist_rules

Cubist rule-based regression models
ctree_train

A wrapper function for conditional inference tree models
decision_tree

Decision trees
convert_stan_interval

Convenience function for intervals
condense_control

Condense control object into strictly smaller control object
contr_one_hot

Contrast function for one-hot encodings
details_boost_tree_xgboost

Boosted trees via xgboost
details_boost_tree_spark

Boosted trees via Spark
details_cubist_rules_Cubist

Cubist rule-based regression models
details_decision_tree_C5.0

Decision trees via C5.0
details_boost_tree_mboost

Boosted trees via mboost
details_boost_tree_lightgbm

Boosted trees via lightgbm
details_boost_tree_C5.0

Boosted trees via C5.0
details_boost_tree_h2o

Boosted trees via h2o
details_discrim_linear_mda

Linear discriminant analysis via flexible discriminant analysis
details_discrim_linear_sda

Linear discriminant analysis via James-Stein-type shrinkage estimation
details_decision_tree_partykit

Decision trees via partykit
details_discrim_linear_MASS

Linear discriminant analysis via MASS
details_bag_tree_rpart

Bagged trees via rpart
details_bart_dbarts

Bayesian additive regression trees via dbarts
details_decision_tree_spark

Decision trees via Spark
details_discrim_flexible_earth

Flexible discriminant analysis via earth
details_decision_tree_rpart

Decision trees via CART
details_discrim_quad_MASS

Quadratic discriminant analysis via MASS
details_discrim_quad_sparsediscrim

Quadratic discriminant analysis via regularization
details_linear_reg_brulee

Linear regression via brulee
details_linear_reg_glm

Linear regression via glm
details_linear_reg_glmnet

Linear regression via glmnet
details_linear_reg_gee

Linear regression via generalized estimating equations (GEE)
details_linear_reg_keras

Linear regression via keras/tensorflow
details_discrim_linear_sparsediscrim

Linear discriminant analysis via regularization
details_linear_reg_gls

Linear regression via generalized least squares
details_linear_reg_glmer

Linear regression via generalized mixed models
details_gen_additive_mod_mgcv

Generalized additive models via mgcv
details_discrim_regularized_klaR

Regularized discriminant analysis via klaR
details_linear_reg_h2o

Linear regression via h2o
details_logistic_reg_LiblineaR

Logistic regression via LiblineaR
details_linear_reg_lm

Linear regression via lm
details_linear_reg_stan

Linear regression via Bayesian Methods
details_logistic_reg_brulee

Logistic regression via brulee
details_logistic_reg_gee

Logistic regression via generalized estimating equations (GEE)
details_linear_reg_lmer

Linear regression via mixed models
details_linear_reg_stan_glmer

Linear regression via hierarchical Bayesian methods
details_linear_reg_lme

Linear regression via mixed models
details_linear_reg_spark

Linear regression via spark
details_logistic_reg_glm

Logistic regression via glm
details_mlp_brulee

Multilayer perceptron via brulee
details_logistic_reg_glmnet

Logistic regression via glmnet
details_mlp_h2o

Multilayer perceptron via h2o
details_logistic_reg_glmer

Logistic regression via mixed models
details_logistic_reg_keras

Logistic regression via keras
details_logistic_reg_h2o

Logistic regression via h2o
details_mars_earth

Multivariate adaptive regression splines (MARS) via earth
details_logistic_reg_stan_glmer

Logistic regression via hierarchical Bayesian methods
details_multinom_reg_h2o

Multinomial regression via h2o
details_multinom_reg_nnet

Multinomial regression via nnet
details_naive_Bayes_klaR

Naive Bayes models via klaR
details_logistic_reg_spark

Logistic regression via spark
details_logistic_reg_stan

Logistic regression via stan
details_naive_Bayes_h2o

Naive Bayes models via h2o
details_multinom_reg_keras

Multinomial regression via keras
details_multinom_reg_brulee

Multinomial regression via brulee
details_mlp_keras

Multilayer perceptron via keras
details_multinom_reg_glmnet

Multinomial regression via glmnet
details_multinom_reg_spark

Multinomial regression via spark
details_mlp_nnet

Multilayer perceptron via nnet
details_naive_Bayes_naivebayes

Naive Bayes models via naivebayes
details_poisson_reg_h2o

Poisson regression via h2o
details_poisson_reg_glmnet

Poisson regression via glmnet
details_pls_mixOmics

Partial least squares via mixOmics
details_poisson_reg_gee

Poisson regression via generalized estimating equations (GEE)
details_poisson_reg_hurdle

Poisson regression via pscl
details_poisson_reg_stan

Poisson regression via stan
details_nearest_neighbor_kknn

K-nearest neighbors via kknn
details_proportional_hazards_survival

Proportional hazards regression
details_rand_forest_h2o

Random forests via h2o
details_rand_forest_randomForest

Random forests via randomForest
details_rand_forest_ranger

Random forests via ranger
details_rand_forest_spark

Random forests via spark
details_rand_forest_partykit

Random forests via partykit
details_poisson_reg_glm

Poisson regression via glm
details_rand_forest_aorsf

Oblique random survival forests via aorsf
details_poisson_reg_glmer

Poisson regression via mixed models
details_proportional_hazards_glmnet

Proportional hazards regression
details_poisson_reg_zeroinfl

Poisson regression via pscl
details_poisson_reg_stan_glmer

Poisson regression via hierarchical Bayesian methods
details_rule_fit_h2o

RuleFit models via h2o
details_surv_reg_flexsurv

Parametric survival regression
details_svm_linear_LiblineaR

Linear support vector machines (SVMs) via LiblineaR
details_svm_poly_kernlab

Polynomial support vector machines (SVMs) via kernlab
details_survival_reg_survival

Parametric survival regression
details_rule_fit_xrf

RuleFit models via xrf
details_svm_linear_kernlab

Linear support vector machines (SVMs) via kernlab
discrim_regularized

Regularized discriminant analysis
discrim_linear

Linear discriminant analysis
details_survival_reg_flexsurvspline

Flexible parametric survival regression
details_survival_reg_flexsurv

Parametric survival regression
.model_param_name_key

Translate names of model tuning parameters
discrim_flexible

Flexible discriminant analysis
discrim_quad

Quadratic discriminant analysis
doc-tools

Tools for documenting engines
details_svm_rbf_kernlab

Radial basis function support vector machines (SVMs) via kernlab
details_surv_reg_survival

Parametric survival regression
eval_args

Evaluate parsnip model arguments
fit.model_spec

Fit a Model Specification to a Dataset
extract-parsnip

Extract elements of a parsnip model object
.check_glmnet_penalty_fit

Helper functions for checking the penalty of glmnet models
spec_is_possible

Model Specification Checking:
glm_grouped

Fit a grouped binomial outcome from a data set with case weights
glmnet-details

Technical aspects of the glmnet model
glance.model_fit

Construct a single row summary "glance" of a model, fit, or other object
get_model_env

Working with the parsnip model environment
keras_predict_classes

Wrapper for keras class predictions
linear_reg

Linear regression
.organize_glmnet_pred

Organize glmnet predictions
list_md_problems

Locate and show errors/warnings in engine-specific documentation
knit_engine_docs

Knit engine-specific documentation
fit_control

Control the fit function
mlp

Single layer neural network
min_cols

Execution-time data dimension checks
model_spec

Model Specification Information
make_call

Make a parsnip call expression
logistic_reg

Logistic regression
model_fit

Model Fit Object Information
model_printer

Print helper for model objects
multi_predict

Model predictions across many sub-models
gen_additive_mod

Generalized additive models (GAMs)
mars

Multivariate adaptive regression splines (MARS)
model_db

parsnip model specification database
null_model

Null model
naive_Bayes

Naive Bayes models
nearest_neighbor

K-nearest neighbors
maybe_matrix

Fuzzy conversions
parsnip_addin

Start an RStudio Addin that can write model specifications
max_mtry_formula

Determine largest value of mtry from formula
keras_mlp

Simple interface to MLP models via keras
make_classes

Prepend a new class
has_multi_predict

Tools for models that predict on sub-models
format-internals

Internal functions that format predictions
update.bag_mars

Updating a model specification
pls

Partial least squares (PLS)
nullmodel

Fit a simple, non-informative model
predict_class.model_fit

Other predict methods.
multinom_reg

Multinomial regression
rand_forest

Random forest
proportional_hazards

Proportional hazards regression
req_pkgs

Determine required packages for a model
rpart_train

Decision trees via rpart
parsnip-package

parsnip
rule_fit

RuleFit models
reexports

Objects exported from other packages
repair_call

Repair a model call object
show_call

Print the model call
show_engines

Display currently available engines for a model
required_pkgs.model_spec

Determine required packages for a model
stan_conf_int

Wrapper for stan confidence intervals
poisson_reg

Poisson regression models
svm_linear

Linear support vector machines
predict.model_fit

Model predictions
prepare_data

Prepare data based on parsnip encoding information
surv_reg

Parametric survival regression
survival_reg

Parametric survival regression
translate

Resolve a Model Specification for a Computational Engine
tidy.model_fit

Turn a parsnip model object into a tidy tibble
varying_args.model_spec

Determine varying arguments
type_sum.model_spec

Succinct summary of parsnip object
tidy.nullmodel

Tidy method for null models
xgb_train

Boosted trees via xgboost
svm_poly

Polynomial support vector machines
svm_rbf

Radial basis function support vector machines
set_tf_seed

Set seed in R and TensorFlow at the same time
tidy._LiblineaR

tidy methods for LiblineaR models
set_new_model

Tools to Register Models
set_args

Change elements of a model specification
set_engine

Declare a computational engine and specific arguments
tidy._elnet

tidy methods for glmnet models
update_model_info_file

Save information about models
varying

A placeholder function for argument values
augment.model_fit

Augment data with predictions
autoplot.model_fit

Create a ggplot for a model object