parsnip

One issue with different functions available in R that do the same thing is that they can have different interfaces and arguments. For example, to fit a random forest classification model, we might have:

# From randomForest
rf_1 <- randomForest(x, y, mtry = 12, ntree = 2000, importance = TRUE)

# From ranger
rf_2 <- ranger(
  y ~ ., 
  data = dat, 
  mtry = 12, 
  num.trees = 2000, 
  importance = 'impurity'
)

# From sparklyr
rf_3 <- ml_random_forest(
  dat, 
  intercept = FALSE, 
  response = "y", 
  features = names(dat)[names(dat) != "y"], 
  col.sample.rate = 12,
  num.trees = 2000
)

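Note that the model syntax differs across the three calls (an x/y interface, a formula, and character column names) and that the argument names and formats differ as well: the number of trees is ntree in one call and num.trees in the other two. This is a pain if you switch between implementations.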

In this example, the type of model is "random forest" while the mode of the model is "classification" (as opposed to regression, survival analysis, etc.).

The idea of parsnip is to:

  • Separate the definition of a model from its evaluation.
  • Decouple the model specification from the implementation (whether the implementation is in R, Spark, or something else). For example, the user would call rand_forest instead of ranger::ranger or another package-specific function.
  • Harmonize the argument names (e.g. n.trees, ntrees, trees) so that users can remember a single name. This helps across model types too: trees would be the same argument for random forests as well as for boosting or bagging.
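
The three goals above can be illustrated with a toy sketch in base R. This is not parsnip's implementation or its API: arg_map, spec_rand_forest, and translate_args are hypothetical names invented for this example, and the only engine-specific argument names used are the ones that appear in the three calls shown earlier.

```r
# Hypothetical lookup table: unified argument name -> engine-specific name.
# The engine-specific names are copied from the three calls shown earlier.
arg_map <- list(
  randomForest = c(mtry = "mtry", trees = "ntree"),
  ranger       = c(mtry = "mtry", trees = "num.trees"),
  sparklyr     = c(mtry = "col.sample.rate", trees = "num.trees")
)

# Hypothetical constructor: records what should be fit, but fits nothing.
# This keeps the definition of the model separate from its evaluation.
spec_rand_forest <- function(mode = "classification", mtry = NULL, trees = NULL) {
  args <- list(mtry = mtry, trees = trees)
  # Drop arguments the user did not supply
  args <- args[!vapply(args, is.null, logical(1))]
  structure(
    list(type = "rand_forest", mode = mode, args = args),
    class = "toy_spec"
  )
}

# Hypothetical translator: rename the unified arguments for one engine.
translate_args <- function(spec, engine) {
  if (!engine %in% names(arg_map)) {
    stop("Unknown engine: ", engine)
  }
  out <- spec$args
  names(out) <- unname(arg_map[[engine]][names(out)])
  out
}

# One specification, resolved for two different implementations
spec        <- spec_rand_forest(mtry = 12, trees = 2000)
ranger_args <- translate_args(spec, "ranger")
rf_args     <- translate_args(spec, "randomForest")
str(ranger_args)
str(rf_args)
```

In this sketch the user only has to remember mtry and trees. The renamed list could then be handed to the chosen engine's fitting function, for example with do.call, which is the step that would actually evaluate the model.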

To install it, use:

# Requires the devtools package to be installed
library(devtools)
install_github("topepo/parsnip")

Install

install.packages('parsnip')

Monthly Downloads: 44,268
Version: 0.0.0.9001
License: GPL-2
Maintainer: Max Kuhn
Last Published: August 18th, 2023

Functions in parsnip (0.0.0.9001)

logistic_reg: General Interface for Logistic Regression Models
varying: A Placeholder Function for Argument Values
translate: Resolve a Model Specification for a Computational Engine
linear_reg: General Interface for Linear Regression Models
lending_club: Loan Data
surv_reg: General Interface for Parametric Survival Models
rand_forest: General Interface for Random Forest Models
fit: Fit a Model Specification to a Dataset
fit_control: Control the fit function