⚠️ There's a newer version (1.2.1) of this package.

recipes (version 1.2.0)

Preprocessing and Feature Engineering Steps for Modeling

Description

A recipe prepares your data for modeling. The package provides an extensible framework for pipeable sequences of feature engineering steps, which serve as preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as input for statistical or machine learning models.
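The estimate-then-apply workflow described above can be sketched as follows. This is a minimal illustration, not an example taken from the package documentation: the choice of the built-in `mtcars` data, the 20/12 row split, and the object names (`train`, `new_data`, `rec`, `trained`) are assumptions made for the sketch. It relies only on functions listed on this page (`recipe`, `step_center`, `step_scale`, `prep`, `bake`) plus the `all_numeric_predictors()` selector.

```r
library(recipes)

# Assumed split for illustration: the first 20 rows play the role of the
# initial data set, the remaining 12 rows stand in for "other" data.
train <- mtcars[1:20, ]
new_data <- mtcars[21:32, ]

# Define the sequence of steps. mpg is the outcome; all other columns
# are predictors. Nothing is estimated yet at this point.
rec <- recipe(mpg ~ ., data = train) %>%
  step_center(all_numeric_predictors()) %>%
  step_scale(all_numeric_predictors())

# Estimate the step parameters (column means and standard deviations)
# from the initial data set only.
trained <- prep(rec, training = train)

# Apply the stored estimates. new_data = NULL returns the processed
# training set; passing a data frame processes that data instead.
train_processed <- bake(trained, new_data = NULL)
new_processed <- bake(trained, new_data = new_data)
```

Because the means and standard deviations come from `train` alone, the columns of `new_processed` are generally not exactly centered or scaled; that is the intended behavior when the same preprocessing is carried over to unseen data.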

Install

install.packages('recipes')

Monthly Downloads

218,325

Version

1.2.0

License

MIT + file LICENSE

Maintainer

Max Kuhn

Last Published

March 17th, 2025

Functions in recipes (1.2.0)

detect_step

Detect if a particular step or check is used in a recipe
check_type

Quantitatively check on variables
check_range

Check range consistency
discretize

Discretize Numeric Variables
.recipes_preserve_sparsity

Does step destroy sparsity of columns
fixed

Helper Functions for Profile Data Sets
format_ch_vec

Helpers for printing step functions
.recipes_estimate_sparsity

Estimate sparsity of a recipe
.recipes_toggle_sparse_args

Toggle all auto sparse arguments
fully_trained

Check to see if a recipe is trained/prepared
.get_data_types

Get types for use in recipes
formula.recipe

Create a formula from a prepared recipe
get_keep_original_cols

Get the keep_original_cols value of a recipe step
names0

Naming Tools
prep

Estimate a preprocessing recipe
developer_functions

Developer functions for creating recipes steps
has_role

Role Selection
juice

Extract transformed training set
recipes_eval_select

Evaluate a selection with tidyselect semantics specific to recipes
prepper

Wrapper function for preparing recipes within resampling
print.recipe

Print a Recipe
recipes

recipes: A package for computing and preprocessing design matrices.
recipes_ptype

Prototype of recipe object
recipes_ptype_validate

Validate prototype of recipe object
recipe

Create a recipe for preprocessing data
rand_id

Make a random identification field for steps
recipes_pkg_check

Update packages
recipes_extension_check

Checks that steps have all S3 methods
step_BoxCox

Box-Cox transformation for non-negative data
step_YeoJohnson

Yeo-Johnson transformation
selections

Methods for selecting variables in step functions
roles

Manually alter roles
yj_transform

Internal Functions
recipes-role-indicator

Role indicators
recipes_remove_cols

Removes columns if options apply
step

step sets the class of the step and check is for checks.
sparse_data

Using sparse data with recipes
reexports

Objects exported from other packages
required_pkgs.step_classdist_shrunken

S3 methods for tracking which additional packages are needed for steps.
step_center

Centering numeric data
step_bin2factor

Create a factor from a dummy variable
step_classdist

Distances to class centroids
step_arrange

Sort rows using dplyr
step_count

Create counts of patterns using regular expressions
step_cut

Cut a numeric variable into a factor
remove_original_cols

Removes original columns if options apply
step_bagimpute

Impute via bagged trees
step_classdist_shrunken

Compute shrunken centroid distances for classification models
step_corr

High correlation filter
step_bs

B-spline basis functions
step_factor2string

Convert factors to strings
step_geodist

Distance between two locations
step_discretize

Discretize Numeric Variables
step_filter_missing

Missing value column filter
step_dummy

Create traditional dummy variables
step_dummy_extract

Extract patterns from nominal data
step_dummy_multi_choice

Handle levels in multiple predictors together
step_date

Date feature generator
step_filter

Filter rows using dplyr
step_depth

Data depths
step_impute_mean

Impute numeric data using the mean
step_ica

ICA signal extraction
step_impute_bag

Impute via bagged trees
step_impute_knn

Impute via k-nearest neighbors
step_harmonic

Add sin and cos terms for harmonic analysis
step_impute_linear

Impute numeric variables via a linear model
step_impute_lower

Impute numeric data below the threshold of measurement
step_hyperbolic

Hyperbolic transformations
step_impute_median

Impute numeric data using the median
step_holiday

Holiday feature generator
step_integer

Convert values to predefined integers
step_impute_mode

Impute nominal data using the most common value
step_intercept

Add intercept (or constant) column
step_inverse

Inverse transformation
step_impute_roll

Impute numeric data using a rolling window statistic
step_invlogit

Inverse logit transformation
step_knnimpute

Impute via k-nearest neighbors
step_isomap

Isomap embedding
step_indicate_na

Create missing data column indicators
step_interact

Create interaction variables
step_kpca

Kernel PCA signal extraction
step_lincomb

Linear combination filter
step_medianimpute

Impute numeric data using the median
step_kpca_rbf

Radial basis function kernel PCA signal extraction
step_logit

Logit transformation
step_log

Logarithmic transformation
step_lag

Create a lagged predictor
step_kpca_poly

Polynomial kernel PCA signal extraction
step_meanimpute

Impute numeric data using the mean
step_lowerimpute

Impute numeric data below the threshold of measurement
step_ns

Natural spline basis functions
step_normalize

Center and scale numeric data
step_num2factor

Convert numbers to factors
step_mutate

Add new variables using dplyr
step_nnmf

Non-negative matrix factorization signal extraction
step_nnmf_sparse

Non-negative matrix factorization signal extraction with lasso penalization
step_modeimpute

Impute nominal data using the most common value
step_mutate_at

Mutate multiple columns using dplyr
step_novel

Simple value assignments for novel factor levels
step_naomit

Remove observations with missing values
step_poly

Orthogonal polynomial basis functions
step_pca

PCA signal extraction
step_ordinalscore

Convert ordinal factors to numeric scores
step_pls

Partial least squares feature extraction
step_percentile

Percentile transformation
step_profile

Create a profiling version of a data set
step_nzv

Near-zero variance filter
step_poly_bernstein

Generalized Bernstein polynomial basis
step_range

Scaling numeric data to a specific range
step_other

Collapse infrequent categorical levels
step_ratio

Ratio variable creation
step_regex

Detect a regular expression
step_rename

Rename variables by name using dplyr
step_rm

General variable filter
step_scale

Scaling numeric data
step_rollimpute

Impute numeric data using a rolling window statistic
step_relu

Apply (smoothed) rectified linear transformation
step_sample

Sample rows using dplyr
step_rename_at

Rename multiple columns using dplyr
step_relevel

Relevel factors to a desired level
step_sqrt

Square root transformation
step_spline_nonnegative

Non-negative splines
step_select

Select variables using dplyr
step_spline_b

Basis splines
step_slice

Filter rows by position using dplyr
step_spatialsign

Spatial sign preprocessing
step_spline_convex

Convex splines
step_spline_natural

Natural splines
step_spline_monotone

Monotone splines
step_shuffle

Shuffle variables
terms_select

Select terms in a step function.
tidy.step_BoxCox

Tidy the result of a recipe
step_string2factor

Convert strings to factors
summary.recipe

Summarize a recipe
step_zv

Zero variance filter
step_unorder

Convert ordered factors to unordered factors
step_window

Moving window functions
step_time

Time feature generator
update.step

Update a recipe step
step_unknown

Assign missing categories to "unknown"
update_role_requirements

Update role specific requirements
add_step

Add a New Operation to the Current Recipe
check_cols

Check if all columns are present
check_new_data

Check for required column at bake-time
check_new_values

Check for new values
check_missing

Check for missing values
bake

Apply a trained preprocessing recipe
case-weight-helpers

Helpers for steps with case weights
check_class

Check variable class
case_weights

Using case weights with recipes
check_name

Check that newly created variable names don't overlap