For a recipe with at least one preprocessing operation that has been trained by prep(), apply the computations to new data.
bake(object, ...)

# S3 method for recipe
bake(object, new_data, ..., composition = "tibble")
A tibble, matrix, or sparse matrix that may have different columns than the original columns in new_data.
object: A trained object such as a recipe() with at least one preprocessing operation.
...: One or more selector functions to choose which variables will be returned by the function. See selections() for more details. If no selectors are given, the default is to use dplyr::everything().
new_data: A data frame, tibble, or sparse matrix from the Matrix package to which the preprocessing will be applied. If NULL is given to new_data, the pre-processed training data will be returned (assuming that prep(retain = TRUE) was used). See sparse_data for more information about use of sparse data.
composition: Either "tibble", "matrix", "data.frame", or "dgCMatrix" for the format of the processed data set. Note that all computations during the baking process are done in a non-sparse format. Also, note that this argument should be given after any selectors, and the selectors should only resolve to numeric columns (otherwise an error is thrown).
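As a minimal sketch of the composition argument, the example below bakes a small recipe into a numeric matrix. The mtcars data and the step_normalize() step are illustrative choices, not part of the original page; every column in mtcars is numeric, so the selector resolves only to numeric columns as required.

```r
library(recipes)

# Illustrative recipe on the built-in mtcars data (an assumption for this sketch)
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors()) %>%
  prep()

# The selector comes first, then composition; all selected columns are numeric
baked_mat <- bake(
  rec,
  new_data = head(mtcars),
  all_predictors(),
  composition = "matrix"
)

is.matrix(baked_mat)
dim(baked_mat)
```

If a selector picked up a factor or character column, the conversion to a matrix would be expected to fail, which is why the selector here is restricted to predictors that are known to be numeric.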
bake() takes a trained recipe and applies its operations to a data set to create a design matrix. If you are using a recipe as a preprocessor for modeling, we highly recommend that you use a workflow() instead of manually applying a recipe (see the example in recipe()).
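The workflow recommendation can be sketched roughly as follows. This assumes the workflows and parsnip packages are installed; the mtcars data, the normalization step, and the linear regression model are illustrative choices rather than anything prescribed by this page. Note that the recipe is left untrained: fitting the workflow handles the prep() and bake() calls internally.

```r
library(recipes)
library(parsnip)
library(workflows)

# An untrained recipe; the workflow will prep it during fit()
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors())

# Bundle the preprocessor and a model specification together
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

# fit() trains the recipe and the model; predict() bakes new data automatically
wf_fit <- fit(wf, data = mtcars)
preds <- predict(wf_fit, new_data = head(mtcars))
preds
```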
If the data set is not too large, time can be saved by using the retain = TRUE option of prep(). This stores the processed version of the training set. With this option set, bake(object, new_data = NULL) will return the stored copy without recomputing any steps.
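A small sketch of the retain behavior, using mtcars and step_center() as illustrative stand-ins: the retained training set is fetched with new_data = NULL, and the same rows are baked again explicitly for comparison.

```r
library(recipes)

# Illustrative recipe; retain = TRUE keeps the processed training set
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_center(all_numeric_predictors()) %>%
  prep(retain = TRUE)

# Returns the stored, already processed training data
train_baked <- bake(rec, new_data = NULL)

# Re-applies the trained steps to the same rows
rebaked <- bake(rec, new_data = mtcars)

# Compare the retained copy with the recomputed one
all.equal(train_baked, rebaked)
```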
Also, any steps with skip = TRUE will not be applied to the data when bake() is invoked with a data set in new_data. bake(object, new_data = NULL) will always have all of the steps applied.
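The effect of skip = TRUE can be illustrated with a short sketch. The recipe below, built on mtcars purely for illustration, log-transforms the outcome with a skipped step, then compares baking new data against returning the retained training set.

```r
library(recipes)

# Illustrative recipe: the log step is marked to be skipped at bake() time
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_log(mpg, skip = TRUE) %>%
  prep()

# With a data set in new_data, the skipped step is not applied
new_baked <- bake(rec, new_data = head(mtcars))

# With new_data = NULL, the retained training set has every step applied
train_baked <- bake(rec, new_data = NULL)

new_baked$mpg[1]    # mpg on its original scale
train_baked$mpg[1]  # mpg on the log scale
```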
See also: recipe() and prep()
library(recipes)  # also attaches dplyr, which provides mutate() and %>%

data(ames, package = "modeldata")
ames <- mutate(ames, Sale_Price = log10(Sale_Price))

# train the recipe on all rows except the first six
ames_rec <-
  recipe(Sale_Price ~ ., data = ames[-(1:6), ]) %>%
  step_other(Neighborhood, threshold = 0.05) %>%
  step_dummy(all_nominal()) %>%
  step_interact(~ starts_with("Central_Air"):Year_Built) %>%
  step_ns(Longitude, Latitude, deg_free = 2) %>%
  step_zv(all_predictors()) %>%
  prep()

# return the training set (already embedded in ames_rec)
bake(ames_rec, new_data = NULL)
# apply processing to other data:
bake(ames_rec, new_data = head(ames))
# only return selected variables:
bake(ames_rec, new_data = head(ames), all_numeric_predictors())
bake(ames_rec, new_data = head(ames), starts_with(c("Longitude", "Latitude")))