
hardhat (version 1.4.2)

shrink: Subset only required columns

Description

shrink() subsets data so that it contains only the required columns specified by the prototype, ptype.
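As a rough illustration of the idea (not hardhat's implementation), the base-R sketch below keeps only the columns named in a zero-row prototype and stops when any of them are missing. The function name `shrink_sketch()` is hypothetical, and unlike shrink() it returns a plain data frame rather than a tibble.

```r
# Hypothetical base-R sketch of the idea behind shrink().
# This is NOT hardhat's code; shrink_sketch() is a made-up name.
shrink_sketch <- function(data, ptype) {
  required <- colnames(ptype)
  missing <- setdiff(required, colnames(data))
  if (length(missing) > 0) {
    stop("Missing required columns: ", paste(missing, collapse = ", "), call. = FALSE)
  }
  # Keep only the required columns, in the order they appear in `ptype`
  data[, required, drop = FALSE]
}

# A zero-row prototype that records which columns are required
ptype <- iris[0, c("Species", "Sepal.Width")]

out <- shrink_sketch(iris[101:150, ], ptype)
names(out)  # c("Species", "Sepal.Width")
```

Selecting by the prototype's column names is what makes the result independent of any extra columns, or any column order, in the incoming data.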

Usage

shrink(data, ptype, ..., call = current_env())

Value

A tibble containing the required columns.

Arguments

data

A data frame containing the data to subset.

ptype

A data frame prototype containing the required columns.

...

These dots are for future extensions and must be empty.
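A quick way to see the empty-dots rule in action, assuming hardhat is installed: passing any extra argument (here a made-up `extra = TRUE`) should make shrink() signal an error, which try() captures instead of stopping the script.

```r
library(hardhat)

x <- mold(log(Sepal.Width) ~ Species, iris[1:100, ])
ptype_pred <- x$blueprint$ptypes$predictors

# `extra` is a hypothetical argument; anything passed through `...` is rejected
res <- try(shrink(iris[101:150, ], ptype_pred, extra = TRUE))
inherits(res, "try-error")
```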

call

The call used for errors and warnings.
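The `call` argument matters mostly when shrink() is wrapped by another function. The sketch below assumes the usual rlang convention of passing the wrapper's own environment, so that an error about missing columns can be reported against the wrapper rather than against shrink() itself. The wrapper `prepare_new_data()` is hypothetical and not part of hardhat.

```r
library(hardhat)

# Hypothetical wrapper, NOT part of hardhat. Passing its own environment
# as `call` asks shrink() to attribute errors to prepare_new_data().
prepare_new_data <- function(new_data, ptype) {
  shrink(new_data, ptype, call = rlang::current_env())
}

x <- mold(log(Sepal.Width) ~ Species, iris[1:100, ])
ptype_pred <- x$blueprint$ptypes$predictors

# Columns 1:4 exclude Species, so a required column is missing
err <- tryCatch(
  prepare_new_data(iris[101:150, 1:4], ptype_pred),
  error = function(e) e
)
conditionCall(err)
```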

Details

shrink() is called by forge() before scream() and before the actual processing is done.
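To make the ordering concrete, the sketch below runs the two cleaning steps by hand on new data whose Species column has been converted to character (a hypothetical variation of the example data). shrink() only selects columns and leaves their types alone; scream() is then expected to cast Species back to a factor with the prototype's levels. This illustrates the order of the steps and is not a copy of forge()'s internals.

```r
library(hardhat)

train <- iris[1:100, ]
test <- iris[101:150, ]
# Hypothetical variation: store Species as character in the new data
test$Species <- as.character(test$Species)

x <- mold(log(Sepal.Width) ~ Species, train)
ptype_pred <- x$blueprint$ptypes$predictors

# Step 1: shrink() keeps only the columns the prototype requires;
# the types are unchanged, so Species is still character here
step1 <- shrink(test, ptype_pred)
class(step1$Species)

# Step 2: scream() casts those columns to the prototype's types,
# turning the character column back into a factor
step2 <- scream(step1, ptype_pred)
levels(step2$Species)
```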

Examples

# ---------------------------------------------------------------------------
# Setup

library(hardhat)

train <- iris[1:100, ]
test <- iris[101:150, ]

# ---------------------------------------------------------------------------
# shrink()

# mold() is run at model fit time
# and a formula preprocessing blueprint is recorded
x <- mold(log(Sepal.Width) ~ Species, train)

# Inside the result of mold() are the prototype tibbles
# for the predictors and the outcomes
ptype_pred <- x$blueprint$ptypes$predictors
ptype_out <- x$blueprint$ptypes$outcomes

# Pass the test data, along with a prototype, to
# shrink() to extract the prototype columns
shrink(test, ptype_pred)

# To extract the outcomes, just use the
# outcome prototype
shrink(test, ptype_out)

# shrink() makes sure that the columns
# required by `ptype` actually exist in the data
# and throws an informative error when they don't
test2 <- subset(test, select = -Species)
try(shrink(test2, ptype_pred))
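The properties described under Value can also be checked directly. The following sketch, assuming the same mold() setup as above, confirms that each result is a tibble whose columns match the prototype's and whose row count matches the input data.

```r
library(hardhat)

x <- mold(log(Sepal.Width) ~ Species, iris[1:100, ])
ptype_pred <- x$blueprint$ptypes$predictors
ptype_out <- x$blueprint$ptypes$outcomes

pred <- shrink(iris[101:150, ], ptype_pred)
out <- shrink(iris[101:150, ], ptype_out)

# Each result keeps only the prototype's columns, one row per input row
identical(names(pred), names(ptype_pred))
identical(names(out), names(ptype_out))
nrow(pred)
```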
