textrecipes (version 0.0.1)

step_untokenize: Untokenization of list-column variables

Description

`step_untokenize` creates a *specification* of a recipe step that will convert a list of tokens into a character predictor.

Usage

step_untokenize(recipe, ..., role = NA, trained = FALSE,
  columns = NULL, sep = " ", skip = FALSE,
  id = rand_id("untokenize"))

# S3 method for step_untokenize
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables. For `step_untokenize`, this indicates the tokenized list-column variables to be converted back into character variables. See [recipes::selections()] for more details. For the `tidy` method, these are not currently used.

role

Not used by this step since no new variables are created.

trained

A logical to indicate whether the step has been trained, that is, whether it has been processed by [recipes::prep.recipe()].

columns

A character vector of the names of the selected variables. This is `NULL` until the step is trained by [recipes::prep.recipe()].

sep

A character string that determines how the tokens are separated when they are pasted together. Defaults to `" "`.

skip

A logical. Should the step be skipped when the recipe is baked by [recipes::bake.recipe()]? While all operations are baked when [recipes::prep.recipe()] is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using `skip = TRUE` as it may affect the computations for subsequent operations.

id

A character string that is unique to this step to identify it.

x

A `step_untokenize` object.

Value

An updated version of `recipe` with the new step added to the sequence of existing steps (if any).

Details

This step turns a tokenized list-column back into a character vector by pasting the tokens of each element together, separated by `sep`.
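
Conceptually, the operation resembles collapsing each element's tokens into one string. The following is a minimal base R sketch of that idea, not the package's implementation. It assumes the tokenized column is a plain list of character vectors (the internal representation used by textrecipes may differ), and `untokenize_sketch` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper, not part of textrecipes: collapse each token vector
# into a single string, joining the tokens with `sep`.
untokenize_sketch <- function(tokens, sep = " ") {
  vapply(tokens, function(x) paste(x, collapse = sep), character(1))
}

# A toy "tokenized" column: one character vector of tokens per observation.
tokens <- list(c("i", "love", "recipes"), c("text", "mining"))

untokenize_sketch(tokens)             # tokens joined with the default " "
untokenize_sketch(tokens, sep = "_")  # tokens joined with "_"
```

Because tokenization typically discards information such as punctuation and letter case, the untokenized text will generally not be identical to the original text.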

Examples

# NOT RUN {
library(recipes)
library(textrecipes)  # provides step_tokenize() and step_untokenize()

data(okc_text)

okc_rec <- recipe(~ ., data = okc_text) %>%
  step_tokenize(essay0) %>%
  step_untokenize(essay0) 
  
okc_obj <- okc_rec %>%
  prep(training = okc_text, retain = TRUE)

juice(okc_obj, essay0) %>% 
  slice(1:2)

juice(okc_obj) %>% 
  slice(2) %>% 
  pull(essay0) 
  
tidy(okc_rec, number = 2)
tidy(okc_obj, number = 2)
# }
