textrecipes (version 0.3.0)

step_untokenize: Untokenization of tokenlist variables

Description

step_untokenize creates a specification of a recipe step that will convert a tokenlist into a character predictor.

Usage

step_untokenize(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  columns = NULL,
  sep = " ",
  skip = FALSE,
  id = rand_id("untokenize")
)

# S3 method for step_untokenize
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables. For step_untokenize, this indicates the tokenlist variables to be converted back into character vectors. See recipes::selections() for more details. For the tidy method, these are not currently used.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the step has been trained, i.e. whether recipes::prep.recipe() has been run on it.

columns

A list of tibble results that define the encoding. This is NULL until the step is trained by recipes::prep.recipe().

sep

A character string that determines how the tokens are separated when pasted together. Defaults to " ".

skip

A logical. Should the step be skipped when the recipe is baked by recipes::bake.recipe()? While all operations are baked when recipes::prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

id

A character string that is unique to this step to identify it.

x

A step_untokenize object.

Value

An updated version of recipe with the new step added to the sequence of existing steps (if any).

Details

This step turns a tokenlist back into a character vector. Internally it calls paste() to join the tokens of each observation into a single string, with sep placed between consecutive tokens.
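The pasting behaviour described above can be illustrated in base R. This is only a minimal sketch, not the package's internal code: a plain list of character vectors stands in for a tokenlist, and untokenize_sketch is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical stand-in for a tokenlist: one character vector of tokens
# per observation.
tokens <- list(
  c("i", "love", "text"),
  c("recipes", "are", "fun")
)

# Illustrative helper (not part of textrecipes): collapse each token
# vector into a single string, placing `sep` between consecutive tokens.
untokenize_sketch <- function(tokens, sep = " ") {
  vapply(tokens, paste, character(1), collapse = sep)
}

untokenize_sketch(tokens)
untokenize_sketch(tokens, sep = "_")
```

With the default sep each observation becomes a space-separated string; passing another sep only changes the delimiter between tokens.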

See Also

step_tokenize() to turn a character variable into a tokenlist.

Examples

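library(recipes)
library(textrecipes)  # provides step_tokenize() and step_untokenize()
library(modeldata)
data(okc_text)

# Tokenize essay0, then paste the tokens back into a single string
okc_rec <- recipe(~ ., data = okc_text) %>%
  step_tokenize(essay0) %>%
  step_untokenize(essay0)

# Train the recipe
okc_obj <- okc_rec %>%
  prep()

# Inspect the untokenized column for the first two rows
juice(okc_obj, essay0) %>%
  slice(1:2)

# Extract the untokenized text of the second row
juice(okc_obj) %>%
  slice(2) %>%
  pull(essay0)

# Tidy the second step (step_untokenize) before and after training
tidy(okc_rec, number = 2)
tidy(okc_obj, number = 2)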
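As a further illustration of the sep argument, the following sketch uses a small made-up tibble instead of okc_text. The data and the object names (toy_text, toy_rec) are assumptions introduced for this example; the exact tokens depend on the default tokenizer used by step_tokenize().

```r
library(recipes)
library(textrecipes)

# Hypothetical toy data, made up for this illustration
toy_text <- tibble(text = c("Hello world", "Text recipes are fun"))

# Tokenize, then join the tokens with an underscore instead of a space
toy_rec <- recipe(~ text, data = toy_text) %>%
  step_tokenize(text) %>%
  step_untokenize(text, sep = "_") %>%
  prep()

out <- juice(toy_rec)
out
```

Choosing a sep that cannot occur inside a token makes it easy to see, or later split on, the original token boundaries.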
