Learn R Programming

gibasa (version 1.1.2)

collapse_tokens: Collapse sequences of tokens by condition

Description

Concatenates sequences of tokens in the tidy text dataset, while grouping them by an expression.

Usage

collapse_tokens(tbl, condition, .collapse = "")

Value

A data.frame.

Arguments

tbl

A tidy text dataset.

condition

<data-masked> A logical expression.

.collapse

String with which tokens are concatenated.

Details

Note that this function drops all columns except but 'token' and columns for grouping sequences. So, the returned data.frame has only 'doc_id', 'sentence_id', 'token_id', and 'token' columns.

Examples

Run this code
if (FALSE) {
df <- tokenize(
  data.frame(
    doc_id = "odakyu-sen",
    text = "\u5c0f\u7530\u6025\u7dda"
  )
) |>
  prettify(col_select = "POS1")

collapse_tokens(
  df,
  POS1 == "\u540d\u8a5e" & stringr::str_detect(token, "^[\\p{Han}]+$")
) |>
  head()
}

Run the code above in your browser using DataLab