textshape (version 1.6.0)

split_portion: Break Text Into Ordered Word Chunks

Description

Some visualizations and algorithms require text to be broken into chunks of ordered words. split_portion breaks text, optionally by grouping variables, into equal chunks. The chunk size can be specified by giving number of words to be in each chunk or the number of chunks.

Usage

split_portion(text.var, grouping.var = NULL, n.words, n.chunks,
  as.string = TRUE, rm.unequal = FALSE, as.table = TRUE, ...)

Arguments

text.var

The text variable

grouping.var

The grouping variables. Default NULL generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.

n.words

An integer specifying the number of words in each chunk (must specify n.chunks or n.words).

n.chunks

An integer specifying the number of chunks (must specify n.chunks or n.words).

as.string

logical. If TRUE the chunks are returned as a single string. If FALSE the chunks are returned as a vector of single words.

rm.unequal

logical. If TRUE final chunks that are unequal in length to the other chunks are removed.

as.table

logical. If TRUE the list output is coerced to data.table or tibble.

Ignored.

Value

Returns a list or data.table of text chunks.

Examples

Run this code
# NOT RUN {
with(DATA, split_portion(state, n.chunks = 10))
with(DATA, split_portion(state, n.words = 10))
with(DATA, split_portion(state, n.chunks = 10, as.string=FALSE))
with(DATA, split_portion(state, n.chunks = 10, rm.unequal=TRUE))
with(DATA, split_portion(state, person, n.chunks = 10))
with(DATA, split_portion(state, list(sex, adult), n.words = 10))
with(DATA, split_portion(state, person, n.words = 10, rm.unequal=TRUE))

## Bigger data
with(hamlet, split_portion(dialogue, person, n.chunks = 10))
with(hamlet, split_portion(dialogue, list(act, scene, person), n.chunks = 10))
with(hamlet, split_portion(dialogue, person, n.words = 300))
with(hamlet, split_portion(dialogue, list(act, scene, person), n.words = 300))
# }

Run the code above in your browser using DataLab