qdap (version 2.4.6)

bag_o_words: Bag of Words

Description

bag_o_words - Reduces a text column to a bag of words.

unbag - Wrapper for paste(collapse=" ") to glue words back into strings.

breaker - Reduces a text column to a bag of words and qdap recognized end marks.

word_split - Reduces a text column to a list of vectors of bag of words and qdap recognized end marks (i.e., ".", "!", "?", "*", "-").

Usage

bag_o_words(text.var, apostrophe.remove = FALSE, ...)

unbag(text.var, na.rm = TRUE)

breaker(text.var)

word_split(text.var)

Value

Returns a vector of stripped words.

unbag - Returns a string.

breaker - Returns a vector of striped words and qdap recognized endmarks (i.e., ".", "!", "?", "*", "-").

Arguments

text.var

The text variable.

apostrophe.remove

logical. If TRUE removes apostrophe's from the output.

na.rm

logical. If TRUE NAs are removed before pasting.

...

Additional arguments passed to strip.

Examples

Run this code
if (FALSE) {
bag_o_words("I'm going home!")
bag_o_words("I'm going home!", apostrophe.remove = TRUE)
unbag(bag_o_words("I'm going home!"))

bag_o_words(DATA$state)
by(DATA$state, DATA$person, bag_o_words)
lapply(DATA$state,  bag_o_words)

breaker(DATA$state)
by(DATA$state, DATA$person, breaker)
lapply(DATA$state,  breaker)
unbag(breaker(DATA$state))

word_split(c(NA, DATA$state))
unbag(word_split(c(NA, DATA$state)))
}

Run the code above in your browser using DataLab