lares (version 4.7)

textTokenizer: Tokenize Vectors into Words

Description

This function splits texts into words, calculates their frequencies, and suppresses stop words for a given language.
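The steps described above can be sketched in base R. This is only an approximation for illustration, not the lares implementation: the sample texts and the tiny `stop_words` vector are made up, and a real stop-word list would depend on the language.

```r
# Illustrative only: a base-R approximation, not the lares implementation.
texts <- c("The cat sat on the mat", "A cat and a dog")

# Tiny hypothetical stop-word list; a real one would be language-specific.
stop_words <- c("the", "a", "on", "and")

# Lower-case, split on anything that is not a letter, and flatten.
words <- unlist(strsplit(tolower(texts), "[^a-z]+"))

# Drop empty strings and stop words.
words <- words[nzchar(words) & !words %in% stop_words]

# Word frequencies, most frequent first.
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```

Here "cat" is the only word that appears more than once, so it ends up first in `freq`.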

Usage

textTokenizer(text, lang = "english", exclude = c(),
  keep_spaces = FALSE, df = FALSE, min = 2)

Arguments

text

Character vector

lang

Character. Language of the text (used to select stop words).

exclude

Character vector. Words to exclude from the results.

keep_spaces

Boolean. Set to TRUE to keep the spaces within each element so that compound words separated by spaces are preserved as single units. For example, 'LA ALAMEDA' will be set as 'LA_ALAMEDA' and treated as a single word.

df

Boolean. Return a data frame with one-hot-encoding style results? Each word becomes a column indicating whether the text contains that word.

min

Integer. If df = TRUE, the minimum frequency a word must have to be considered.
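To make the keep_spaces, df and min descriptions more concrete, here is a rough base-R imitation of the documented behaviour. It is a sketch under assumptions, not what lares does internally: the sample input and the `min_freq`, `tokens` and `onehot` names are hypothetical, and it assumes "minimum frequency" means a word must occur at least `min` times.

```r
# Illustrative only: mimics the documented behaviour in base R, not lares internals.
texts <- c("LA ALAMEDA", "LA ALAMEDA", "PARQUE CENTRAL")

# keep_spaces = TRUE, as described: spaces become underscores so each
# element is treated as a single word.
tokens <- gsub(" ", "_", texts, fixed = TRUE)

# df = TRUE with min = 2, as described: one column per word that appears
# at least `min_freq` times, flagging whether each text contains it.
min_freq <- 2
counts <- table(tokens)
kept <- names(counts)[counts >= min_freq]
onehot <- as.data.frame(
  sapply(kept, function(w) tokens == w, simplify = FALSE)
)
print(onehot)
```

With this input only 'LA_ALAMEDA' reaches the minimum frequency, so the resulting data frame has a single logical column for it; 'PARQUE_CENTRAL' appears once and is dropped.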

See Also

Other Data Wrangling: balance_data, calibrate, categ_reducer, cleanText, date_feats, dateformat, formatNum, formatTime, holidays, impute, left, normalize, numericalonly, ohse, one_hot_encoding_commas, rbind_full, removenacols, removenarows, replaceall, right, textFeats, vector2text, year_month, year_week

Other Text Mining: cleanText, replaceall, sentimentBreakdown, textCloud, textFeats