This function splits texts into words, calculates word frequencies, and removes stop words for a given language.
textTokenizer(text, lang = "english", exclude = c(),
keep_spaces = FALSE, df = FALSE, min = 2)
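The steps described above (split texts into words, drop stop words, count frequencies) can be sketched in base R. This is a minimal illustration under stated assumptions, not the function's actual implementation: the helper name `tokenize_count` is hypothetical, and the short vector `stop_words_en` is a hypothetical stand-in for whatever stop-word list `lang` selects.

```r
# Hypothetical sketch: tokenize texts, drop stop words, count word frequencies.
# Assumption: this short list stands in for a real language-specific
# stop-word list (the role played by `lang` in textTokenizer).
stop_words_en <- c("the", "a", "an", "and", "of", "in", "is", "to")

tokenize_count <- function(text, stop_words = stop_words_en, exclude = c()) {
  # Lowercase, then split on any run of characters that are not letters or digits
  words <- unlist(strsplit(tolower(text), "[^[:alnum:]]+"))
  words <- words[words != ""]  # drop empty tokens (e.g. from leading punctuation)
  # Remove stop words and any user-supplied exclusions
  words <- words[!words %in% c(stop_words, tolower(exclude))]
  freq <- sort(table(words), decreasing = TRUE)  # most frequent words first
  data.frame(word = names(freq), freq = as.integer(freq),
             stringsAsFactors = FALSE)
}

texts <- c("The cat and the dog", "A dog in the park", "The dog is happy")
tokenize_count(texts)
tokenize_count(texts, exclude = "cat")
```

In this sketch "dog" comes first with a frequency of 3, while "the", "a", "and", "in" and "is" are dropped as stop words.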
text: Character vector. Texts to tokenize.
lang: Character. Language of the text (used to select stop words).
exclude: Character vector. Words to exclude from the results.
keep_spaces: Boolean. Set to TRUE to treat each line as a single compound word, joining its space-separated parts with underscores. For example, 'LA ALAMEDA' becomes 'LA_ALAMEDA' and is treated as a single word.
df: Boolean. If TRUE, return a one-hot-encoded data frame: each word is a column indicating whether each text contains that word.
min: Integer. If df = TRUE, the minimum frequency a word must have to be included.
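The `keep_spaces`, `df` and `min` options can be sketched in the same way. Both helpers below (`compound_tokens`, `one_hot_words`) are hypothetical and are not the function's actual code; in particular, counting a word's frequency as the number of texts that contain it is an assumption, and stop-word removal is omitted for brevity.

```r
# Hypothetical sketch of two textTokenizer options (not its actual code).

# keep_spaces-style handling: each line becomes one compound token,
# with runs of spaces replaced by a single underscore.
compound_tokens <- function(text) {
  gsub(" +", "_", trimws(text))
}

# df-style output: one logical column per word, TRUE when the text
# contains that word. Assumption: a word's frequency is the number of
# texts containing it, and only words with frequency >= min are kept.
one_hot_words <- function(text, min = 2) {
  # Unique lowercase words per text, split on non-alphanumeric runs
  tokens <- lapply(strsplit(tolower(text), "[^[:alnum:]]+"),
                   function(w) unique(w[w != ""]))
  counts <- table(unlist(tokens))          # number of texts containing each word
  vocab <- names(counts)[counts >= min]    # apply the minimum frequency
  columns <- lapply(setNames(vocab, vocab), function(v) {
    vapply(tokens, function(w) v %in% w, logical(1))
  })
  as.data.frame(columns, check.names = FALSE)
}

compound_tokens(c("LA ALAMEDA", "EL CENTRO"))
one_hot_words(c("The dog barks", "The cat sleeps", "A dog and a cat"), min = 2)
```

Counting each word once per text keeps the threshold meaningful for one-hot output: "a" appears twice in the third text but only in that one text, so it does not pass `min = 2`, while "cat", "dog" and "the" do.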
Other Data Wrangling: balance_data, calibrate, categ_reducer, cleanText, date_feats, dateformat, formatNum, formatTime, holidays, impute, left, normalize, numericalonly, ohse, one_hot_encoding_commas, rbind_full, removenacols, removenarows, replaceall, right, textFeats, vector2text, year_month, year_week
Other Text Mining: cleanText, replaceall, sentimentBreakdown, textCloud, textFeats