Cleans text so that only alphanumeric characters remain, removing accents and symbols from letters.
cleanText(text, spaces = TRUE, lower = TRUE, ascii = TRUE, title = FALSE)
text: Character vector.
spaces: Boolean. Keep spaces? If a character string is passed instead, spaces are replaced with that string.
lower: Boolean. Transform all text to lower case?
ascii: Boolean. Keep only ASCII characters?
title: Boolean. Transform to title case (upper case on first letters)?
Value: Character vector with the transformed strings.
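To make the kind of transformation described above concrete, here is a minimal base-R sketch. It is not the package's implementation: `clean_text_sketch` is a hypothetical helper, and the collapsing of repeated spaces is an assumption made for illustration. Unlike `cleanText()`, it simply drops non-ASCII characters rather than converting accented letters to their plain equivalents.

```r
# Hypothetical helper for illustration only; NOT the cleanText() source.
clean_text_sketch <- function(text, spaces = TRUE, lower = TRUE) {
  # Keep only ASCII letters, digits, and spaces
  out <- gsub("[^A-Za-z0-9 ]", "", as.character(text))
  # Assumption: collapse repeated spaces and trim the ends
  out <- trimws(gsub(" +", " ", out))
  if (is.character(spaces)) {
    # A character value replaces every remaining space
    out <- gsub(" ", spaces, out, fixed = TRUE)
  } else if (!isTRUE(spaces)) {
    # spaces = FALSE removes spaces entirely
    out <- gsub(" ", "", out, fixed = TRUE)
  }
  if (lower) out <- tolower(out)
  out
}

clean_text_sketch("Bernardo Lare$ 123")             # "bernardo lare 123"
clean_text_sketch("Bernardo Lare$", spaces = ".")   # "bernardo.lare"
clean_text_sketch("Bernardo Lares", spaces = FALSE) # "bernardolares"
```

The sketch mirrors only the `spaces` and `lower` arguments; `ascii` and `title` are left out because their exact behavior is not spelled out here.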
Other Data Wrangling: balance_data(), categ_reducer(), date_cuts(), date_feats(), formatNum(), holidays(), impute(), left(), normalize(), numericalonly(), ohe_commas(), ohse(), removenacols(), removenarows(), replaceall(), textFeats(), textTokenizer(), vector2text(), year_month()
Other Text Mining: cleanNames(), ngrams(), remove_stopwords(), replaceall(), sentimentBreakdown(), textCloud(), textFeats(), textTokenizer(), topics_rake()
# NOT RUN {
cleanText("Bernardo Lares 123")
cleanText("Bèrnärdo LáreS 123", lower = FALSE)
cleanText("Bernardo Lare$", spaces = ".", ascii = FALSE)
cleanText("\\@®ì÷å %ñS ..-X", spaces = FALSE)
cleanText(c("maría", "€", "núñez_a."), title = TRUE)
# }