
lares (version 4.8.4)

cleanText: Clean text

Description

This function cleans text so that only alphanumeric characters remain: accents are stripped from letters and symbols are removed.

Usage

cleanText(text, spaces = TRUE, lower = TRUE)

Arguments

text

Character vector. The text to clean.

spaces

Boolean. Keep spaces?

lower

Boolean. Transform all to lower case?

See Also

Other Data Wrangling: balance_data(), categ_reducer(), date_cuts(), date_feats(), dateformat(), formatNum(), formatTime(), holidays(), impute(), left(), normalize(), numericalonly(), ohe_commas(), ohse(), rbind_full(), removenacols(), removenarows(), replaceall(), right(), textFeats(), textTokenizer(), vector2text(), year_month(), year_week()

Other Text Mining: replaceall(), sentimentBreakdown(), textCloud(), textFeats(), textTokenizer(), topics_rake()

Examples

# NOT RUN {
cleanText("Bernardo Lares 123")
cleanText("Bèrnärdo LáreS 123", lower = FALSE)
cleanText("Bernardo Lares", spaces = FALSE)
cleanText("\\@®ì÷å   %ñS  ..-X", spaces = FALSE)
cleanText(c("María", "€", "Núñez"))
# }
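
As a rough illustration of the kind of cleaning described above, the following base R sketch maps a handful of accented letters to ASCII, drops non-alphanumeric characters, and then applies the spaces and lower options. It is not the lares implementation: clean_text_sketch() is a hypothetical helper, its accent table is deliberately tiny, and its output may differ from cleanText() (for example in how repeated spaces are handled).

```r
# Hypothetical stand-in for illustration only; NOT the lares source code.
clean_text_sketch <- function(text, spaces = TRUE, lower = TRUE) {
  text <- as.character(text)
  # Small, illustrative accent table written with \u escapes so the
  # script does not depend on the file encoding:
  # a-acute, a-grave, a-umlaut, a-ring, e-acute, e-grave,
  # i-acute, i-grave, o-acute, u-acute, n-tilde, N-tilde
  accented <- "\u00e1\u00e0\u00e4\u00e5\u00e9\u00e8\u00ed\u00ec\u00f3\u00fa\u00f1\u00d1"
  plain    <- "aaaaeeiiounN"
  text <- chartr(accented, plain, text)
  # Keep only ASCII letters, digits and spaces
  text <- gsub("[^A-Za-z0-9 ]", "", text)
  # Optionally remove spaces and lower-case the result
  if (!spaces) text <- gsub(" ", "", text, fixed = TRUE)
  if (lower) text <- tolower(text)
  text
}

clean_text_sketch("Bernardo Lares 123")
clean_text_sketch("Bernardo Lares", spaces = FALSE)
clean_text_sketch(c("Mar\u00eda", "\u20ac", "N\u00fa\u00f1ez"))
```

Like the documented function, the sketch is vectorised over text, so a character vector goes in and a character vector of the same length comes out.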
