
MediaNews (version 0.2.1)

ClearText: Text Cleaning: Custom Method

Description

Cleans text and lets the user introduce custom stopwords to remove unwanted words from the given data.
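This page does not spell out which cleaning steps ClearText applies. As an illustration only, the base-R sketch below shows one plausible pipeline for noisy text like the para example further down: lower-casing, then stripping URLs, e-mail addresses, @mentions, hashtags, digits and punctuation. The function name clean_text_sketch and every step in it are assumptions, not the MediaNews implementation.

```r
# Hypothetical cleaning pipeline: an illustration, NOT the MediaNews code.
clean_text_sketch <- function(text) {
  text <- tolower(text)                                   # normalise case
  text <- gsub("https?://\\S+", " ", text, perl = TRUE)   # drop URLs
  text <- gsub("\\S+@\\S+", " ", text, perl = TRUE)       # drop e-mail addresses
  text <- gsub("@\\w+", " ", text, perl = TRUE)           # drop @mentions
  text <- gsub("#\\w+", " ", text, perl = TRUE)           # drop hashtags
  text <- gsub("[[:digit:]]+", "", text, perl = TRUE)     # drop digits, also inside words
  text <- gsub("[[:punct:]]+", " ", text, perl = TRUE)    # punctuation becomes a space
  text <- gsub("\\s+", " ", text, perl = TRUE)            # collapse runs of whitespace
  trimws(text)
}

clean_text_sketch("Visit https://example.com, mail a@b.com #DL tuto23rial!")
# [1] "visit mail tutorial"
```

Because gsub() is vectorised, this sketch also accepts a character vector and returns one cleaned string per element.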

Usage

ClearText(Text, CustomList = c(""))

Arguments

Text

A user-defined character string or character vector containing the text to clean.

CustomList

Optional. A user-defined character vector of custom stopwords to remove from Text, in addition to the "english" stopwords.
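To illustrate how a custom list can extend a default stopword list, here is a hypothetical base-R helper, remove_stopwords_sketch. Its small default_stopwords vector is a stand-in for a full English list (for example tm::stopwords("english")); neither the helper nor that list is taken from MediaNews.

```r
# Hypothetical helper; `default_stopwords` is a tiny stand-in list (assumption),
# not the stopword list MediaNews actually uses.
remove_stopwords_sketch <- function(text, custom_list = c(""),
                                    default_stopwords = c("the", "a", "an", "and", "of", "we")) {
  stop_set <- tolower(c(default_stopwords, custom_list))
  stop_set <- stop_set[nzchar(stop_set)]        # the c("") default adds nothing
  tokens <- strsplit(trimws(text), "\\s+", perl = TRUE)
  vapply(tokens, function(tok) {
    keep <- !(tolower(tok) %in% stop_set)       # case-insensitive whole-token match
    paste(tok[keep], collapse = " ")
  }, character(1))
}

remove_stopwords_sketch("We met the detective in a cartoon", custom_list = c("detective"))
# [1] "met in cartoon"
```

Matching is on whole whitespace-separated tokens, so a stopword with punctuation attached (e.g. "scandal,") is only removed if punctuation is stripped first, while hyphenated entries such as "scooby-doo" survive intact as long as punctuation is removed afterwards.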

Value

Returns the cleaned text as a character value.

See Also

TOI_News_Articles, TOI_News_Dataset

Examples

# NOT RUN {
################### Methodology #####################
###### Ex1: For a data frame ######
#### Create a dataset based on a keyword
NewsData = TOI_News_Articles("Goibibo")

## Identify any potential factor columns
vc = sapply(NewsData, is.factor)

## Convert factors to characters
NewsData[vc] = lapply(NewsData[vc], as.character)

## Clean the text of the News column, row by row
## (seq_len() also handles a data frame with zero rows)
for (i in seq_len(nrow(NewsData))) NewsData$News[i] = ClearText(NewsData$News[i])
# }
# NOT RUN {
######## Ex2: For a character variable ########

para = "Moreover, the text data we get is noisy. But, if we can learn some
methods useful to extract important features from the noisy data, wouldn't
scandal that be amazing ? In this tuto23rial, you'll saadc@ruby.com
learn #world all ab33out regu12lar expressions from scratch. At first, 32324
detective you might find these confusing, or complicated, but after
https://anaconda.com/anaconda-enters-new-chapter/ expressions tricky,
scooby-doo doing practical hands-on exercises (done below)
you should feel bcc: @MikeQuindazzi quite comfortable with it.
In addition, we'll also cartoon-network learn about string 121manipulation
functions in R. This formidable combination of #DL #4IR #Robots
#ArtificialIntelligence string manipulation functions and regular
expressions will prepare you for text mining."

clearpara = ClearText(para,
                      CustomList = c("scooby-doo",
                                     "cartoon-network",
                                     "detective",
                                     "scandal"))
# }
# NOT RUN {
########### Ex3: For a list #############
paraList = list(para, 1213, factor('aasd;kasdioasd'))
## Coerce every element (numeric, factor) to character
paraList = lapply(paraList, as.character)
## Clean each element in turn
for (x in seq_along(paraList)) paraList[[x]] = ClearText(paraList[[x]])
# }
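The examples above clean one element at a time with explicit index loops. A sketch of the same steps using vapply() and lapply() follows; clean_one is a hypothetical stand-in for ClearText so the code runs without MediaNews installed, and it assumes ClearText returns exactly one string per input string.

```r
# Hypothetical stand-in for ClearText (assumption: one string in, one string out).
clean_one <- function(x) tolower(trimws(x))

# Data-frame case: convert factor columns to character, then clean a column
news <- data.frame(News = factor(c("  Goibibo SALE ", "Travel NEWS")), Id = 1:2)
is_fac <- vapply(news, is.factor, logical(1))
news[is_fac] <- lapply(news[is_fac], as.character)
news$News <- vapply(news$News, clean_one, character(1), USE.NAMES = FALSE)

# List case: coerce each element to character, then clean it
para_list <- list("Some TEXT", 1213, factor("aasd;kasdioasd"))
para_list <- lapply(para_list, function(x) clean_one(as.character(x)))
```

Using vapply(..., character(1)) makes R stop with an error if the cleaning function ever returns something other than a single string, so a mismatch surfaces immediately instead of being stored silently.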
