ANLP (version 1.3)

cleanTextData: Clean and tokenize string data

Description

This function applies a series of cleaning steps to corpus text data.

Usage

cleanTextData(data)

Arguments

data
Data read by readTextFile

Value

a list having sampled text data

Details

This function removes non-English characters, numbers, extra whitespace, brackets, and punctuation. It also replaces abbreviations and contractions, and converts the entire text to lower case.
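The steps listed above can be approximated in base R as a rough illustration. This is only a sketch and not the ANLP implementation: `clean_text_sketch` is a hypothetical helper, the contraction table is a tiny illustrative sample, and removing bracketed text together with its contents is an assumption made for this example.

```r
# Hypothetical helper (not part of ANLP): approximates the cleaning steps
# described above using base R only.
clean_text_sketch <- function(x) {
  # Drop characters that cannot be represented in ASCII, as a rough
  # stand-in for "non-English characters".
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")

  # Expand a few contractions before punctuation is stripped.
  # This lookup table is illustrative only.
  contractions <- c("can't" = "cannot", "won't" = "will not", "it's" = "it is")
  for (pattern in names(contractions)) {
    x <- gsub(pattern, contractions[[pattern]], x, ignore.case = TRUE)
  }

  # Remove bracketed text, including the brackets (assumption for this sketch).
  x <- gsub("\\([^)]*\\)|\\[[^]]*\\]|\\{[^}]*\\}", " ", x)

  x <- gsub("[[:digit:]]+", "", x)   # remove numbers
  x <- gsub("[[:punct:]]+", " ", x)  # remove punctuation
  x <- tolower(x)                    # convert to lower case
  x <- gsub("[[:space:]]+", " ", x)  # collapse runs of whitespace
  trimws(x)                          # drop leading/trailing whitespace
}

clean_text_sketch("It's 2 cool (really)!  Can't WAIT.")
# "it is cool cannot wait"
```

The order matters in this sketch: contractions are expanded first because stripping punctuation beforehand would remove the apostrophes the lookup relies on.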

See Also

tm_map, iconv, content_transformer, removeNumbers, replace_contraction, replace_abbreviation, bracketX, removePunctuation, tolower, stripWhitespace