removePunctuation
Remove Punctuation Marks from a Text Document
Remove punctuation marks from a text document.
Usage
# S3 method for character
removePunctuation(x,
preserve_intra_word_contractions = FALSE,
preserve_intra_word_dashes = FALSE,
ucp = FALSE, …)
# S3 method for PlainTextDocument
removePunctuation(x, …)
Arguments
- x
a character vector or text document.
- preserve_intra_word_contractions
a logical specifying whether intra-word contractions should be kept.
- preserve_intra_word_dashes
a logical specifying whether intra-word dashes should be kept.
- ucp
a logical specifying whether to use Unicode character properties for determining punctuation characters. If FALSE (default), characters in the ASCII [:punct:] class are taken; if TRUE, the characters with Unicode general category P (Punctuation).
- …
arguments to be passed to or from methods; in particular, from the PlainTextDocument method to the character method.
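The two ucp settings can differ even on plain ASCII input, because characters such as $ and + belong to the [:punct:] class but have Unicode general category S (Symbol) rather than P. A minimal sketch, assuming the tm package is installed; the sample string and variable names are invented for illustration:

```r
library(tm)

# Hypothetical sample string mixing punctuation (":" and "!") with symbols ("$" and "+").
x <- "Cost: $5 + tax!"

# Default (ucp = FALSE): every character in the [:punct:] class is removed,
# which includes the symbols "$" and "+".
ascii_class <- removePunctuation(x)

# ucp = TRUE: only characters with Unicode general category P are removed,
# so ":" and "!" are dropped while "$" and "+" are kept.
unicode_p <- removePunctuation(x, ucp = TRUE)

print(ascii_class)
print(unicode_p)
```

Which setting fits depends on the text: ucp = TRUE is the one that targets non-ASCII punctuation such as typographic quotes, at the cost of leaving symbol characters in place.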
Value
The character or text document x without punctuation marks (besides intra-word contractions (') and intra-word dashes (-) if preserve_intra_word_contractions and preserve_intra_word_dashes are set, respectively).
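The effect of the two preserve_* flags on the returned value can be checked on a plain character vector. A small sketch, assuming tm is installed; the input string and variable names are made up for illustration:

```r
library(tm)

# Hypothetical input with two contractions, one intra-word dash,
# and ordinary punctuation (":" and "!").
x <- "Don't stop: it's a well-known fact!"

# Defaults: all punctuation is removed, including apostrophes and dashes.
plain <- removePunctuation(x)

# Keep apostrophes and dashes that sit between two word characters.
kept <- removePunctuation(x,
                          preserve_intra_word_contractions = TRUE,
                          preserve_intra_word_dashes = TRUE)

print(plain)  # "Dont stop its a wellknown fact"
print(kept)   # "Don't stop it's a well-known fact"
```

Only apostrophes and dashes inside a word are preserved; the colon and exclamation mark are removed in both calls.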
See Also
getTransformations to list available transformation (mapping) functions.
regex shows the class [:punct:] of punctuation characters.
Examples
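library(tm)
data("crude")
# Show document 14 as stored in the corpus.
inspect(crude[[14]])
# Remove all punctuation from the document.
inspect(removePunctuation(crude[[14]]))
# Remove punctuation but keep intra-word contractions and dashes.
inspect(removePunctuation(crude[[14]],
                          preserve_intra_word_contractions = TRUE,
                          preserve_intra_word_dashes = TRUE))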
Community examples
# install.packages("tm")
# library(tm)
# Remove punctuation
# mydatac is the name of the data column
tm_map(mydatac, removePunctuation)
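tm_map operates on a corpus, so the snippet above presumably expects mydatac to have been converted to one already. A self-contained sketch of that workflow, assuming tm is installed; the two sample sentences and the object names docs, docs_clean and cleaned are hypothetical:

```r
library(tm)

# Build a small in-memory corpus from a character vector (illustrative data).
docs <- VCorpus(VectorSource(c("Hello, world!",
                               "Oil prices: up 3% (again).")))

# Apply removePunctuation to every document in the corpus.
docs_clean <- tm_map(docs, removePunctuation)

# Pull the cleaned text back out of the two documents.
cleaned <- c(content(docs_clean[[1]]), content(docs_clean[[2]]))
print(cleaned)
```

Further arguments given to tm_map after the function name are passed on to it, so tm_map(docs, removePunctuation, preserve_intra_word_dashes = TRUE) would keep intra-word dashes.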