removePunctuation

Remove Punctuation Marks from a Text Document

Remove punctuation marks from a text document.

Usage
# S3 method for character
removePunctuation(x,
                  preserve_intra_word_contractions = FALSE,
                  preserve_intra_word_dashes = FALSE,
                  ucp = FALSE, ...)
# S3 method for PlainTextDocument
removePunctuation(x, ...)
Arguments
x

a character vector or text document.

preserve_intra_word_contractions

a logical specifying whether intra-word contractions should be kept.

preserve_intra_word_dashes

a logical specifying whether intra-word dashes should be kept.

ucp

a logical specifying whether to use Unicode character properties for determining punctuation characters. If FALSE (default), characters in the ASCII [:punct:] class are taken; if TRUE, the characters with Unicode general category P (Punctuation).

...

arguments to be passed to or from methods; in particular, from the PlainTextDocument method to the character method.
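The two settings of ucp can be sketched with base R regular expressions. The patterns below are an assumption about how the two character classes translate to regexes, not a statement of how tm implements the function. The input is kept to ASCII because the behaviour of [:punct:] on non-ASCII text can depend on the locale.

```r
# Illustrative ASCII input containing punctuation and symbols
x <- "a+b=c, $5!"

# POSIX [:punct:] class: covers every ASCII punctuation and symbol character
ascii_result <- gsub("[[:punct:]]+", "", x)

# Unicode general category P: punctuation only (needs perl = TRUE)
ucp_result <- gsub("\\p{P}+", "", x, perl = TRUE)

print(ascii_result)  # "abc 5"
print(ucp_result)    # "a+b=c $5"
```

Characters such as +, = and $ belong to Unicode general category S (Symbol), so the category P pattern leaves them in place while [:punct:] removes them.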

Value

The character vector or text document x with punctuation marks removed. Intra-word contractions (') and intra-word dashes (-) are kept if preserve_intra_word_contractions and preserve_intra_word_dashes, respectively, are TRUE.
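A minimal sketch of the two preserve flags on a plain character vector, assuming the tm package is installed. The input sentence is made up for illustration.

```r
library(tm)

# Illustrative input with a contraction, a dashed compound, and ordinary punctuation
x <- "It's a well-known fact, isn't it?"

# Default: every punctuation mark is removed
plain <- removePunctuation(x)

# Keep apostrophes and dashes that sit between two word characters
kept <- removePunctuation(x,
                          preserve_intra_word_contractions = TRUE,
                          preserve_intra_word_dashes = TRUE)

print(plain)
print(kept)
```

Only marks between two word characters are preserved; the comma and the question mark are removed in both calls.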

See Also

getTransformations to list available transformation (mapping) functions.

regex shows the class [:punct:] of punctuation characters.

http://unicode.org/reports/tr44/#General_Category_Values.

Aliases
  • removePunctuation
  • removePunctuation.character
  • removePunctuation.PlainTextDocument
Examples
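library(tm)

# Load the sample corpus of Reuters crude oil articles shipped with tm
data("crude")

# Document 14 as is
inspect(crude[[14]])

# All punctuation removed
inspect(removePunctuation(crude[[14]]))

# Punctuation removed, keeping intra-word contractions and dashes
inspect(removePunctuation(crude[[14]],
                          preserve_intra_word_contractions = TRUE,
                          preserve_intra_word_dashes = TRUE))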
Documentation reproduced from package tm, version 0.7-6, License: GPL-3

Community examples

chanchalsheik@gmail.com at Nov 12, 2017 tm v0.7-1

# install.packages("tm")
# library(tm)
# REMOVE PUNCTUATION
# mydatac is the name of the data column
tm_map(mydatac, removePunctuation)
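The community snippet above does not show how mydatac is built. The sketch below is one possible way to get from a data column to a corpus that tm_map can transform, assuming the tm package is installed. The data frame reviews and its text column are hypothetical names introduced for illustration.

```r
library(tm)

# Hypothetical data frame; `text` stands in for the data column
reviews <- data.frame(text = c("Hello, world!", "A well-known method."),
                      stringsAsFactors = FALSE)

# Wrap the column in a corpus, one document per element
corp <- VCorpus(VectorSource(reviews$text))

# Apply removePunctuation to every document; extra arguments are passed on by tm_map
corp <- tm_map(corp, removePunctuation, preserve_intra_word_dashes = TRUE)

# Extract the transformed text of each document
first <- content(corp[[1]])
second <- content(corp[[2]])
print(first)
print(second)
```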