tm (version 0.7-3)

removePunctuation: Remove Punctuation Marks from a Text Document

Description

Remove punctuation marks from a text document.

Usage

# S3 method for character
removePunctuation(x,
                  preserve_intra_word_contractions = FALSE,
                  preserve_intra_word_dashes = FALSE,
                  ucp = FALSE, ...)
# S3 method for PlainTextDocument
removePunctuation(x, ...)

Arguments

x

a character vector or text document.

preserve_intra_word_contractions

a logical specifying whether intra-word contractions should be kept.

preserve_intra_word_dashes

a logical specifying whether intra-word dashes should be kept.

ucp

a logical specifying whether to use Unicode character properties to determine punctuation characters. If FALSE (default), characters in the ASCII [:punct:] class are removed; if TRUE, characters with Unicode general category P (Punctuation) are removed.

...

arguments to be passed to or from methods; in particular, from the PlainTextDocument method to the character method.
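As a rough illustration of the ucp argument, the sketch below applies the character method to a string containing non-ASCII punctuation. The sample string is invented for illustration. With ucp = FALSE, whether non-ASCII marks are removed may depend on the platform and locale, so only the handling of ASCII marks is assumed in that case.

```r
library(tm)

# Hypothetical sample text: curly quotes (U+201C, U+201D) and an en dash (U+2013)
x <- "\u201CHello\u201D \u2013 world, again!"

# Unicode general category P: removes the curly quotes, the en dash,
# and the ASCII marks "," and "!"
unicode_clean <- removePunctuation(x, ucp = TRUE)

# Default: the [:punct:] class; the ASCII marks "," and "!" are removed
ascii_clean <- removePunctuation(x)

print(unicode_clean)
print(ascii_clean)
```

Only the punctuation characters themselves are removed, so the whitespace that surrounded them (here, the two spaces around the en dash) is left in place.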

Value

The character vector or text document x with punctuation marks removed. Intra-word contractions (') and intra-word dashes (-) are kept if preserve_intra_word_contractions and preserve_intra_word_dashes, respectively, are set.
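A minimal sketch of the effect of the two preserve_* arguments, using the character method on a made-up sentence (the input string and variable names are illustrative only):

```r
library(tm)

# Hypothetical input with two contractions and one intra-word dash
x <- "It's a well-known fact, isn't it?"

# Default: every punctuation mark is removed
plain <- removePunctuation(x)

# Keep apostrophes and dashes that sit between word characters
kept <- removePunctuation(x,
                          preserve_intra_word_contractions = TRUE,
                          preserve_intra_word_dashes = TRUE)

print(plain)
print(kept)
```

The comma and the question mark are removed in both calls, since they do not sit between two word characters.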

See Also

getTransformations to list available transformation (mapping) functions.

regex for the [:punct:] class of punctuation characters.

http://unicode.org/reports/tr44/#General_Category_Values for the Unicode general category values.

Examples

# NOT RUN {
library(tm)  # provides removePunctuation(), inspect() and the crude corpus
data("crude")
inspect(crude[[14]])
inspect(removePunctuation(crude[[14]]))
inspect(removePunctuation(crude[[14]],
                          preserve_intra_word_contractions = TRUE,
                          preserve_intra_word_dashes = TRUE))
# }