ngram (version 3.0.4)

preprocess: Basic Text Preprocessor

Description

A simple text preprocessor for use with the ngram() function.

Usage

preprocess(x, case = "lower", remove.punct = FALSE,
  remove.numbers = FALSE, fix.spacing = TRUE)

Arguments

x

Input text.

case

Option to change the case of the text. Value should be "upper", "lower", or NULL (no change).

remove.punct

Logical; should punctuation be removed?

remove.numbers

Logical; should numbers be removed?

fix.spacing

Logical; should runs of multiple spaces be collapsed and trailing spaces removed?

Value

preprocess() returns the processed text as a single string.

Details

The input text x must already be in the correct form for ngram(), i.e., a single string (character vector of length 1).
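
If the text is stored as several strings (for example, one element per line), it has to be collapsed into one string first. A minimal sketch using only base R; the variable names and sample text are illustrative assumptions, not part of the package:

```r
# Hypothetical input: text held as several elements of a character vector.
lines <- c("Watch out", "for snakes!")

# Collapse the elements into a single string, separated by one space.
txt <- paste(lines, collapse = " ")

# Confirm the form described above: a character vector of length 1.
stopifnot(is.character(txt), length(txt) == 1)
```

The resulting txt can then be passed to preprocess() as x.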

The case argument can take three possible values: NULL, in which case the text is left unchanged, or "lower" or "upper", in which case the input text is converted to lower case or upper case, respectively.
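
The three settings can be compared side by side. This is a small sketch that assumes the ngram package is installed; the variable names are illustrative, and the other arguments are left at the defaults shown under Usage:

```r
library(ngram)

x <- "Watch  out    for snakes!  111"

# Same input, three settings of `case`.
lower <- preprocess(x, case = "lower")  # the default
upper <- preprocess(x, case = "upper")
asis  <- preprocess(x, case = NULL)     # case left unchanged

print(lower)
print(upper)
print(asis)
```

Since only case differs between the three calls, the results should differ only in letter case.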

Examples

# NOT RUN {
library(ngram)

x <- "Watch  out    for snakes!  111"
preprocess(x)
preprocess(x, remove.punct=TRUE, remove.numbers=TRUE)

# }
