ngram (version 1.1)

Preprocess: Preprocessing

Description

A simple text preprocessor for use with the ngram() function.

Usage

preprocess(x, case=NULL, split.at.punct=FALSE)

Arguments

x
Input text.
case
Option to change the case of the text. See Details section for appropriate values.
split.at.punct
logical; whether to insert spaces before and after punctuation, so that each punctuation mark is treated as its own token by an n-gram model.

Value

  • concat() returns

Details

The input text x must already be in the correct form for ngram(), i.e., a single string (a character vector of length 1). The case argument takes one of three values: NULL, in which case the text is left unchanged, or "lower" or "upper", which convert the input text to lower or upper case, respectively.
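
The options described above can be illustrated with a rough base-R sketch. The function name preprocess_sketch is hypothetical, and this is not the package's implementation: it only mirrors the documented behaviour of case and split.at.punct, and it assumes that "punctuation" means the POSIX [[:punct:]] class. Whitespace is left untouched here.

```r
# Hypothetical stand-in for illustration only; NOT the ngram implementation.
preprocess_sketch <- function(x, case = NULL, split.at.punct = FALSE) {
  # The input must be a single string (character vector of length 1).
  stopifnot(is.character(x), length(x) == 1L)

  # case = NULL leaves the text unchanged; otherwise convert the case.
  if (!is.null(case)) {
    case <- match.arg(case, c("lower", "upper"))
    x <- if (case == "lower") tolower(x) else toupper(x)
  }

  # Assumption: punctuation is whatever [[:punct:]] matches.
  # Each punctuation character gets a space on both sides.
  if (split.at.punct) {
    x <- gsub("([[:punct:]])", " \\1 ", x)
  }

  x
}

preprocess_sketch("Hi, there!", case = "upper", split.at.punct = TRUE)
```

A value of case other than NULL, "lower" or "upper" raises an error in this sketch; how preprocess() itself handles such input is not stated here.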

See Also

Process, Utilities

Examples

library(ngram)

x <- "Watch  out    for snakes!  "
preprocess(x)
preprocess(x, case="upper", split.at.punct=TRUE)