A simple text preprocessor for use with the ngram() function.
Usage
preprocess(x, case=NULL, split.at.punct=FALSE)
Arguments
x
Input text.
case
Option to change the case of the text. See Details
section for appropriate values.
split.at.punct
logical; determines whether spaces should be
inserted before and after punctuation (so that each punctuation
mark is treated as a separate word by an n-gram model).
Value
preprocess() returns the processed text as a single string.
Details
The input text x must already be in the correct form for
ngram(), i.e., a single string (character vector of length 1).
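If the text is held as a character vector with several elements, it has to be collapsed into one string first. A minimal sketch using base R's paste(); the variable names and sample text are made up for illustration:

```r
# Hypothetical input: text stored as a character vector of length 2.
txt <- c("The quick brown fox", "jumps over the lazy dog.")

# Collapse the pieces into one string, separated by single spaces.
x <- paste(txt, collapse = " ")

length(x)  # 1
```

The resulting x is a character vector of length 1, which is the form described above.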
The case argument can take three possible values: NULL,
in which case nothing is done, or "lower" or "upper",
in which case the input text is converted to lower or upper case,
respectively.
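The behaviour described above can be approximated in base R. The function below is a hypothetical stand-in written for illustration only, not the package's implementation; in particular, squeezing repeated spaces and trimming the ends are assumptions of this sketch:

```r
# Hypothetical sketch of the documented options; NOT the ngram package's code.
preprocess_sketch <- function(x, case = NULL, split.at.punct = FALSE) {
  # The input must be a single string.
  stopifnot(is.character(x), length(x) == 1L)

  # case = NULL leaves the text untouched; "lower"/"upper" convert it.
  if (!is.null(case)) {
    case <- match.arg(case, c("lower", "upper"))
    x <- if (case == "lower") tolower(x) else toupper(x)
  }

  if (split.at.punct) {
    # Put a space on each side of every punctuation character.
    x <- gsub("([[:punct:]])", " \\1 ", x)
    # Assumption of this sketch: squeeze runs of spaces and trim the ends.
    x <- trimws(gsub(" +", " ", x))
  }
  x
}

preprocess_sketch("Hello, World!", case = "lower", split.at.punct = TRUE)
# "hello , world !"
```

With the package loaded, the corresponding call would be preprocess(x, case = "lower", split.at.punct = TRUE); its exact handling of whitespace may differ from this sketch.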