txt.to.features(tokenized.text, features = "w", ngram.size = 1)
features: "w" for words, "c" for characters (default: "w").

The function invokes the helper function make.ngrams to combine single units into pairs, triplets or longer n-grams. See help(make.ngrams) for details.

See also: txt.to.words, txt.to.words.ext, make.ngrams
# load the package that provides txt.to.words and txt.to.features:
library(stylo)

# consider the string my.text:
my.text = "Quousque tandem abutere, Catilina, patientia nostra?"
# split it into a vector of consecutive words:
my.vector.of.words = txt.to.words(my.text)
# build a vector of word 2-grams:
txt.to.features(my.vector.of.words, ngram.size = 2)
# or produce character n-grams (in this case, character tetragrams):
txt.to.features(my.vector.of.words, features = "c", ngram.size = 4)
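To make the n-gram step more concrete, here is a minimal base-R sketch of the sliding-window idea. The helper sketch.ngrams is hypothetical and written only for illustration; it is not stylo's make.ngrams. The separators, and the way words are joined before being split into characters, are assumptions, so the output may not match what txt.to.features returns.

```r
# Hypothetical helper (not part of stylo): slide a window of size n over
# a vector of units and paste each window into a single n-gram.
sketch.ngrams = function(units, n = 1, sep = " ") {
  n.out = length(units) - n + 1
  # fewer than n units: no complete n-gram can be formed
  if (n.out < 1) return(character(0))
  vapply(seq_len(n.out),
         function(i) paste(units[i:(i + n - 1)], collapse = sep),
         character(1))
}

# a short, already tokenized sample
my.words = c("quousque", "tandem", "abutere")

# word 2-grams: "quousque tandem" "tandem abutere"
word.bigrams = sketch.ngrams(my.words, n = 2)

# character 4-grams; joining the words with single spaces first is an assumption
my.chars = unlist(strsplit(paste(my.words, collapse = " "), ""))
char.tetragrams = sketch.ngrams(my.chars, n = 4, sep = "")
```

With n = 1 the sketch returns the input units unchanged, which mirrors the default ngram.size = 1 in the usage line.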