n=2
). Notice that the number of n-grams
and the number of words are not obviously related; counting
repetition, the number of n-grams is equal to
nwords - n + 1
Bounds on the number of distinct n-grams (that is, ignoring
repetition) depend heavily on the input. A correct but useless bound is
# ngrams = nwords - (# repeats - 1) - (n - 1)
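The count with repetition can be checked with a small base R sketch. The helper `all_ngrams()` is hypothetical and is not part of the ngram package; it assumes words are separated by whitespace.

```r
# Hypothetical helper (not from the ngram package): list every n-gram of a
# whitespace-separated string, counting repetition.
all_ngrams <- function(text, n) {
  words <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  nwords <- length(words)
  if (nwords < n) return(character(0))
  vapply(seq_len(nwords - n + 1),
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

x <- "a b a b a"                 # 5 words
grams <- all_ngrams(x, n = 2)

length(grams)          # counting repetition: nwords - n + 1 = 5 - 2 + 1 = 4
length(unique(grams))  # ignoring repetition: 2 ("a b" and "b a")
```

The same five words yield four 2-grams with repetition but only two distinct ones, which illustrates why the distinct count depends on the input rather than on nwords alone.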
An ngram
object is an S4 class container that stores
some basic summary information (e.g., n), and several external
pointers. For information on how to construct an ngram
object, see ngram.
new('ngram', str_ptr = ..., strlen = ..., n = ...,
ng_ptr = ..., ngsize = ..., wl_ptr = ...)
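In practice such objects come from ngram() rather than from a direct call to new(). The following is a minimal sketch, assuming the CRAN package ngram is installed (the block does nothing otherwise); the slot names are taken from the constructor signature above, and the input string is an arbitrary example.

```r
# Assumption: the CRAN package 'ngram' is installed; skipped otherwise.
txt <- "a b a b a"

if (requireNamespace("ngram", quietly = TRUE)) {
  ng <- ngram::ngram(txt, n = 2)

  # Summary information stored directly in the S4 object.
  print(ng@n)        # the n used for tokenization
  print(ng@ngsize)   # size recorded for the n-gram list

  # Copy results out of the external pointers into ordinary R objects.
  grams <- ngram::get.ngrams(ng)
  print(grams)
  print(ngram::get.phrasetable(ng))
}
```

Ordinary R objects such as `grams` or the phrase table are plain data, so they are the natural things to keep if results need to outlive the object.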
The ngram class is a container for the output of the processing
routine ngram(), most of which is held in external pointers.
As such, an ngram object does not store much data (a few KiB),
regardless of the input data size. Additionally, because external
pointers are not valid outside the R session that created them,
saving such objects via save() and then loading them later with
load() is useless at best, and dangerous at worst.
Process