ngram (version 1.1)

ngram-class: Class ngram

Description

An n-gram is an ordered sequence of n "words" taken from a body of "text". The terms "words" and "text" can be interpreted literally or more loosely.

For example, consider the sequence "A B A C A B B". If we examine the 2-grams (or bigrams) of this sequence, they are

A B, B A, A C, C A, A B, B B

or without repetition:

A B, B A, A C, C A, B B

That is, we take the input string and group the "words" 2 at a time (because n=2). Notice that the number of n-grams and the number of words are not obviously related; counting repetition, the number of n-grams is equal to

nwords - n + 1

Bounds ignoring repetition are highly dependent on the input. A correct but useless bound is

# ngrams = nwords - (# repeats - 1) - (n - 1)

An ngram object is an S4 class container that stores some basic summary information (e.g., n), and several external pointers. For information on how to construct an ngram object, see ngram.
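
The counts in the bigram example can be checked with a short base R sketch. The helper word_ngrams() below is hypothetical and written only for illustration; it is not part of the ngram package and says nothing about how the package computes n-grams internally. It assumes words are separated by single spaces.

```r
# Hypothetical helper (NOT part of the ngram package): build word n-grams
# in base R by sliding a window of n words over the input.
word_ngrams <- function(text, n) {
  # Assumption: words are separated by single spaces.
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

bigrams <- word_ngrams("A B A C A B B", n = 2)

length(bigrams)          # 6, i.e. nwords - n + 1 = 7 - 2 + 1
length(unique(bigrams))  # 5, since "A B" occurs twice
```

The sketch keeps everything in an ordinary character vector, so its memory use grows with the input; that is the opposite trade-off from the ngram class, which holds external pointers rather than the data itself.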

Creating Objects

new('ngram', str_ptr = ..., strlen = ..., n = ..., ng_ptr = ..., ngsize = ..., wl_ptr = ...)

Details

The ngram class is a container for the output of the processing routine ngram(), most of which consists of external pointers. As such, an ngram object does not store much data (a few KiB), regardless of the input data size. Additionally, because external pointers are not preserved across R sessions, saving such objects via save() and then loading them later with load() is useless at best, and dangerous at worst.

See Also

Process