quanteda (version 0.8.2-0)

ngrams: Create ngrams

Description

Create a set of ngrams (sequences of n adjacent words) from one or more texts in a character vector.

Usage

ngrams(text, n = 2, concatenator = "_", include.all = FALSE, ...)

Arguments

text
character vector containing the texts from which ngrams will be extracted
n
the number of tokens to concatenate. Default is 2 for bigrams.
concatenator
character used to join the words of each ngram; default is "_" (underscore)
include.all
if TRUE, also add the lower-order ngrams (from n-1 down to 1) to the returned list
...
additional parameters passed to tokenize

Value

a list of character vectors of ngrams, one list element per text
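
To make the roles of n, concatenator, and include.all and the shape of the return value concrete, here is a minimal base R sketch. It is not quanteda's implementation: the helper names sketch_ngrams_one and sketch_ngrams are hypothetical, the tokenization (lowercase, strip punctuation, split on whitespace) is a simplifying assumption that may differ from what tokenize does, and the order in which lower-order ngrams are appended is likewise assumed.

```r
# Hypothetical helper (not part of quanteda): build ngrams from one
# already-tokenized text, given as a character vector of words.
sketch_ngrams_one <- function(tokens, n = 2, concatenator = "_") {
  # Fewer than n tokens means no ngram of size n can be formed.
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  # Slide a window of width n over the tokens and join each window.
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = concatenator),
         character(1))
}

# Hypothetical wrapper: returns a list with one element per input text.
sketch_ngrams <- function(text, n = 2, concatenator = "_",
                          include.all = FALSE) {
  lapply(text, function(txt) {
    # Simplified tokenization (assumption): lowercase, drop punctuation,
    # split on whitespace. quanteda's tokenize may behave differently.
    cleaned <- gsub("[[:punct:]]", "", tolower(txt))
    tokens <- strsplit(cleaned, "[[:space:]]+")[[1]]
    tokens <- tokens[nzchar(tokens)]
    # With include.all, also build the (n-1)-grams down to unigrams.
    sizes <- if (include.all) n:1 else n
    unlist(lapply(sizes, function(k) {
      sketch_ngrams_one(tokens, n = k, concatenator = concatenator)
    }))
  })
}

res <- sketch_ngrams(c("The quick brown fox.", "Lazy dog"), n = 2)
length(res)   # one list element per input text
res[[1]]      # bigrams of the first text
```

Under these assumptions, a text with fewer than n tokens yields an empty character vector for that list element.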

Examples

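# bigrams (n = 2), joined with the default "_" concatenator
ngrams("The quick brown fox jumped over the lazy dog.", n=2)
# check whether ngrams() with n = 2 gives the same result as bigrams()
identical(ngrams("The quick brown fox jumped over the lazy dog.", n=2),
          bigrams("The quick brown fox jumped over the lazy dog."))
# trigrams
ngrams("The quick brown fox jumped over the lazy dog.", n=3)
# trigrams joined with "~" instead of "_"
ngrams("The quick brown fox jumped over the lazy dog.", n=3, concatenator="~")
# trigrams plus the lower-order bigrams and unigrams
ngrams("The quick brown fox jumped over the lazy dog.", n=3, include.all=TRUE)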