quanteda (version 0.7.2-1)

ngrams: Create ngrams

Description

Create a set of ngrams (sequences of adjacent words) from the text(s) in a character vector.

Usage

ngrams(text, n = 2, concatenator = "_", include.all = FALSE, ...)

Arguments

text
character vector containing the texts from which ngrams will be extracted
n
the number of tokens to concatenate. Default is 2 for bigrams.
concatenator
character used to join the words of each ngram; default is "_" (underscore)
include.all
if TRUE, also add all shorter ngrams, from (n-1)-grams down to unigrams, to the returned list
...
additional arguments passed to tokenize

Value

  • a list of character vectors of ngrams, one list element per text

Details

Additional arguments supplied through ... are passed to tokenize.
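
To make the roles of n, concatenator and include.all concrete, here is a minimal base R sketch of how ngrams can be built for a single text. The helper simple_ngrams is hypothetical and is not part of quanteda. It uses a crude stand-in for tokenize (strip punctuation, split on whitespace), and the order in which shorter ngrams are returned is an assumption, so its output may differ from what ngrams returns.

```r
# Hypothetical helper for illustration only; NOT quanteda's implementation.
simple_ngrams <- function(text, n = 2, concatenator = "_", include.all = FALSE) {
  # Crude tokenization (assumption): drop punctuation, split on whitespace.
  tokens <- strsplit(gsub("[[:punct:]]", "", text), "\\s+")[[1]]
  tokens <- tokens[nzchar(tokens)]

  # Build all grams of one size k by sliding a window of k tokens.
  grams_of_size <- function(k) {
    if (length(tokens) < k) return(character(0))
    starts <- seq_len(length(tokens) - k + 1)
    vapply(starts,
           function(i) paste(tokens[i:(i + k - 1)], collapse = concatenator),
           character(1))
  }

  # With include.all = TRUE, collect sizes 1..n; otherwise only size n.
  sizes <- if (include.all) seq_len(n) else n
  unlist(lapply(sizes, grams_of_size))
}

simple_ngrams("The quick brown fox", n = 2)
simple_ngrams("The quick brown fox", n = 3, concatenator = "~")
simple_ngrams("The quick brown fox", n = 2, include.all = TRUE)
```

A text with k tokens yields k - n + 1 ngrams of size n, so the four-token sentence above gives three bigrams and two trigrams.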

Examples

ngrams("The quick brown fox jumped over the lazy dog.", n=2)
identical(ngrams("The quick brown fox jumped over the lazy dog.", n=2),
          bigrams("The quick brown fox jumped over the lazy dog.", n=2))
ngrams("The quick brown fox jumped over the lazy dog.", n=3)
ngrams("The quick brown fox jumped over the lazy dog.", n=3, concatenator="~")
ngrams("The quick brown fox jumped over the lazy dog.", n=3, include.all=TRUE)