This function extracts n-grams from text.
get_ngrams( x, n = 2, min_freq = 1, ngram_quantile = NULL, stop_words, rm_punctuation = FALSE, preserve_chars = c("-", "_"), language = "English" )
x: A character vector from which to extract n-grams.
n: Numeric: the minimum number of terms in an n-gram.
min_freq: Numeric: the minimum number of times an n-gram must occur to be returned.
ngram_quantile: Numeric: the quantile of n-gram frequencies used as the retention cutoff; for example, 0.8 is the 80th percentile. The usage signature lists the default as NULL.
stop_words: A character vector of stopwords to ignore.
rm_punctuation: Logical: should punctuation be removed before selecting n-grams?
preserve_chars: A character vector of punctuation marks to be retained if rm_punctuation is TRUE.
language: A string indicating the language to use for removing stopwords.
A character vector of n-grams.
# NOT RUN {
get_ngrams("On the Origin of Species By Means of Natural Selection")
# }
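To make the roles of n and min_freq more concrete, here is a minimal base-R sketch of sliding-window n-gram extraction with a frequency filter. It is not the package's implementation: `sketch_ngrams` is a hypothetical helper, it returns n-grams of exactly n terms (the documented n is a minimum), and it ignores stop_words, rm_punctuation, ngram_quantile, and language.

```r
# Hypothetical helper for illustration only; NOT how get_ngrams() is implemented.
sketch_ngrams <- function(x, n = 2, min_freq = 1) {
  # Lowercase each string and split it into word tokens on whitespace.
  tokens_list <- strsplit(tolower(x), "\\s+")

  ngrams <- unlist(lapply(tokens_list, function(tokens) {
    tokens <- tokens[nzchar(tokens)]
    # A string with fewer than n tokens contributes no n-grams.
    if (length(tokens) < n) return(character(0))
    # Slide a window of n tokens across the string.
    starts <- seq_len(length(tokens) - n + 1)
    vapply(starts,
           function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
           character(1))
  }))

  if (length(ngrams) == 0) return(character(0))

  # Count each distinct n-gram and keep those seen at least min_freq times.
  counts <- table(ngrams)
  names(counts)[as.vector(counts) >= min_freq]
}

sketch_ngrams(
  c("natural selection acts slowly", "natural selection of species"),
  n = 2, min_freq = 2
)
# returns "natural selection"
```

Only "natural selection" appears in both input strings, so it is the only bigram that meets min_freq = 2 in this sketch.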