count_multigrams: Detect and count multiple n-grams in sequences
Description
A convenient wrapper around count_ngrams for counting n-grams for
multiple values of n and d.
Usage
count_multigrams(n_d, seq, u, d = 0, pos = FALSE, scale = FALSE,
threshold = 0)
Arguments
n_d
list of n-gram sizes and distances between elements of each n-gram.
See Details.
seq
integer vector or matrix describing sequence(s).
u
unigrams (integer, numeric or character vector).
d
integer vector of distances between elements of n-gram (0 means
consecutive elements). See Details.
pos
logical, if TRUE n-grams contain position information.
scale
logical, if TRUE output data is normalized. Should be
used only for n-grams without position information. See Details.
threshold
integer, if not equal to 0, data is binarized into
two groups (greater than or equal to threshold, smaller than threshold).
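The effect of threshold can be illustrated in base R. This is only a sketch of the idea: the count matrix and its column names are made up, and encoding the two groups as 1 and 0 is an assumption, not something stated on this page.

```r
# Made-up counts: 2 sequences (rows) x 3 n-grams (columns).
# The column names are placeholders, not the count_ngrams naming scheme.
counts <- matrix(c(0L, 2L, 1L, 3L, 0L, 1L), nrow = 2,
                 dimnames = list(NULL, c("a", "b", "c")))

threshold <- 2L

# Assumed encoding: 1 if the count is >= threshold, 0 otherwise.
binarized <- (counts >= threshold) * 1L
binarized
```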
Value
an integer matrix with named columns. The naming conventions are the same
as in count_ngrams.
Details
Each element of n_d is a list of two vectors.
The first is a single integer giving the number of
words in the n-gram (the equivalent of n in count_ngrams). The second
is an integer vector giving the distances between words in the
n-gram (the equivalent of d in count_ngrams).
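As a minimal sketch, an n_d list with the structure described above might be built as follows. The helper check_n_d is hypothetical and not part of the package. Its rule that d has length 1 or n - 1 is an assumption carried over from count_ngrams, and reading a distance of 1 as "one element skipped" follows from 0 meaning consecutive elements.

```r
# Each element pairs an n-gram size (n) with its distances (d).
n_d <- list(
  list(2L, 0L),         # bigrams of consecutive elements
  list(2L, 1L),         # bigrams with one element skipped in between
  list(3L, c(1L, 0L))   # trigrams: one element skipped, then consecutive
)

# Hypothetical helper: checks that every element has the expected shape.
check_n_d <- function(n_d) {
  vapply(n_d, function(el) {
    if (!is.list(el) || length(el) != 2) return(FALSE)
    n <- el[[1]]
    d <- el[[2]]
    # Assumption: d is either a single value or has n - 1 values.
    length(n) == 1 && n >= 1 && all(d >= 0) && length(d) %in% c(1, n - 1)
  }, logical(1))
}

check_n_d(n_d)

# With the package loaded, a call following the Usage section would look like:
# seqs <- matrix(sample(1:4, 40, replace = TRUE), nrow = 4)
# counts <- count_multigrams(n_d, seq = seqs, u = 1:4)
```

The call is left commented out because it depends on an installed version of the package that matches the signature shown under Usage.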