count_multigrams: Detect and count multiple n-grams in sequences

Description

A convenient wrapper around count_ngrams for counting n-grams over multiple values of n and d.

Usage

count_multigrams(n_d, seq, u, d = 0, pos = FALSE, scale = FALSE,
  threshold = 0)

Arguments

n_d

list of n-gram sizes and distances between elements of each n-gram. See Details.

seq

integer vector or matrix describing sequence(s).

u

unigrams (integer, numeric or character vector).

d

integer vector of distances between elements of the n-gram (0 means consecutive elements). See Details.

pos

logical, if TRUE n-grams contain position information.

scale

logical, if TRUE output data is normalized. Should be used only for n-grams without position information. See Details.

threshold

integer, if not equal to 0, data is binarized into two groups (greater than or equal to threshold, less than threshold).
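
The binarization controlled by threshold can be pictured with a small base R sketch. It does not call the package: the count matrix, its column names a, b and c, and the 1/0 coding of the two groups are assumptions made only for illustration, and the values actually returned by count_multigrams may be coded differently.

```r
# Hypothetical count matrix: rows are sequences, columns are n-grams.
# The column names are placeholders, not the package's naming scheme.
counts <- matrix(c(0L, 2L, 1L, 3L, 0L, 1L), nrow = 2,
                 dimnames = list(NULL, c("a", "b", "c")))
threshold <- 2L

# Assumption: 1 marks counts >= threshold, 0 marks counts below it.
binarized <- (counts >= threshold) * 1L
binarized
```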

Value

an integer matrix with named columns. The naming conventions are the same as in count_ngrams.

Details

Each element of n_d is a list consisting of two vectors. The first element is a single integer value that determines the number of words in the n-gram (the equivalent of n in count_ngrams). The second element must be an integer vector describing the distances between words in the n-gram (the equivalent of d in count_ngrams).
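
As a minimal sketch of the structure described above, the snippet below builds an n_d list in base R and checks its shape without calling the package. The particular sizes and distances are hypothetical, and the two-value distance vector for the trigram assumes that count_ngrams expects one distance per gap between neighbouring elements, which this page does not state.

```r
# Each entry pairs an n-gram size with its distance vector.
n_d <- list(
  list(2L, 0L),        # bigrams of consecutive elements
  list(2L, 1L),        # bigrams with one element skipped
  list(3L, c(0L, 1L))  # trigrams; assumes one distance per gap
)

# Every entry is a two-element list whose first element is a single integer.
entry_lengths <- vapply(n_d, length, integer(1))
sizes <- vapply(n_d, function(x) x[[1]], integer(1))
stopifnot(all(entry_lengths == 2L))
sizes
```

A list built this way would be passed as the first argument of count_multigrams, in the same position as the list(list(2, 1), list(2, 3)) specification in the Examples section.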

Examples

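# 12 random sequences of length 50 over the alphabet 1:4
seqs <- matrix(sample(1L:4, 600, replace = TRUE), ncol = 50)
# count position-specific bigrams with distances 1 and 3
count_multigrams(list(list(2, 1), list(2, 3)), seqs, 1L:4, pos = TRUE)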
