
stringdist (version 0.8.1)

qgrams: Get a table of qgram counts from one or more character vectors.

Description

Get a table of qgram counts from one or more character vectors.

Usage

qgrams(..., .list = NULL, q = 1L, useBytes = FALSE,
  useNames = !useBytes)

Arguments

...
Any number of (named) arguments, each of which is coerced to character with as.character.
q
Size of the q-gram; must be non-negative.
useBytes
Determine byte-wise q-grams. useBytes=TRUE is faster but may yield different results depending on character encoding. For ASCII input the results are identical. See also stringdist under Encoding.
useNames
Add q-grams as column names. If useBytes=useNames=TRUE, the q-byte sequences are represented as 2 hexadecimal numbers per byte, separated by a vertical bar (|).
.list
Will be concatenated with the ... argument(s). Useful for adding character vectors named 'q' or 'useNames'.
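
The role of .list can be illustrated with a short sketch. It assumes the stringdist package is installed, and the input strings are made up for illustration. Because q and useNames are themselves argument names of qgrams, vectors carrying those names cannot be passed through ...; wrapping them in .list avoids the clash.

```r
library(stringdist)

# Inside .list, 'q' and 'useNames' are labels for input vectors.
# The q-gram size is still set by the separate q = 2 argument.
tab <- qgrams(.list = list(q = "hello", useNames = "world"), q = 2)
print(tab)
```

The rows of the result should be labelled q and useNames, one per element of .list.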

Value

A table with q-gram counts. The detected q-grams are the column names and the argument names are the row names. If no argument names were provided, they are generated.

Details

The input is converted to character. If useBytes=TRUE, each element is converted to UTF-8 and then to integer, as in stringdist. Next, the data is passed to the underlying routine.

Strings with fewer than q characters and elements containing NA are skipped. Using q=0 therefore counts the number of empty strings "" occurring in each argument.
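
The skipping rule can be sketched in base R. The helper count_qgrams below is hypothetical and is not part of stringdist; it only mimics the documented behaviour for q >= 1 and makes no claim about how the underlying routine is implemented.

```r
# Hypothetical helper (not the stringdist implementation):
# count q-grams over one character vector, for q >= 1.
count_qgrams <- function(x, q = 1L) {
  stopifnot(q >= 1L)
  x <- as.character(x)
  # Skip NA elements and strings with fewer than q characters.
  x <- x[!is.na(x) & nchar(x) >= q]
  # Slide a window of width q over each remaining string.
  grams <- as.character(unlist(lapply(x, function(s) {
    starts <- seq_len(nchar(s) - q + 1L)
    substring(s, starts, starts + q - 1L)
  })))
  table(grams)
}

# NA and "hi" (fewer than 3 characters) contribute nothing.
counts <- count_qgrams(c("hello", NA, "hi", "help"), q = 3L)
print(counts)
```

Only "hello" and "help" contribute here, so "hel" is counted twice and "ell", "llo" and "elp" once each.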

See Also

stringdist, amatch

Examples

qgrams('hello world',q=3)

# q-grams are counted uniquely over a character vector
qgrams(rep('hello world',2),q=3)

# to count them separately, do something like
x <- c('hello', 'world')
lapply(x,qgrams, q=3)

# output rows may be named, and you can pass any number of character vectors
x <- "I will not buy this record, it is scratched"
y <- "My hovercraft is full of eels"
z <- c("this", "is", "a", "dead","parrot")
qgrams(A = x, B = y, C = z,q=2)

# a tongue twister, showing the effects of useBytes and useNames
x <- "peter piper picked a peck of pickled peppers"
qgrams(x, q=2) 
qgrams(x, q=2, useNames=FALSE) 
qgrams(x, q=2, useBytes=TRUE)
qgrams(x, q=2, useBytes=TRUE, useNames=TRUE)
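
To get a feel for the byte-wise column names in the last call, the label for a single q-gram can be rebuilt in base R. This is only a sketch: the helper byte_label is hypothetical, and it assumes the label format is exactly as described under useNames (hexadecimal values per byte, joined by a vertical bar).

```r
# Hypothetical helper: hexadecimal byte label for one q-gram,
# following the format described under useNames.
byte_label <- function(qgram) {
  bytes <- as.integer(charToRaw(qgram))
  paste(sprintf("%02x", bytes), collapse = "|")
}

label <- byte_label("pe")
print(label)
```

For the ASCII q-gram "pe" this gives "70|65", since p is byte 0x70 and e is byte 0x65.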
