quanteda (version 2.1.2)

phrase: Declare a compound character to be a sequence of separate pattern matches

Description

Declares that a character expression consists of multiple patterns separated by whitespace. This is typically used as a wrapper around a pattern argument to make it explicit that the pattern elements are to be matched as multi-word sequences, rather than as individual, unordered matches to single words.

Usage

phrase(x)

is.phrase(x)

Arguments

x

the sequence, as a character object containing whitespace separating the patterns

Value

phrase returns a specially classed list in which each whitespace-separated element has been parsed into separate character elements.

is.phrase returns TRUE if the object was created by phrase(); FALSE otherwise.

Examples

library(quanteda)

# make phrases from characters
phrase(c("a b", "c d e", "f"))

# from a dictionary
phrase(dictionary(list(catone = c("a b"), cattwo = "c d e", catthree = "f")))

# from a collocations object (printed, then converted to phrases)
(coll <- textstat_collocations(tokens("a b c a b d e b d a b")))
phrase(coll)
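
The following is a minimal sketch, not part of the original examples, of how wrapping a pattern in phrase() changes matching when selecting tokens. It assumes quanteda 2.x is attached; the sentence and the object name toks are made up for illustration.

```r
library(quanteda)

# Illustrative text (an assumption, not from the package documentation)
toks <- tokens("The United States and the United Kingdom are states.")

# Without phrase(): the pattern is compared against single tokens, so a
# pattern containing a space is not expected to match any token here.
tokens_select(toks, pattern = "united states")

# With phrase(): "united" followed by "states" is matched as a sequence.
tokens_select(toks, pattern = phrase("united states"))

# is.phrase() distinguishes the wrapped object from a plain character vector.
is.phrase(phrase("united states"))
is.phrase("united states")
```

Comparing the two tokens_select() calls shows why the wrapper is useful: the same character string is read either as one pattern or as a sequence of patterns, depending on whether it is declared as a phrase.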
