Identify context words using user-provided patterns.
textstat_context(
x,
pattern,
valuetype = c("glob", "regex", "fixed"),
case_insensitive = TRUE,
window = 10,
min_count = 10,
remove_pattern = TRUE,
n = 1,
skip = 0,
...
)

char_context(
x,
pattern,
valuetype = c("glob", "regex", "fixed"),
case_insensitive = TRUE,
window = 10,
min_count = 10,
remove_pattern = TRUE,
p = 0.001,
n = 1,
skip = 0
)
x
a tokens object created by quanteda::tokens().
pattern
a quanteda::pattern() specifying the target words.
valuetype
the type of pattern matching: "glob" for "glob"-style
wildcard expressions; "regex" for regular expressions; or "fixed" for
exact matching. See quanteda::valuetype() for details.
case_insensitive
if TRUE, ignore case when matching.
window
size of the window for collocation analysis.
min_count
minimum frequency of words within the window to be considered as collocations.
remove_pattern
if TRUE, target words are excluded from the returned keywords.
n
integer vector specifying the number of elements to be concatenated in each n-gram. Each element of this vector defines an \(n\) in the \(n\)-gram(s) that are produced.
skip
integer vector specifying the adjacency skip size for tokens
forming the n-grams; the default is 0, for only immediately neighbouring words.
For skip-grams, skip can be a vector of integers, as the
"classic" approach to forming skip-grams is to set skip = \(k\) where
\(k\) is the distance for which \(k\) or fewer skips are used to
construct the \(n\)-gram. Thus a "4-skip-n-gram" defined as skip = 0:4
produces results that include 4 skips, 3 skips, 2 skips, 1 skip, and 0
skips (where 0 skips are typical n-grams formed from adjacent words). See
Guthrie et al. (2006).
...
additional arguments passed to quanteda.textstats::textstat_keyness().
p
threshold for statistical significance of collocations.
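The effect of n and skip described above can be illustrated with a small base R sketch for the bigram case (n = 2). The helper skip_bigrams() is hypothetical and written only for this illustration; it is not part of quanteda or of this package, and it assumes that a skip of \(k\) means \(k\) tokens are passed over between the two words of a bigram.

```r
# Hypothetical helper: form bigrams from a character vector of words,
# allowing each number of skipped tokens listed in `skip`.
skip_bigrams <- function(words, skip = 0) {
  out <- character()
  for (i in seq_along(words)) {
    for (s in skip) {
      j <- i + s + 1  # position of the second word after skipping s tokens
      if (j <= length(words)) {
        out <- c(out, paste(words[i], words[j], sep = "_"))
      }
    }
  }
  out
}

words <- c("insurgents", "killed", "in", "ongoing", "fighting")

# skip = 0: ordinary bigrams formed from adjacent words only
adjacent <- skip_bigrams(words, skip = 0)

# skip = 0:1: adjacent bigrams plus bigrams that pass over one token
skipped <- skip_bigrams(words, skip = 0:1)

print(adjacent)
print(skipped)
```

With skip = 0:1 the result contains every bigram produced with skip = 0, plus pairs such as "insurgents_in" that pass over one token, which mirrors the "\(k\) or fewer skips" rule above.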
quanteda.textstats::textstat_keyness()
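A minimal usage sketch follows. It assumes that these functions are available after attaching the LSX package, that quanteda and quanteda.textstats are installed, and that textstat_context() returns the keyness table while char_context() returns only the significant context words as a character vector, as their names suggest. The corpus, the pattern "econom*" and the threshold p = 0.05 are arbitrary choices made for illustration.

```r
library(quanteda)
library(LSX)  # assumption: textstat_context() and char_context() are exported here

# Tokenise a corpus that ships with quanteda; any tokens object would do.
toks <- tokens(data_corpus_inaugural, remove_punct = TRUE)
toks <- tokens_remove(toks, stopwords("en"))

# Keyness statistics for words that occur in the window around
# matches of the glob pattern "econom*".
stat <- textstat_context(toks, pattern = "econom*",
                         window = 10, min_count = 10)
head(stat)

# Only the context words that pass the significance threshold p.
ctx <- char_context(toks, pattern = "econom*",
                    window = 10, min_count = 10, p = 0.05)
head(ctx)
```

A looser p returns more context words, and a larger min_count drops rare words before the keyness test is applied.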