textstat_context: Identify context words

Description

Identify context words using user-provided patterns.

Usage

textstat_context(
  x,
  pattern,
  valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE,
  window = 10,
  min_count = 10,
  remove_pattern = TRUE,
  n = 1,
  skip = 0,
  ...
)

char_context(
  x,
  pattern,
  valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE,
  window = 10,
  min_count = 10,
  remove_pattern = TRUE,
  p = 0.001,
  n = 1,
  skip = 0
)
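
The signatures above can be exercised as in the following minimal sketch. It assumes that both functions come from the LSX package and that quanteda is installed; the corpus (`data_corpus_inaugural`, shipped with quanteda), the pattern `"econom*"`, and the threshold values are illustrative choices, not recommendations.

```r
# Assumption: textstat_context() and char_context() are exported by LSX.
library(quanteda)
library(LSX)

# Build a tokens object, as required by the `x` argument.
toks <- tokens(data_corpus_inaugural, remove_punct = TRUE)

# Keyness statistics for words occurring within 10 tokens of the targets.
ctx <- textstat_context(toks, pattern = "econom*", valuetype = "glob",
                        window = 10, min_count = 10)
head(ctx)

# Only the context words that pass the significance threshold `p`.
words <- char_context(toks, pattern = "econom*", valuetype = "glob",
                      window = 10, min_count = 10, p = 0.05)
head(words)
```

The first call returns the full table of statistics, while the second is a convenience form for when only the words themselves are needed.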

Arguments

x: a tokens object created by quanteda::tokens().
pattern: a quanteda::pattern() that specifies the target words.
valuetype: the type of pattern matching: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching. See quanteda::valuetype() for details.
case_insensitive: if TRUE, ignore case when matching.
window: size of the window around the target words used for collocation analysis.
min_count: minimum frequency a word must have within the window to be considered a collocation.
remove_pattern: if TRUE, target words are excluded from the returned context words.
n: integer vector specifying the number of elements to be concatenated in each n-gram. Each element of this vector defines an \(n\) in the \(n\)-gram(s) that are produced.
skip: integer vector specifying the adjacency skip size for tokens forming the n-grams. The default is 0, which uses only immediately neighbouring words. For skip-grams, skip can be a vector of integers, as the "classic" approach to forming skip-grams is to set skip = \(k\), where \(k\) is the distance for which \(k\) or fewer skips are used to construct the \(n\)-gram. Thus a "4-skip-n-gram" defined as skip = 0:4 produces results that include 4 skips, 3 skips, 2 skips, 1 skip, and 0 skips (where 0 skips are typical n-grams formed from adjacent words). See Guthrie et al. (2006).
...: additional arguments passed to quanteda.textstats::textstat_keyness().
p: threshold for statistical significance of collocations.
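
The interaction of n and skip described above can be illustrated with a small base R sketch. The helper `skipgrams()` is hypothetical and written only for this illustration; it is not part of quanteda or of this package, accepts a single value of n rather than a vector, and only mirrors the description of the arguments rather than the actual implementation.

```r
# Hypothetical helper: builds n-grams in which consecutive elements are
# separated by any of the allowed skip sizes.
skipgrams <- function(tokens, n = 2, skip = 0) {
  result <- character()
  # extend() grows a partial n-gram given the token positions chosen so far
  extend <- function(idx) {
    if (length(idx) == n) {
      result <<- c(result, paste(tokens[idx], collapse = "_"))
      return(invisible(NULL))
    }
    for (s in skip) {
      nxt <- idx[length(idx)] + s + 1  # skip `s` tokens, then take the next
      if (nxt <= length(tokens)) extend(c(idx, nxt))
    }
  }
  for (i in seq_along(tokens)) extend(i)
  result
}

toks <- c("a", "b", "c", "d")
skipgrams(toks, n = 2, skip = 0)    # adjacent bigrams only
skipgrams(toks, n = 2, skip = 0:1)  # bigrams with 0 or 1 skipped token
```

With skip = 0 only adjacent pairs are formed; adding 1 to the skip vector also pairs each token with the one two positions ahead.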
