
quanteda (version 0.9.2-0)

kwic: List key words in context from a text or a corpus of texts.

Description

For a text or a collection of texts (in a quanteda corpus object), return every occurrence of a user-supplied keyword in its immediate context, identifying the source text and the word index number within that text. (The word index is reported rather than the line number, since the text may or may not be segmented using end-of-line delimiters.)

Usage

kwic(x, keywords, window = 5, valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE, ...)

## S3 method for class 'character'
kwic(x, keywords, window = 5, valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE, ...)

## S3 method for class 'corpus'
kwic(x, keywords, window = 5, valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE, ...)

## S3 method for class 'tokenizedTexts'
kwic(x, keywords, window = 5, valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE, ...)

## S3 method for class 'kwic'
print(x, ...)

Arguments

x
a character object containing the text(s), a quanteda corpus, or a tokenizedTexts object
keywords
A keyword or phrase consisting of multiple keywords, possibly including punctuation. If a phrase, keywords will be tokenized using the ... options.
window
The number of context words to be displayed around the keyword.
valuetype
how to interpret keyword expressions: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching (entire words, for instance). If "fixed" is used wit
case_insensitive
match without respect to case if TRUE
...
additional arguments passed to tokenize, for applicable methods
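
To make the three valuetype settings concrete, here is a minimal base R sketch of how a single keyword might be compared against a vector of tokens. It is not quanteda's implementation: match_tokens is a hypothetical helper, and it assumes that "glob" patterns are anchored to the whole token (via utils::glob2rx), that "regex" patterns may match anywhere inside a token, and that "fixed" compares entire tokens.

```r
# Hypothetical helper (not part of quanteda): match one keyword pattern
# against a vector of tokens under each of the three valuetype settings.
match_tokens <- function(tokens, pattern,
                         valuetype = c("glob", "regex", "fixed"),
                         case_insensitive = TRUE) {
  valuetype <- match.arg(valuetype)
  if (valuetype == "fixed") {
    # Exact, whole-token comparison; fold case on both sides if requested.
    if (case_insensitive) {
      return(tolower(tokens) == tolower(pattern))
    }
    return(tokens == pattern)
  }
  if (valuetype == "glob") {
    # utils::glob2rx() converts a wildcard pattern into an anchored regex.
    pattern <- utils::glob2rx(pattern)
  }
  # Assumption: regex patterns are unanchored unless the user anchors them.
  grepl(pattern, tokens, ignore.case = case_insensitive)
}

tokens <- c("Secure", "secured", "insecure", "security")
match_tokens(tokens, "secure*", "glob")
match_tokens(tokens, "secur", "regex")
match_tokens(tokens, "security", "fixed")
```

Under these assumptions the glob pattern picks up only tokens that start with "secure", the regex matches any token containing "secur", and the fixed pattern matches the single token "security".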

Value

  • A data.frame of class kwic, with the context before (preword), the keyword in its original format (word, preserving case and attached punctuation), the context after (postword), and the index position of the match (position). The rows of the data.frame are named with the word index position, or with the text name and the index position for a corpus object.
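
The shape of the return value can be illustrated with a small base R sketch. simple_kwic below is a hypothetical stand-in, not quanteda's code: it assumes whitespace tokenization, a single glob-style keyword, and case-insensitive matching, and it returns a plain data.frame with the four columns described above rather than a classed kwic object.

```r
# Hypothetical sketch (not quanteda's code): a whitespace-tokenised
# keyword-in-context lookup for a single text.
simple_kwic <- function(text, keyword, window = 5) {
  tokens <- strsplit(text, "\\s+")[[1]]
  tokens <- tokens[nzchar(tokens)]   # drop the empty token left by leading spaces

  # Case-insensitive glob match against each whole token.
  hits <- grep(utils::glob2rx(keyword), tokens, ignore.case = TRUE)

  before <- function(i) {
    idx <- seq_len(i - 1)                    # all positions left of the match
    paste(tokens[idx[idx >= i - window]], collapse = " ")
  }
  after <- function(i) {
    idx <- seq_len(length(tokens) - i) + i   # all positions right of the match
    paste(tokens[idx[idx <= i + window]], collapse = " ")
  }

  data.frame(
    preword  = vapply(hits, before, character(1)),
    word     = tokens[hits],   # original case and attached punctuation kept
    postword = vapply(hits, after, character(1)),
    position = hits,
    stringsAsFactors = FALSE
  )
}

txt <- "We must secure the peace and the security of all, securely."
simple_kwic(txt, "secure*", window = 2)
```

Because tokens are split on whitespace only, punctuation stays attached to the matched word, which mirrors the description of the word column.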

Examples

head(kwic(inaugTexts, "secure*", window = 3, valuetype = "glob"))
head(kwic(inaugTexts, "secur", window = 3, valuetype = "regex"))
head(kwic(inaugTexts, "security", window = 3, valuetype = "fixed"))

kwic(inaugCorpus, "war against")
kwic(inaugCorpus, "war against", valuetype = "regex")
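
The last two calls pass a multi-word phrase. As a rough illustration of what matching a tokenized phrase involves, the base R sketch below looks for consecutive tokens that equal each word of the phrase. match_phrase is a hypothetical helper that assumes whitespace tokenization and exact, case-insensitive comparison; quanteda's own phrase handling may differ.

```r
# Hypothetical helper (not part of quanteda): return the starting positions
# at which every word of a phrase matches consecutive tokens of the text.
match_phrase <- function(tokens, phrase) {
  parts <- strsplit(phrase, "\\s+")[[1]]
  n <- length(parts)
  if (n == 0 || length(tokens) < n) return(integer(0))

  starts <- seq_len(length(tokens) - n + 1)   # candidate start positions
  ok <- rep(TRUE, length(starts))
  for (k in seq_len(n)) {
    # Keep only the starts whose k-th following token equals the k-th word.
    ok <- ok & (tolower(tokens[starts + k - 1]) == tolower(parts[k]))
  }
  starts[ok]
}

tokens <- c("a", "war", "against", "want", "and", "War", "Against", "fear")
match_phrase(tokens, "war against")
```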
