quanteda (version 0.99.22)

kwic: locate keywords-in-context


For a text or a collection of texts (in a quanteda corpus object), return a list of the matches of a user-supplied keyword in its immediate context, identifying the source text and the word index number within the source text. (The index is a word position rather than a line number, since the text may or may not be segmented using end-of-line delimiters.)


kwic(x, pattern, window = 5, valuetype = c("glob", "regex", "fixed"),
  case_insensitive = TRUE, join = FALSE, ...)


# S3 method for kwic
as.tokens(x, ...)



x
a character, corpus, or tokens object

pattern
a character vector, list of character vectors, dictionary, collocations, or dfm. See pattern for details.

window
the number of context words to be displayed on each side of the keyword.

valuetype
the type of pattern matching: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching. See valuetype for details.

case_insensitive
match without respect to case if TRUE

join
join adjacent keywords in the concordance view if TRUE

...
additional arguments passed to tokens, for applicable object types
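
As a rough illustration of how window, valuetype, and case_insensitive interact, the sketch below runs the same keyword twice over a small made-up input. The text and the object names (txt, toks, kw_any, kw_exact) are hypothetical and not part of the package; the input is tokenized first so that a tokens object is passed as x.

```r
library(quanteda)

# Hypothetical two-document input, tokenized so that x is a tokens object
txt <- c(d1 = "The Senate met today. The senate chamber was full.",
         d2 = "No one mentioned the SENATE at all.")
toks <- tokens(txt)

# Defaults: valuetype = "glob" and case_insensitive = TRUE, so the
# capitalized, lowercase, and upper-case forms are all matched
kw_any <- kwic(toks, "senate", window = 2)

# Exact, case-sensitive matching: only the lowercase form is matched
kw_exact <- kwic(toks, "senate", window = 2,
                 valuetype = "fixed", case_insensitive = FALSE)

nrow(kw_any)    # number of matches with the defaults
nrow(kw_exact)  # number of matches with exact, case-sensitive matching
```

With window = 2, each match is shown with at most two tokens of context on either side; punctuation counts as a token unless it is removed during tokenization.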


A kwic classed data.frame, with the document name (docname), the token index positions (from and to, which will be the same for single-word patterns, or a sequence equal in length to the number of elements for multi-word phrases), the context before (pre), the keyword in its original format (keyword, preserving case and attached punctuation), and the context after (post). The return object has its own print method, plus some special attributes that are hidden in the print view. If you want to turn this into a simple data.frame, simply wrap the result in data.frame.
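
The conversion described above might look like the following sketch. The one-sentence input and the names toks, kw, and df are invented for illustration; the column names are those listed in the description of the return value.

```r
library(quanteda)

# Hypothetical single-document input
toks <- tokens(c(doc1 = "A quick brown fox jumps over the lazy dog."))

# One match is expected: "fox" is the fourth token of doc1
kw <- kwic(toks, "fox", window = 2)

# Wrap the result in data.frame() to drop the kwic print method
df <- data.frame(kw)

names(df)                 # includes docname, from, to, pre, keyword, post
df$from                   # token index where the match starts
df$to                     # token index where the match ends
as.character(df$keyword)  # the matched token in its original form
```

Because the pattern is a single word, from and to hold the same index here.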

as.tokens.kwic converts the kwic object into a tokens object, with each new "document" consisting of one keyword match, and the contents of the pre, keyword, and post fields forming the tokens. This is one way to save the output for subsequent usage; another way is to form a corpus from the return object.
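
A minimal sketch of that conversion, assuming the as.tokens method for kwic objects documented on this page is available in the installed version (later releases may differ). The input text and the names toks, kw, and kw_toks are hypothetical.

```r
library(quanteda)

# Hypothetical input containing the keyword twice
toks <- tokens(c(doc1 = "We must secure peace. Peace is never secure without justice."))
kw <- kwic(toks, "secure", window = 2)

# Each keyword match becomes its own "document" in the new tokens object
kw_toks <- as.tokens(kw)

ndoc(kw_toks)  # expected to equal the number of matches, nrow(kw)
kw_toks        # tokens drawn from the pre, keyword, and post fields
```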


library(quanteda)

# Match the same keyword family under each valuetype, with 3 words of context
head(kwic(data_corpus_inaugural, "secure*", window = 3, valuetype = "glob"))
head(kwic(data_corpus_inaugural, "secur", window = 3, valuetype = "regex"))
head(kwic(data_corpus_inaugural, "security", window = 3, valuetype = "fixed"))

# Multi-word patterns are wrapped in phrase(); x can also be a tokens object
toks <- tokens(data_corpus_inaugural)
kwic(toks, phrase("war against"))
kwic(toks, phrase("war against"), valuetype = "regex")

# is.kwic() tests whether an object is a kwic result
mykwic <- kwic(data_corpus_inaugural, "provident*")
is.kwic(mykwic)
is.kwic("Not a kwic")