concordance: Qualitative examination of evidence

Description

This function uses quanteda::kwic() to return a concordance for a search pattern. It takes two or three datasets (the reference data is optional) and a pattern, and returns a data frame with the hits labelled for authorship.

Usage

concordance(
  q.data,
  k.data,
  reference.data,
  search,
  token.type = "word",
  window = 5,
  case_insensitive = TRUE
)

Value

The function returns a data frame containing the concordances for the search pattern.

Arguments

q.data: A quanteda corpus object, such as the output of create_corpus(), or a tokens object with tokens being sentences, such as the output of tokenize_sents().
k.data: A quanteda corpus object, such as the output of create_corpus(), or a tokens object with tokens being sentences, such as the output of tokenize_sents().
reference.data: A quanteda corpus object, such as the output of create_corpus(), or a tokens object with tokens being sentences, such as the output of tokenize_sents(). This is optional.
search: A string. It can be any sequence of characters, and * can be used as a wildcard. The special tokens for sentence boundaries are '_BOS_' for beginning of sentence and '_EOS_' for end of sentence.
token.type: Either "word" (default), which searches for word or punctuation mark tokens, or "character", which searches for character sequences instead.
window: The number of context items to be displayed around the keyword (a quanteda::kwic() parameter).
case_insensitive: Logical; if TRUE, ignore case (a quanteda::kwic() parameter).
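
Because window and case_insensitive are passed on to quanteda::kwic(), their effect can be seen by calling kwic() directly. The following is a minimal sketch and not the internal code of concordance(): the toy text is invented for illustration, and wrapping the multi-word pattern in phrase() is an assumption about how such a search can be expressed in quanteda.

```r
library(quanteda)

# Toy text invented for illustration; not part of any package dataset.
toks <- tokens("She wants to be there. He Wants to help.")

# A multi-word pattern is wrapped in phrase(); "*" is a glob wildcard
# that matches any single token here.
hits <- kwic(toks, pattern = phrase("wants * be"), window = 3,
             valuetype = "glob", case_insensitive = TRUE)
as.data.frame(hits)

# case_insensitive = FALSE matches "wants" but not "Wants".
strict <- kwic(toks, pattern = "wants", window = 2, case_insensitive = FALSE)
loose  <- kwic(toks, pattern = "wants", window = 2, case_insensitive = TRUE)
c(strict = nrow(strict), loose = nrow(loose))
```

A larger window shows more tokens on each side of the keyword; it does not change which hits are found.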

Examples

concordance(enron.sample[1], enron.sample[2], enron.sample[3], "wants to", token.type = "word")

#using wildcards
concordance(enron.sample[1], enron.sample[2], enron.sample[3], "wants * be", token.type = "word")

#searching character sequences with wildcards
concordance(enron.sample[1], enron.sample[2], enron.sample[3], "help*", token.type = "character")

#using sentences
enron.sents <- quanteda::tokens(enron.sample, what = "sentence")
concordance(enron.sents[1], enron.sents[2], enron.sents[3], ". _EOS_", token.type = "word")
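
The examples above all rely on the default window and case handling. As a hedged illustration, the call below sets both explicitly, using only arguments listed under Usage. It assumes that the package providing concordance() and enron.sample is already attached, and the values 3 and FALSE are arbitrary choices.

```r
# Assumes concordance() and enron.sample are available in the session.
# Narrower context (3 items per side) and a case-sensitive match.
res <- concordance(
  q.data = enron.sample[1],
  k.data = enron.sample[2],
  reference.data = enron.sample[3],
  search = "wants to",
  token.type = "word",
  window = 3,
  case_insensitive = FALSE
)
head(res)
```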
