idiolect (version 1.1.1)

concordance: Qualitative examination of evidence

Description

This function uses quanteda::kwic() to return a concordance for a search pattern. It takes three datasets (q.data, k.data, and an optional reference.data) and a search pattern, and returns a data frame in which each hit is labelled for authorship.

Usage

concordance(
  q.data,
  k.data,
  reference.data,
  search,
  token.type = "word",
  window = 5,
  case_insensitive = TRUE
)

Value

The function returns a data frame containing the concordances for the search pattern.
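
A minimal sketch of inspecting the returned object, assuming the idiolect package is installed and using the enron.sample data that appears under Examples. The column names are printed rather than assumed, since this page does not list them.

```r
library(idiolect)

# Same call as the first example under Examples
res <- concordance(enron.sample[1], enron.sample[2], enron.sample[3],
                   "wants to", token.type = "word")

is.data.frame(res)  # the Value section states that a data frame is returned
names(res)          # list the columns instead of guessing their names
head(res)           # first few hits with their context
```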

Arguments

q.data

A quanteda corpus object, such as the output of create_corpus(), or a tokens object with tokens being sentences, such as the output of tokenize_sents().

k.data

A quanteda corpus object, such as the output of create_corpus(), or a tokens object with tokens being sentences, such as the output of tokenize_sents().

reference.data

A quanteda corpus object, such as the output of create_corpus(), or a tokens object with tokens being sentences, such as the output of tokenize_sents(). This is optional.

search

A string. It can be any sequence of characters and it also accepts the use of * as a wildcard. The special tokens for sentence boundaries are '_BOS_' for beginning of sentence and '_EOS_' for end of sentence.

token.type

Choice between "word" (default), which searches for word or punctuation mark tokens, or "character", which instead searches for sequences of characters.

window

The number of context items to be displayed around the keyword (a quanteda::kwic() parameter).

case_insensitive

Logical; if TRUE, ignore case (a quanteda::kwic() parameter).
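
Because window and case_insensitive are passed on to quanteda::kwic(), their effect can be sketched with kwic() directly. The example below assumes quanteda is installed and uses a toy sentence invented for illustration. It calls kwic() rather than concordance(), so whether concordance() prepares the search pattern in exactly this way (here via quanteda::phrase()) is an assumption.

```r
library(quanteda)

# Toy text made up for this sketch; not part of the package data
toks <- tokens("She wants to go. He Wants to stay, but nobody wants to be late.")

# window = 2 keeps two tokens of context on each side of the match;
# case_insensitive = TRUE also matches "Wants to"
hits <- kwic(toks, pattern = phrase("wants to"), window = 2,
             case_insensitive = TRUE)
nrow(hits)     # 3

# With case_insensitive = FALSE the capitalised "Wants to" is skipped
hits_cs <- kwic(toks, pattern = phrase("wants to"), window = 2,
                case_insensitive = FALSE)
nrow(hits_cs)  # 2

# * as a wildcard standing in for any single token (glob matching)
hits_wild <- kwic(toks, pattern = phrase("wants * be"), window = 2)
nrow(hits_wild)  # 1, the sequence "wants to be"
```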

Examples

library(idiolect)

#searching for a sequence of words
concordance(enron.sample[1], enron.sample[2], enron.sample[3], "wants to", token.type = "word")

#using wildcards
concordance(enron.sample[1], enron.sample[2], enron.sample[3], "wants * be", token.type = "word")

#searching character sequences with wildcards
concordance(enron.sample[1], enron.sample[2], enron.sample[3], "help*", token.type = "character")

#using sentences
enron.sents <- quanteda::tokens(enron.sample, what = "sentence")
concordance(enron.sents[1], enron.sents[2], enron.sents[3], ". _EOS_", token.type = "word")
