fetchAnnotations,KorAPQuery-method: Fetch annotations for all collected matches

Description

Usage

# S4 method for KorAPQuery
fetchAnnotations(
  kqo,
  foundry = "tt",
  overwrite = FALSE,
  verbose = kqo@korapConnection@verbose
)

Value

The updated kqo object, with annotation columns such as pos, lemma, and morph (plus atokens and annotation_snippet) added to the data frame in the @collectedMatches slot. Each of the pos, lemma, morph, and atokens columns is itself a data frame with left, match, and right columns containing list vectors of annotations for the left context, the matched tokens, and the right context, respectively. The annotation_snippet column stores the original XML snippet for each match.

Arguments

kqo: object obtained from corpusQuery() with collected matches. Note: the original corpus query should have metadataOnly = FALSE for annotation parsing to work.
foundry: string specifying the foundry to use for annotations (default: "tt" for Tree-Tagger)
overwrite: logical; if TRUE, re-fetch and replace any existing annotation columns. If FALSE (default), only add missing annotation layers and preserve already fetched ones (e.g., keep POS/lemma from a previous foundry while adding morph from another).
verbose: logical; if TRUE, print progress information

Details

fetchAnnotations fetches annotations (only token annotations, for now) for all matches in the @collectedMatches slot of a KorAPQuery object and adds annotation columns directly to the @collectedMatches data frame. The method uses the matchID from collected matches.

Important: For copyright-restricted corpora, users must be authorized via auth() and the initial corpus query must have metadataOnly = FALSE to ensure snippets are available for annotation parsing.

The method parses XML snippet annotations and adds linguistic columns to the data frame:

pos: data frame with left, match, right columns, each containing list vectors of part-of-speech tags
lemma: data frame with left, match, right columns, each containing list vectors of lemmas
morph: data frame with left, match, right columns, each containing list vectors of morphological tags
atokens: data frame with left, match, right columns, each containing list vectors of token text (from annotations)
annotation_snippet: original XML snippet from the annotation API
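
The nested layout described above can be tried out without a server connection. The following is a minimal sketch in base R that builds a hypothetical stand-in for the @collectedMatches data frame; the helper ctx(), the object cm, and all tokens and tags are invented for illustration and are not produced by the package.

```r
# Hypothetical stand-in for q@collectedMatches after fetchAnnotations().
# All tokens and tags below are invented for illustration.
ctx <- function(left, match, right) {
  # I() keeps each list as a single list column, one element per match
  data.frame(left = I(left), match = I(match), right = I(right))
}

cm <- data.frame(matchID = c("m1", "m2"))
cm$atokens <- ctx(
  left  = list(c("gegen", "die"), c("eine")),
  match = list("Ameisenplage", "Ameisenplage"),
  right = list(c("im", "Garten"), c("droht"))
)
cm$pos <- ctx(
  left  = list(c("APPR", "ART"), c("ART")),
  match = list("NN", "NN"),
  right = list(c("APPRART", "NN"), c("VVFIN"))
)

i <- 1
cm$pos$match[[i]]      # tags of the matched tokens of match i
cm$atokens$right[[i]]  # token text of the right context of match i

# Pair tokens with their tags for the left context of match i
# (assumes both vectors have the same length and order)
left_i <- data.frame(
  token = cm$atokens$left[[i]],
  pos   = cm$pos$left[[i]],
  stringsAsFactors = FALSE
)
```

Row i of each nested data frame belongs to match i, so the same index selects the corresponding tokens, tags, and lemmas.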

Examples

if (FALSE) {

# Fetch annotations for matches using Tree-Tagger foundry
# Note: Authorization required for copyright-restricted corpora
q <- KorAPConnection() |>
  auth() |>
  corpusQuery("Ameisenplage", metadataOnly = FALSE) |>
  fetchNext(maxFetch = 10) |>
  fetchAnnotations()

# Index of the match to inspect in the examples below
i <- 1

# Access linguistic annotations (row i of each data frame belongs to match i):
pos_tags <- q@collectedMatches$pos
# Data frame with left/match/right columns for POS tags
lemmas <- q@collectedMatches$lemma
# Data frame with left/match/right columns for lemmas
morphology <- q@collectedMatches$morph
# Data frame with left/match/right columns for morphological tags
atokens <- q@collectedMatches$atokens
# Data frame with left/match/right columns for annotation token text
# Original XML snippet for match i
raw_snippet <- q@collectedMatches$annotation_snippet[[i]]

# Access specific components:
# POS tags for the matched tokens in match i
match_pos <- q@collectedMatches$pos$match[[i]]
# Lemmas for the left context in match i
left_lemmas <- q@collectedMatches$lemma$left[[i]]
# Token text for the right context in match i
right_tokens <- q@collectedMatches$atokens$right[[i]]

# Use a different foundry (e.g., MarMoT)
q <- KorAPConnection() |>
  auth() |>
  corpusQuery("Ameisenplage", metadataOnly = FALSE) |>
  fetchNext(maxFetch = 10) |>
  fetchAnnotations(foundry = "marmot")
q@collectedMatches$pos$left[1] # POS tags for the left context of the first match
}
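
For corpus-level counts it is often convenient to flatten the list vectors into one row per token. The sketch below shows one possible way to do this in base R. The two input lists are hypothetical stand-ins for q@collectedMatches$atokens$match and q@collectedMatches$pos$match, with invented values, and the code assumes that tokens and tags are aligned element by element.

```r
# Hypothetical list columns as they might appear in
# q@collectedMatches$atokens$match and q@collectedMatches$pos$match
match_tokens <- list(c("Ameisenplage"), c("die", "Ameisenplage"))
match_pos    <- list(c("NN"), c("ART", "NN"))

# One row per token, keeping the index of the match it came from
long <- data.frame(
  match = rep(seq_along(match_tokens), lengths(match_tokens)),
  token = unlist(match_tokens),
  pos   = unlist(match_pos),
  stringsAsFactors = FALSE
)

# Frequency of each POS tag across all matched tokens
pos_freq <- table(long$pos)
```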