contentanalysis (version 0.2.1)

match_citations_to_references: Match citations to references

Description

Matches in-text citations to entries in the reference list using author-year matching with multiple disambiguation strategies.

Usage

match_citations_to_references(citations_df, references_df)

Value

Tibble with matched citations including columns:

  • citation_id: Citation identifier

  • citation_text: Original citation text

  • citation_text_clean: Cleaned citation text

  • citation_type: Type of citation

  • cite_author: Extracted first author from citation

  • cite_second_author: Second author (if present)

  • cite_year: Extracted year

  • cite_has_etal: Logical, contains "et al."

  • matched_ref_id: ID of matched reference

  • ref_full_text: Full text of matched reference

  • ref_authors: Authors from reference

  • ref_year: Year from reference

  • match_confidence: Quality of match, summarized as high, medium, low, or no_match; the specific labels are listed under Details

Arguments

citations_df

Data frame with citation information. It must include the columns citation_id, citation_text, citation_text_clean, and citation_type.

references_df

Data frame with parsed references from parse_references_section()
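The column requirement on citations_df can be checked before calling the function. The following is a minimal sketch: only the four column names come from this page, while the sample rows and the citation_type values are invented for illustration and may not match the package's own vocabulary.

```r
# Hypothetical input: only the four column names are documented.
citations_df <- data.frame(
  citation_id = c("c1", "c2"),
  citation_text = c("(Smith et al., 2020)", "Jones and Lee (2018)"),
  citation_text_clean = c("Smith et al., 2020", "Jones and Lee 2018"),
  # Illustrative labels, not necessarily the values the package uses
  citation_type = c("parenthetical", "narrative"),
  stringsAsFactors = FALSE
)

# Columns the documentation says must be present
required <- c("citation_id", "citation_text",
              "citation_text_clean", "citation_type")

# Fail early with a readable message if any required column is absent
missing_cols <- setdiff(required, names(citations_df))
if (length(missing_cols) > 0) {
  stop("citations_df is missing: ", paste(missing_cols, collapse = ", "))
}
```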

Details

Matching algorithm:

  1. Filter by exact year match

  2. Match first author (exact, then fuzzy)

  3. Disambiguate using second author or et al. heuristics
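The three steps above can be sketched in base R. This is an illustration of the general idea, not the package's implementation: the function name sketch_match, the reference columns ref_id, first_author, second_author, and year, the fuzzy threshold of 0.2, and the mapping of outcomes to confidence labels are all assumptions. The et al. heuristic is left out.

```r
# Illustrative only: sketch_match and the column names used here are
# hypothetical and do not reflect the package's internals.
sketch_match <- function(cite_author, cite_year, refs,
                         cite_second_author = NA) {
  if (is.na(cite_author) || is.na(cite_year)) {
    return(list(ref_id = NA, confidence = "no_match_missing_info"))
  }

  # Step 1: keep only references published in exactly the cited year
  same_year <- refs[which(refs$year == cite_year), , drop = FALSE]
  if (nrow(same_year) == 0) {
    return(list(ref_id = NA, confidence = "no_match_year"))
  }

  # Step 2a: exact (case-insensitive) first-author match
  hit <- which(tolower(same_year$first_author) == tolower(cite_author))
  exact <- same_year[hit, , drop = FALSE]
  if (nrow(exact) == 1) {
    return(list(ref_id = exact$ref_id, confidence = "high"))
  }

  if (nrow(exact) > 1) {
    # Step 3: several candidates, so try the second author
    if (!is.na(cite_second_author)) {
      hit2 <- which(tolower(exact$second_author) ==
                      tolower(cite_second_author))
      if (length(hit2) == 1) {
        return(list(ref_id = exact$ref_id[hit2],
                    confidence = "high_second_author"))
      }
    }
    return(list(ref_id = exact$ref_id[1],
                confidence = "medium_multiple_matches"))
  }

  # Step 2b: no exact match, fall back to approximate matching
  fuzzy <- agrep(cite_author, same_year$first_author,
                 max.distance = 0.2, ignore.case = TRUE)
  if (length(fuzzy) >= 1) {
    return(list(ref_id = same_year$ref_id[fuzzy[1]],
                confidence = "medium_fuzzy"))
  }
  list(ref_id = NA, confidence = "no_match_author")
}

# Hypothetical reference table for trying the sketch
refs <- data.frame(
  ref_id = c("r1", "r2", "r3"),
  first_author = c("Smith", "Smith", "Johnson"),
  second_author = c("Lee", "Park", NA),
  year = c(2020, 2020, 2019),
  stringsAsFactors = FALSE
)

res_second <- sketch_match("Smith", 2020, refs, "Park")
res_year <- sketch_match("Smith", 2021, refs)
```

Filtering by year first keeps the candidate set small, so the comparatively expensive approximate matching only runs on references that could plausibly be the target.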

Match confidence levels include:

  • high: exact first author + year

  • high_second_author: disambiguated with second author

  • medium_multiple_matches, medium_fuzzy, medium_etal_heuristic: various medium confidence scenarios

  • no_match_year, no_match_author, no_match_missing_info: no suitable reference found
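For reporting, the detailed labels can be collapsed into coarse tiers. The sketch below assumes the match_confidence column holds exactly the label strings listed above; the confidence vector is made-up sample data standing in for that column.

```r
# Made-up sample standing in for matched$match_confidence
confidence <- c("high", "high_second_author", "medium_fuzzy",
                "no_match_year", "medium_etal_heuristic",
                "no_match_author")

# Labels starting with "no_match" form one tier; for the rest,
# keep the text before the first underscore ("high" or "medium").
tier <- ifelse(startsWith(confidence, "no_match"),
               "no_match",
               sub("_.*$", "", confidence))

# Number of citations in each coarse tier
tier_counts <- table(tier)
```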

Examples

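if (FALSE) {
  # Not run: citations_df and references_df must already exist in the session
  matched <- match_citations_to_references(citations_df, references_df)
  # Count citations per match confidence label
  table(matched$match_confidence)
}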
