Detect and Remove Duplicate Records
detect_dupes(results, method = "exact", similarity_threshold = 0.85)
Returns: Data frame with duplicates marked and removed
results: Standardized search results data frame
method: Method for duplicate detection ("exact", "fuzzy", or "doi"); defaults to "exact"
similarity_threshold: Threshold for fuzzy matching (0-1); defaults to 0.85
This function provides three methods for duplicate detection:
exact: Matches on title and first 100 characters of abstract
fuzzy: Uses Jaro-Winkler string distance for similarity matching
doi: Matches based on cleaned DOI strings
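To make the exact and doi methods concrete, here is a minimal base R sketch of how such match keys could be built. It is not the function's actual implementation: the column names title, abstract, and doi, the lowercasing and whitespace trimming of titles, and the helper clean_doi() are all assumptions made for illustration.

```r
# Toy search results; the column names are assumed, not taken from the package.
results <- data.frame(
  title    = c("Deep Learning in Ecology", "deep learning in ecology ", "Soil Carbon Dynamics"),
  abstract = c("We review recent methods.", "We review recent methods.", "Soil stores carbon."),
  doi      = c("https://doi.org/10.1000/ABC123", "doi:10.1000/abc123", NA),
  stringsAsFactors = FALSE
)

# "exact": key on the title plus the first 100 characters of the abstract.
# Lowercasing and trimming the title is an assumption of this sketch.
exact_key <- paste(
  tolower(trimws(results$title)),
  substr(results$abstract, 1, 100),
  sep = "|"
)
results$dup_exact <- duplicated(exact_key)

# "doi": strip common resolver prefixes and case before comparing.
# clean_doi() is a hypothetical helper, not part of any package.
clean_doi <- function(x) {
  x <- tolower(trimws(x))
  sub("^(https?://(dx\\.)?doi\\.org/|doi:[[:space:]]*)", "", x)
}
doi_key <- clean_doi(results$doi)

# Records without a DOI are never flagged by this method.
results$dup_doi <- !is.na(doi_key) & duplicated(doi_key)

# Keep only the first occurrence of each exact-match key.
deduplicated <- results[!results$dup_exact, ]
```

In this sketch the second row is flagged by both keys, and the third row, which has no DOI, is left alone by the DOI comparison.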
For fuzzy matching, similarity_threshold should be between 0 and 1, where 1 means identical strings. The default of 0.85 typically works well for academic titles.
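The following sketch illustrates how a Jaro-Winkler similarity relates to the threshold. It is a plain base R reimplementation written for illustration only; the function presumably relies on a string-distance package rather than code like this. The name jaro_winkler(), the prefix weight p = 0.1, and the lowercasing of titles before comparison are assumptions.

```r
# Jaro-Winkler similarity between two strings (1 = identical, 0 = nothing in common).
# Hypothetical helper for illustration; p is the usual prefix weight.
jaro_winkler <- function(a, b, p = 0.1) {
  if (identical(a, b)) return(1)
  s <- strsplit(a, "")[[1]]
  t <- strsplit(b, "")[[1]]
  n <- length(s)
  m <- length(t)
  if (n == 0 || m == 0) return(0)

  # Characters match only if they are within this many positions of each other.
  window <- max(0, floor(max(n, m) / 2) - 1)
  s_hit <- logical(n)
  t_hit <- logical(m)
  for (i in seq_len(n)) {
    lo <- max(1, i - window)
    hi <- min(m, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!t_hit[j] && s[i] == t[j]) {
        s_hit[i] <- TRUE
        t_hit[j] <- TRUE
        break
      }
    }
  }

  matches <- sum(s_hit)
  if (matches == 0) return(0)

  # Matched characters that appear in a different order count as half-transpositions.
  transpositions <- sum(s[s_hit] != t[t_hit]) / 2
  jaro <- (matches / n + matches / m + (matches - transpositions) / matches) / 3

  # Winkler adjustment: reward a shared prefix of up to four characters.
  prefix <- 0
  for (i in seq_len(min(4, n, m))) {
    if (s[i] == t[i]) prefix <- prefix + 1 else break
  }
  jaro + prefix * p * (1 - jaro)
}

similarity_threshold <- 0.85
title_a <- tolower("Machine Learning for Systematic Reviews")
title_b <- tolower("Machine learning for systematic review")

# Treat the pair as a duplicate when the similarity reaches the threshold.
is_duplicate <- jaro_winkler(title_a, title_b) >= similarity_threshold
```

Raising the threshold toward 1 makes matching stricter, so fewer near-identical titles are flagged; lowering it catches more variants at the cost of more false positives.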