Given a data set and a vector identifying duplicate entries, this function keeps only unique references, reducing each group of duplicates to a single row.
extract_unique_references(data, matches, type = "merge")
data: A data.frame containing bibliographic information.
matches: A vector, such as the output of find_duplicates(), showing which entries in data are duplicates of one another.
type: How to choose what to retain from each group of duplicates. The default, "merge", keeps, for each column, the entry with the largest number of characters, so the retained row may combine values from different duplicates. The alternative, "select", returns the single row with the highest total number of characters.
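The difference between the two type rules can be illustrated with a short base-R sketch. This is not synthesisr's own implementation: the helpers char_counts(), merge_group(), select_group() and unique_refs_sketch() are hypothetical, and the sketch assumes that matches gives every row in a duplicate group the same value, that missing values count as zero characters, and that ties go to the first row.

```r
# Hypothetical sketch of the "merge" and "select" rules (not synthesisr code).
# Assumption: rows sharing a value in `matches` are duplicates of one another.

# Characters per entry, counting missing values as 0 (an assumption).
char_counts <- function(col) {
  n <- nchar(as.character(col))
  n[is.na(col)] <- 0L
  n
}

# "merge": for each column, keep the value with the most characters.
merge_group <- function(group) {
  out <- group[1, , drop = FALSE]
  for (col in names(group)) {
    out[[col]] <- group[[col]][which.max(char_counts(group[[col]]))]
  }
  out
}

# "select": keep the whole row with the highest total character count.
select_group <- function(group) {
  totals <- Reduce(`+`, lapply(group, char_counts))
  group[which.max(totals), , drop = FALSE]
}

# Apply the chosen rule to every group of duplicates.
unique_refs_sketch <- function(data, matches, type = c("merge", "select")) {
  type <- match.arg(type)
  pick <- if (type == "merge") merge_group else select_group
  groups <- split(seq_len(nrow(data)), matches)
  out <- do.call(rbind, lapply(groups, function(rows) {
    pick(data[rows, , drop = FALSE])
  }))
  rownames(out) <- NULL
  out
}

# Toy data: rows 1 and 2 are duplicates; row 1 has the longer title,
# row 2 has the year.
toy <- data.frame(
  title = c("revtools: an R package for screening", "revtools", "EviAtlas"),
  year = c(NA, "2019", "2019"),
  stringsAsFactors = FALSE
)
unique_refs_sketch(toy, matches = c(1, 1, 2), type = "merge")
unique_refs_sketch(toy, matches = c(1, 1, 2), type = "select")
```

With "merge", the first retained row combines the longer title from row 1 with the year from row 2. With "select", row 1 wins as a whole (36 characters against 12), so its missing year is kept.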
Returns a data.frame of unique references.
# NOT RUN {
library(synthesisr)
my_df <- data.frame(
title = c(
"EviAtlas: a tool for visualising evidence synthesis databases",
"revtools: An R package to support article screening for evidence synthesis",
"An automated approach to identifying search terms for systematic reviews",
"Reproducible, flexible and high-throughput data extraction from primary literature",
"eviatlas:tool for visualizing evidence synthesis databases.",
"REVTOOLS a package to support article-screening for evidence synthsis"
),
year = c("2019", "2019", "2019", "2019", NA, NA),
authors = c("Haddaway et al", "Westgate",
"Grames et al", "Pick et al", NA, NA),
stringsAsFactors = FALSE
)
# run deduplication
dups <- find_duplicates(
my_df$title,
method = "string_osa",
rm_punctuation = TRUE,
to_lower = TRUE
)
# keep one row per group of duplicates
extract_unique_references(my_df, matches = dups)
# or, do both steps in a single call:
deduplicate(my_df, "title",
method = "string_osa",
rm_punctuation = TRUE,
to_lower = TRUE)
# }