Given a bibliometrix collection produced by convert2df, this
function takes the subset of records that have a DOI but are missing one
or more of the analysis-relevant fields, queries the Crossref REST API
(https://api.crossref.org/works) and/or OpenAlex (via
openalexR) using the DOI as the lookup key, and fills the gaps
with the values returned by those sources. Existing non-empty values
are never overwritten.
completeMetadata(
  M,
  sources = c("openalex", "crossref"),
  fields = c("AB", "AU", "C1", "CR", "DT", "LA", "PY", "RP", "SO", "TC", "TI"),
  email = NULL,
  oa_apikey = NULL,
  batch_size = 20,
  max_records = Inf,
  progress = NULL,
  verbose = TRUE
)
A list with components:
M: The enriched collection (same class as the input).
report: Long-format data.frame with one row per
(field, source) pair, summarising attempts, fills, and failures.
before: The mandatoryTags table from
missingData(M) before enrichment.
after: The mandatoryTags table from
missingData(M) after enrichment.
Provenance is attached to the returned collection as
attr(M, "enrichment"), a long-format data.frame with columns
SR, field, source, timestamp.
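As an illustration of how the provenance table might be inspected, the sketch below builds a small mock data.frame with the four documented columns (SR, field, source, timestamp) and counts fills per field and source. The rows are invented for the example, and the POSIXct class of timestamp is an assumption; on a real result the table would come from attr(res$M, "enrichment").

```r
# Mock provenance table with the documented columns. The rows are invented
# for illustration and do not come from a real API call. Storing timestamp
# as POSIXct is an assumption.
prov <- data.frame(
  SR = c("SMITH J, 2010, SCIENTOMETRICS",
         "SMITH J, 2010, SCIENTOMETRICS",
         "LEE K, 2015, J INFORMETR"),
  field = c("AB", "TC", "AB"),
  source = c("openalex", "openalex", "crossref"),
  timestamp = as.POSIXct(rep("2024-01-01 12:00:00", 3), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Count how many cells each source filled for each field,
# dropping the (field, source) combinations that never occurred.
fills <- as.data.frame(table(field = prov$field, source = prov$source),
                       stringsAsFactors = FALSE)
fills <- fills[fills$Freq > 0, ]
print(fills)
```

The same two lines of tabulation should apply unchanged to a real provenance table, since they rely only on the field and source columns.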
M: Bibliometrix data frame produced by convert2df.
sources: Character vector of enrichment sources.
Default c("openalex", "crossref"). The order is irrelevant; OpenAlex
always runs before Crossref. "openalex" is skipped if
M$DB[1] == "OPENALEX".
fields: Character vector of WoS-codified fields to attempt to fill.
Default c("AB","AU","C1","CR","DT","LA","PY","RP","SO","TC","TI").
TC is filled only by OpenAlex.
email: Optional contact email used as the Crossref polite-pool
identifier and OpenAlex mailto. If NULL, the function
falls back to the environment variable BIBLIOMETRIX_EMAIL or the persisted
file ~/.biblio_openalex_email.txt.
oa_apikey: Optional OpenAlex API key. If NULL, the function
reads Sys.getenv("openalexR.apikey") and falls back to
~/.biblio_openalex_apikey.txt. The OpenAlex pass works without a
key, at a lower rate limit.
batch_size: Number of DOIs per Crossref batch request (default 20).
OpenAlex uses a fixed batch size of 50 (the maximum that keeps URLs
under length limits).
max_records: Optional cap on the number of records to enrich
(useful for previewing). Default Inf.
progress: Optional callback function(done, total, label)
invoked after each batch. Used by biblioshiny to drive a progress bar.
verbose: Logical. If TRUE (the default), print progress messages to the console.
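To make the batch size and the progress callback concrete, here is a minimal sketch that splits a vector of DOIs into batches and calls a callback with the documented signature function(done, total, label) after each one. The helper make_batches, the placeholder DOIs, and the choice to count done and total in records are assumptions for illustration; no network request is made.

```r
# Hypothetical helper: split a vector of DOIs into batches of `batch_size`.
make_batches <- function(dois, batch_size = 20) {
  split(dois, ceiling(seq_along(dois) / batch_size))
}

# A progress callback with the documented signature.
# Counting `done` and `total` in records is an assumption.
log_progress <- function(done, total, label) {
  message(sprintf("[%s] %d/%d records", label, done, total))
}

# Simulated run over placeholder DOIs; nothing is fetched.
dois <- sprintf("10.1000/example.%03d", 1:45)
batches <- make_batches(dois, batch_size = 20)

done <- 0
for (b in batches) {
  # A real implementation would query the API for batch `b` here.
  done <- done + length(b)
  log_progress(done, length(dois), "crossref")
}
```

A function like log_progress could be passed as the progress argument, for example completeMetadata(M, progress = log_progress), provided the arguments it receives match this sketch.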
When both sources are enabled, OpenAlex runs first (broader coverage of
AB/CR/C1/TC) and Crossref then fills the residual gaps. If the input
collection was originally imported from OpenAlex (M$DB[1] ==
"OPENALEX"), the OpenAlex pass is automatically skipped because re-querying
it would not add information.
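The ordering and skip rules described above can be sketched as a small helper. plan_sources is a hypothetical name used only for this example and is not part of the package; it mirrors the documented behaviour and nothing more.

```r
# Hypothetical helper mirroring the documented rules; not the package's code.
plan_sources <- function(sources = c("openalex", "crossref"),
                         db = NA_character_) {
  # Fixed execution order, regardless of the order supplied by the caller.
  ordered <- intersect(c("openalex", "crossref"), sources)
  # Skip OpenAlex when the collection was imported from OpenAlex.
  if (!is.na(db) && db == "OPENALEX") {
    ordered <- setdiff(ordered, "openalex")
  }
  ordered
}

plan_sources(c("crossref", "openalex"), db = "SCOPUS")    # "openalex" "crossref"
plan_sources(c("crossref", "openalex"), db = "OPENALEX")  # "crossref"
```

Here db stands in for M$DB[1] of the input collection.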
The vacancy predicate matches the one used by missingData:
a cell is considered missing when it is NA or one of
c("", "NA", "none", "NA,0000,NA").
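A minimal sketch of this rule, together with a fill step that writes only into vacant cells, might look as follows. The function names is_vacant and fill_vacant are hypothetical, and exact, case-sensitive matching of the placeholder strings is an assumption; the package's internal implementation may differ.

```r
# Sketch of the documented vacancy rule: NA or one of the listed placeholders.
# Exact, case-sensitive matching is an assumption.
is_vacant <- function(x) {
  is.na(x) | as.character(x) %in% c("", "NA", "none", "NA,0000,NA")
}

# Hypothetical fill step: copy a candidate value only where the current
# cell is vacant and the candidate itself is not.
fill_vacant <- function(current, candidate) {
  idx <- is_vacant(current) & !is_vacant(candidate)
  current[idx] <- candidate[idx]
  current
}

ab <- c("An existing abstract", NA, "", "none")
from_api <- c("A different abstract", "Fetched abstract", NA,
              "Another fetched abstract")

fill_vacant(ab, from_api)
# "An existing abstract" "Fetched abstract" "" "Another fetched abstract"
```

The first element is left alone because it already holds a value, and the third stays empty because the candidate is itself missing.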
Crossref cannot supply author keywords (DE), Keywords Plus (ID), Web of
Science categories (WC), or citation counts (TC). OpenAlex covers TC,
AB, AU, C1, CR, DT, LA, PY, RP, SO, TI well; OpenAlex keywords
are AI-derived topic labels and not author keywords, so DE is off by
default. ID and WC are always skipped.
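The field restrictions above can be summarised in a short lookup sketch. fields_for and the two constant vectors are hypothetical names built only from the restrictions stated in this section; they are not the package's internal tables.

```r
# Restrictions taken from the text above; names are hypothetical.
never_filled <- c("ID", "WC")                      # always skipped
crossref_unsupported <- c("DE", "ID", "WC", "TC")  # Crossref cannot supply these

# Return the subset of requested fields that a given source is asked for.
fields_for <- function(fields, source) {
  fields <- setdiff(fields, never_filled)
  if (source == "crossref") {
    fields <- setdiff(fields, crossref_unsupported)
  }
  fields
}

requested <- c("AB", "TC", "DE", "WC")
fields_for(requested, "openalex")  # "AB" "TC" "DE"
fields_for(requested, "crossref")  # "AB"
```

In this sketch DE reaches OpenAlex only because it was requested explicitly; with the default fields it is never attempted.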
if (FALSE) {
data(scientometrics, package = "bibliometrixData")
res <- completeMetadata(scientometrics, email = "you@example.com")
res$report
res$after
}