
Searches for duplicated records in a bibliographic data frame.
duplicatedMatching(M, Field = "TI", tol = 0.95)
M is the bibliographic data frame.
Field is a character object. It indicates one of the field tags used to identify duplicated records. Field can be equal to one of these tags: TI (title), AB (abstract), UT (manuscript ID).
tol is a numeric value giving the minimum relative similarity required to match two manuscripts. The default value is tol = 0.95.
The value returned from duplicatedMatching is a data frame without duplicated records.
A bibliographic data frame is obtained with the converting function convert2df.
It is a data matrix with cases corresponding to manuscripts and variables corresponding to the Field Tags in the original SCOPUS and Thomson Reuters' ISI Web of Knowledge file.
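As a rough illustration of that layout, the toy data frame below has one row per manuscript and uses the three field tags accepted by Field (TI, AB, UT) as columns. The name M_toy and all values are invented placeholders, not real records; a data frame produced by convert2df carries many more fields.

```r
# Toy stand-in for a bibliographic data frame (hypothetical values only).
M_toy <- data.frame(
  TI = c("FIRST EXAMPLE TITLE", "SECOND EXAMPLE TITLE"),
  AB = c("Abstract of the first manuscript.",
         "Abstract of the second manuscript."),
  UT = c("ID-0001", "ID-0002"),
  stringsAsFactors = FALSE
)

# Rows are manuscripts, columns are field tags.
str(M_toy)
```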
The function identifies duplicated records in a bibliographic data frame and deletes them.
Duplicate entries are identified through the restricted Damerau-Levenshtein distance.
Two manuscripts whose relative similarity is greater than the tol argument are stored in the output data frame only once.
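The matching rule can be sketched in base R as follows. This is a minimal illustration, not the package's own code: the helper names osa_distance, relative_similarity and drop_near_duplicates are hypothetical, and the normalisation (one minus the distance divided by the length of the longer string) is an assumption about how the relative similarity is defined.

```r
# Restricted Damerau-Levenshtein (optimal string alignment) distance.
# Hypothetical helper written for illustration only.
osa_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  d <- matrix(0L, n + 1, m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- as.integer(x[i] != y[j])
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion
                             d[i + 1, j] + 1L,  # insertion
                             d[i, j] + cost)    # substitution
      # Transposition of two adjacent characters
      if (i > 1 && j > 1 && x[i] == y[j - 1] && x[i - 1] == y[j]) {
        d[i + 1, j + 1] <- min(d[i + 1, j + 1], d[i - 1, j - 1] + 1L)
      }
    }
  }
  d[n + 1, m + 1]
}

# Assumed normalisation: 1 means identical strings, 0 means nothing in common.
relative_similarity <- function(a, b) {
  longest <- max(nchar(a), nchar(b))
  if (longest == 0) return(1)
  1 - osa_distance(a, b) / longest
}

# Keep the first record of every group of near-duplicates.
drop_near_duplicates <- function(titles, tol = 0.95) {
  keep <- rep(TRUE, length(titles))
  for (i in seq_along(titles)) {
    if (!keep[i]) next
    for (j in seq_along(titles)) {
      if (j <= i || !keep[j]) next
      if (relative_similarity(titles[i], titles[j]) > tol) keep[j] <- FALSE
    }
  }
  titles[keep]
}

titles <- c("Bibliometric analysis of science mapping tools",
            "Bibliometric analysis of science maping tools",  # one letter missing
            "A completely different title")
unique_titles <- drop_near_duplicates(titles, tol = 0.95)
print(unique_titles)
```

The second title differs from the first by a single deleted character, so its similarity is above 0.95 and only the first copy is kept. Lowering tol makes the match more permissive and removes more records.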
convert2df to import and convert an ISI or SCOPUS export file into a bibliographic data frame.
biblioAnalysis function for bibliometric analysis.
summary to obtain a summary of the results.
plot to draw some useful plots of the results.
# NOT RUN {
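library(bibliometrix)
data(scientometrics)
# Stack two overlapping slices so that rows 10 to 20 appear twice
M <- rbind(scientometrics[1:20, ], scientometrics[10:30, ])
# Remove records whose titles have a relative similarity above 0.95
newM <- duplicatedMatching(M, Field = "TI", tol = 0.95)
dim(newM)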
# }