lbsFindDuplicateTitles

Find documents to be merged (**EXPERIMENTAL**)

Finds groups of documents that possibly should be merged, based on similarities between their titles.

Usage
lbsFindDuplicateTitles(conn, surveyDescription = NULL, ignoreTitles.like = NULL, aggressiveness = 1)
Arguments
conn
connection object, see lbsConnect.
surveyDescription
character string or NULL; description of the survey to restrict the search to, or NULL for no restriction.
ignoreTitles.like
character vector of SQL LIKE patterns matching the titles of documents to be ignored, or NULL.
aggressiveness
nonnegative integer; 0 for showing only exact matches; the higher the value, the more documents will be proposed.
Details

The function determines fuzzy similarity measures of the titles. Its specificity is controlled by the aggressiveness parameter.
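This page does not say which similarity measure the function uses. As a rough illustration of how a tolerance setting widens the set of proposed matches, the sketch below uses the generalized Levenshtein distance from base R's `utils::adist`. The helper `candidatePairs` and its `maxDist` argument are hypothetical and not part of CITAN; treating `maxDist` as an analogue of aggressiveness is an assumption made only for this example.

```r
# Hypothetical helper (not part of CITAN): list pairs of titles whose
# edit distance does not exceed maxDist. The real function may use a
# different similarity measure; this only illustrates the idea.
candidatePairs <- function(titles, maxDist) {
  d <- utils::adist(titles)                      # pairwise edit distances
  # keep each unordered pair once (upper triangle only)
  idx <- which(d <= maxDist & upper.tri(d), arr.ind = TRUE)
  data.frame(first    = titles[idx[, "row"]],
             second   = titles[idx[, "col"]],
             distance = d[idx],
             stringsAsFactors = FALSE)
}

titles <- c("On the h-index", "On the h-index", "On the h index",
            "On the g-index", "Citation analysis in R")

candidatePairs(titles, 0)   # exact duplicates only
candidatePairs(titles, 1)   # also titles differing by one character
candidatePairs(titles, 2)   # a still wider set of candidates
```

With a threshold of 0 only identical titles are paired; raising it proposes more pairs, which mirrors the documented behaviour of aggressiveness without claiming to reproduce it.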

Search results are presented in a convenient-to-use graphical dialog box. The function tries to order the groups of documents according to their relevance (**EXPERIMENTAL** algorithm). Note that the calculation often takes a few minutes!

The ignoreTitles.like parameter specifies search patterns in the SQL LIKE format, i.e. an underscore _ matches exactly one character and a percent sign % matches any sequence of zero or more characters. The search is case-insensitive.
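To preview which titles a set of LIKE patterns would exclude before running the (slow) search, the patterns can be approximated in plain R. The sketch below translates each pattern into an anchored regular expression and matches it case-insensitively. The helper `likeToRegex` and the sample titles are hypothetical and not part of CITAN, and the database's own LIKE matching may differ in details such as the handling of non-ASCII letters.

```r
# Hypothetical helper (not part of CITAN): convert an SQL LIKE pattern
# into an anchored regular expression ("%" -> ".*", "_" -> ".").
likeToRegex <- function(pattern) {
  specials <- c(".", "\\", "|", "(", ")", "[", "]", "{", "}",
                "^", "$", "*", "+", "?")
  chars <- strsplit(pattern, "", fixed = TRUE)[[1]]
  parts <- vapply(chars, function(ch) {
    if (ch == "%") {
      ".*"
    } else if (ch == "_") {
      "."
    } else if (ch %in% specials) {
      paste0("\\", ch)          # escape regex metacharacters
    } else {
      ch
    }
  }, character(1))
  paste0("^", paste(parts, collapse = ""), "$")
}

# Made-up titles, for illustration only
titles <- c("Editorial", "Guest editorial", "In this issue: news",
            "Letter to the editor", "A new h-index variant", "Preface")
patterns <- c("%In this issue%", "%Editorial", "%Introduction",
              "Letter to %", "%Preface")

# A title is ignored if it matches at least one pattern
ignored <- Reduce(`|`, lapply(patterns, function(p) {
  grepl(likeToRegex(p), titles, ignore.case = TRUE)
}))

titles[!ignored]   # titles that would still be compared
```

Note that a pattern such as "%Editorial" has no trailing %, so it only matches titles that end with that word.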

Value

A numeric vector of user-selected documents' identifiers to be removed.

See Also

lbsDeleteDocuments, lbsFindDuplicateAuthors, lbsGetInfoDocuments

Aliases
  • lbsFindDuplicateTitles
Examples
## Not run: 
# conn <- lbsConnect("Bibliometrics.db");
# ## ...
# listdoc <- lbsFindDuplicateTitles(conn,
#    ignoreTitles.like=c("%In this issue%", "%Editorial", "%Introduction",
#    "Letter to %", "%Preface"),
#    aggressiveness=2);
# lbsDeleteDocuments(conn, listdoc);
# dbCommit(conn);
# ## ...
## End(Not run)

Documentation reproduced from package CITAN, version 2015.12-2, License: LGPL (>= 3)
