CITAN (version 2014.12-1)

lbsFindDuplicateTitles: Find documents to be merged (**EXPERIMENTAL**)

Description

Identifies groups of documents that should possibly be merged, based on similarities between their titles.

Usage

lbsFindDuplicateTitles(conn, surveyDescription = NULL,
  ignoreTitles.like = NULL, aggressiveness = 1)

Arguments

conn
connection object, see lbsConnect.
surveyDescription
character string or NULL; survey description to restrict the search to, or NULL for no restriction.
ignoreTitles.like
character vector of SQL LIKE patterns matching the titles of documents to be ignored, or NULL.
aggressiveness
nonnegative integer; 0 for showing only exact matches; the higher the value, the more documents will be proposed.

Value

  • A numeric vector of user-selected documents' identifiers to be removed.

Details

The function computes fuzzy similarity measures between the titles. The aggressiveness parameter controls how strict the matching is.
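The similarity measure CITAN uses is not described on this page, so the following is only an illustrative sketch of the general idea: titles are grouped by edit distance, and a tolerance plays a role loosely analogous to aggressiveness (0 keeps exact matches only, higher values propose more candidates). The function group_titles and its max_dist argument are hypothetical and not part of CITAN; only base R functions (adist, hclust, cutree) are used.

```r
# Hypothetical helper, NOT part of CITAN: group titles whose
# case-insensitive edit distance is at most `max_dist`.
group_titles <- function(titles, max_dist = 1) {
  norm <- tolower(titles)          # compare titles case-insensitively
  d <- adist(norm)                 # pairwise generalized Levenshtein distances
  tree <- hclust(as.dist(d), method = "single")
  # Cut the single-linkage tree: titles connected through a chain of
  # pairs at distance <= max_dist end up in the same group.
  cutree(tree, h = max_dist)
}

titles <- c("Fuzzy sets", "Fuzzy sets.", "Fuzzy Sets",
            "Aggregation operators", "Aggregation operator")

strict  <- group_titles(titles, max_dist = 0)  # exact matches only
lenient <- group_titles(titles, max_dist = 1)  # tolerate one edit

# List the candidate groups that contain more than one title
split(titles, lenient)[table(lenient) > 1]
```

With a larger tolerance the groups grow, which is why a higher setting yields more proposals to review by hand.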

Search results are presented in a convenient-to-use graphical dialog box. The function tries to order the groups of documents according to their relevance (**EXPERIMENTAL** algorithm). Note that the calculation often takes a few minutes!

The ignoreTitles.like parameter specifies search patterns in the SQL LIKE format, i.e. an underscore _ matches exactly one character and a percent sign % matches any sequence of characters, including an empty one. The search is case-insensitive.
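To preview which titles a set of patterns would cover, the LIKE rules above can be emulated in plain R. This is a minimal sketch: like_to_regex is a hypothetical helper, not a CITAN function, and the assumption that a title is ignored when it matches any one of the patterns is ours. The actual matching is performed by the database.

```r
# Hypothetical helper, NOT part of CITAN: translate an SQL LIKE pattern
# into an anchored regular expression (for use with perl = TRUE).
like_to_regex <- function(pattern) {
  chars <- strsplit(pattern, "", fixed = TRUE)[[1]]
  parts <- vapply(chars, function(ch) {
    if (ch == "%") {
      ".*"                          # % : any sequence of characters
    } else if (ch == "_") {
      "."                           # _ : exactly one character
    } else if (grepl("^[A-Za-z0-9]$", ch)) {
      ch                            # letters and digits match themselves
    } else {
      paste0("\\", ch)              # escape everything else literally
    }
  }, character(1))
  paste0("^", paste(parts, collapse = ""), "$")
}

titles <- c("Editorial", "Guest editorial", "Editorial board",
            "Letter to the editor", "In this issue: news")
patterns <- c("%In this issue%", "%Editorial", "Letter to %")

# Assumption: a title is ignored if it matches at least one pattern
ignored <- vapply(titles, function(title) {
  any(vapply(patterns, function(p) {
    grepl(like_to_regex(p), title, ignore.case = TRUE, perl = TRUE)
  }, logical(1)))
}, logical(1))

titles[ignored]
```

Note that "%Editorial" has no trailing percent sign, so it only covers titles that end with "editorial"; a title such as "Editorial board" is not covered by it.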

See Also

lbsDeleteDocuments, lbsFindDuplicateAuthors, lbsGetInfoDocuments

Examples

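library(CITAN)

## Connect to an existing Local Bibliometric Storage file
conn <- lbsConnect("Bibliometrics.db")
## ...
## Propose groups of possibly duplicated documents, skipping generic titles;
## note that "%" needs no backslash inside an R string
listdoc <- lbsFindDuplicateTitles(conn,
   ignoreTitles.like=c("%In this issue%", "%Editorial", "%Introduction",
   "Letter to %", "%Preface"),
   aggressiveness=2)
## Remove the documents selected in the dialog box and commit the change
lbsDeleteDocuments(conn, listdoc)
dbCommit(conn)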
## ...
