lbsFindDuplicateTitles: Find documents to be merged (EXPERIMENTAL)

Description

Finds groups of documents that possibly should be merged, based on similarities between their titles.

Usage

lbsFindDuplicateTitles(conn, surveyDescription = NULL,
  ignoreTitles.like = NULL, aggressiveness = 1)

Arguments

conn

connection object, see lbsConnect.

surveyDescription

character string or NULL; the survey description to restrict the search to, or NULL for no restriction.

ignoreTitles.like

character vector of SQL LIKE patterns; documents whose titles match any of them are ignored. Use NULL to ignore no documents.

aggressiveness

nonnegative integer; 0 shows only exact matches; the higher the value, the more documents are proposed.

Value

A numeric vector of identifiers of the documents that the user selected for removal.

Details

The function computes fuzzy similarity measures between the titles. The sensitivity of the matching is controlled by the aggressiveness parameter.
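This page does not specify the exact similarity measure. Purely as an illustration of how a single integer threshold can trade exact matches for broader ones, the sketch below groups titles whose case-insensitive Levenshtein distance (computed with base R's `utils::adist`) is at most `aggressiveness`, linking groups transitively. The helper `groupSimilarTitles` is hypothetical, is not part of the package, and is not claimed to be the algorithm the function uses.

```r
# Hypothetical helper (NOT part of the package): an illustrative sketch that
# groups titles whose case-insensitive edit distance is <= aggressiveness.
groupSimilarTitles <- function(titles, aggressiveness = 1) {
  stopifnot(is.character(titles), aggressiveness >= 0)
  n <- length(titles)
  if (n < 2) return(list())              # nothing to compare
  normalized <- toupper(trimws(titles))  # ignore case and outer whitespace
  d <- utils::adist(normalized)          # pairwise Levenshtein distances
  group <- seq_len(n)                    # each title starts in its own group
  # Single-linkage grouping: merge the groups of any two close-enough titles
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (d[i, j] <= aggressiveness && group[i] != group[j]) {
        group[group == group[j]] <- group[i]
      }
    }
  }
  groups <- split(seq_len(n), group)
  # Keep only groups with at least two members: candidates for merging
  unname(groups[lengths(groups) > 1])
}

titles <- c("Fuzzy sets", "Fuzzy set", "Rough sets", "FUZZY SETS")
groupSimilarTitles(titles, aggressiveness = 0)
groupSimilarTitles(titles, aggressiveness = 1)
```

In this sketch, `aggressiveness = 0` groups only titles that are identical after upper-casing (indices 1 and 4), while `aggressiveness = 1` also admits titles differing by a single character (indices 1, 2 and 4).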

Search results are presented in an interactive graphical dialog box. The function tries to order the groups of documents by relevance (**EXPERIMENTAL** algorithm). Note that the computation often takes several minutes.

The ignoreTitles.like parameter gives search patterns in SQL LIKE format: an underscore _ matches any single character, and a percent sign % matches any sequence of zero or more characters. Matching is case-insensitive.
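To make the wildcard semantics concrete, here is a minimal sketch that mimics case-insensitive LIKE matching in R by translating a pattern into an anchored Perl-compatible regular expression. The helpers `likeToRegex` and `matchesAnyLike` are hypothetical and not part of the package; the sketch also ignores SQL's optional `ESCAPE` clause.

```r
# Hypothetical helper (NOT part of the package): translate an SQL LIKE
# pattern into an anchored PCRE regular expression, for illustration only.
likeToRegex <- function(pattern) {
  # Backslash-escape every character other than letters, digits, %, _ and
  # space, so that regex metacharacters such as "." are matched literally
  escaped <- gsub("([^[:alnum:]%_ ])", "\\\\\\1", pattern, perl = TRUE)
  # Translate the two LIKE wildcards
  escaped <- gsub("%", ".*", escaped, fixed = TRUE)  # any sequence of chars
  escaped <- gsub("_", ".", escaped, fixed = TRUE)   # exactly one char
  paste0("^", escaped, "$")                          # LIKE matches whole title
}

# TRUE if `title` matches any of the LIKE `patterns`, ignoring case;
# a NULL `patterns` matches nothing, i.e. no title is ignored
matchesAnyLike <- function(title, patterns) {
  any(vapply(patterns, function(p)
    grepl(likeToRegex(p), title, ignore.case = TRUE, perl = TRUE),
    logical(1)))
}

matchesAnyLike("IN THIS ISSUE: vol. 3", "%In this issue%")
matchesAnyLike("A letter to the editor", "Letter to %")
```

Note how "Letter to %" is anchored at the start of the title, so a title that merely contains "letter to" elsewhere is not ignored.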

Examples

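# NOT RUN {
library("CITAN")
library("DBI")  # provides dbCommit()
conn <- lbsConnect("Bibliometrics.db")
## ...
# Look for possible duplicates, skipping editorial-type items
listdoc <- lbsFindDuplicateTitles(conn,
   ignoreTitles.like = c("%In this issue%", "%Editorial", "%Introduction",
                         "Letter to %", "%Preface"),
   aggressiveness = 2)
# Remove the documents selected in the dialog and commit the changes
lbsDeleteDocuments(conn, listdoc)
dbCommit(conn)
## ...
# }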