CITAN (version 2011.08-1)

lbsFindDuplicateTitles: Suggest documents to be merged (**EXPERIMENTAL**)

Description

This function suggests groups of documents that possibly should be merged, based on a comparison of the documents' titles. It uses a heuristic algorithm whose behavior is controlled by the aggressiveness parameter.

Usage

lbsFindDuplicateTitles(conn, surveyDescription, ignoreTitles.like,
    aggressiveness=1)

Arguments

conn
a connection object as produced by lbsConnect.
surveyDescription
a single character string giving the description of the survey to restrict the search to, or NULL.
ignoreTitles.like
a character vector of SQL LIKE patterns matching the titles of documents to be ignored, or NULL.
aggressiveness
nonnegative integer; 0 for showing only exact matches; the higher the value, the more documents will be proposed.

Value

  • A numeric vector of user-selected documents' identifiers to be removed.

Details

The search results are presented in a convenient-to-use graphical dialog box. The function tries to order the groups of documents according to their relevance (**EXPERIMENTAL** algorithm). Note that the calculation may take a few minutes!

ignoreTitles.like is a set of search patterns in the SQL LIKE format, i.e. an underscore _ matches exactly one character and a percent sign % matches any sequence of characters, including an empty one. The search is case-insensitive.
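To preview which titles a set of LIKE patterns would cover before running the search, the pattern semantics described above can be approximated in base R. This is only a sketch and not the package's own matching code: likeToRegex and matchesLike are hypothetical helper names, the sample titles are made up, and LIKE escape clauses are not handled.

```r
# Hypothetical helper (not part of CITAN): translate an SQL LIKE pattern
# into an anchored, Perl-compatible regular expression.
likeToRegex <- function(pattern) {
  # Escape regex metacharacters so that they match literally.
  escaped <- gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", pattern, perl = TRUE)
  # % stands for any sequence of characters, _ for exactly one character.
  escaped <- gsub("%", ".*", escaped, fixed = TRUE)
  escaped <- gsub("_", ".", escaped, fixed = TRUE)
  paste0("^", escaped, "$")
}

# Hypothetical helper: TRUE for every title matched by at least one pattern
# (case-insensitively, as stated above).
matchesLike <- function(titles, patterns) {
  hits <- lapply(patterns, function(p)
    grepl(likeToRegex(p), titles, ignore.case = TRUE, perl = TRUE))
  Reduce(`|`, hits, rep(FALSE, length(titles)))
}

# Made-up sample titles, for illustration only.
titles <- c("Editorial", "Guest editorial", "Editorial notes",
            "Letter to the editor", "In this issue: news")
print(matchesLike(titles, c("%Editorial", "Letter to %")))
# TRUE TRUE FALSE TRUE FALSE
```

Note that "%Editorial" covers only titles that end with the word, so "Editorial notes" is not ignored; a trailing % would be needed to cover it as well.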

See Also

lbsDeleteDocuments, lbsFindDuplicateAuthors, lbsGetInfoDocuments

Examples

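library("CITAN")

conn <- lbsConnect("Bibliometrics.db")
## ...
# Suggest groups of possible duplicates, ignoring generic titles;
# the result holds the identifiers the user selected for removal.
listdoc <- lbsFindDuplicateTitles(conn,
    ignoreTitles.like=c("%In this issue%", "%Editorial", "%Introduction",
        "Letter to %", "%Preface"),
    aggressiveness=2)
# Remove the selected documents and commit the change.
lbsDeleteDocuments(conn, listdoc)
dbCommit(conn)
## ...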
