lbsFindDuplicateTitles: Suggest documents to be merged (EXPERIMENTAL)

Description

This function suggests groups of documents that possibly should be merged. The suggestions are based on comparing the similarity of the documents' titles. A heuristic algorithm is used, whose behavior is controlled by the aggressiveness parameter.

Usage

lbsFindDuplicateTitles(conn, surveyDescription, ignoreTitles.like,
    aggressiveness=1)

Arguments

conn

a connection object as produced by lbsConnect.

surveyDescription

a single character string or NULL; if a string is given, the search is restricted to the survey with that description.

ignoreTitles.like

a character vector of SQL LIKE patterns, or NULL; documents whose titles match a pattern are ignored.

aggressiveness

nonnegative integer; 0 for showing only exact matches; the higher the value, the more documents will be proposed.

Value

A numeric vector of user-selected documents' identifiers to be removed.

Details

The search results are presented in a convenient-to-use graphical dialog box. The function tries to order the groups of documents according to their relevance (**EXPERIMENTAL** algorithm). Note that the calculation may take a few minutes!
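
To make the idea of "groups of documents" concrete, the following base-R sketch groups a toy set of titles that are identical after a simple normalization, roughly what one might expect from exact matching (aggressiveness=0). It is only an illustration under stated assumptions: the data frame, its column names (id, title) and the normalization step are hypothetical, and this is not the heuristic that lbsFindDuplicateTitles actually uses.

```r
# Toy data; identifiers and titles are made up for illustration only.
docs <- data.frame(
  id    = c(11, 12, 13, 14, 15),
  title = c("On the h-index", "on the h-index ", "Citation analysis",
            "Editorial", "Citation Analysis"),
  stringsAsFactors = FALSE
)

# Assumed normalization: trim surrounding whitespace and lower-case the title.
key <- tolower(trimws(docs$title))

# Collect identifiers per normalized title and keep only groups
# that contain more than one document.
groups <- split(docs$id, key)
groups <- groups[lengths(groups) > 1]

print(groups)
```

Each element of groups is a candidate set of identifiers for the same normalized title; in the real function the final choice of what to remove is made by the user in the dialog box.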

ignoreTitles.like is a set of search patterns in the SQL LIKE format, i.e. an underscore _ matches exactly one character and a percent sign % matches any sequence of characters (including an empty one). The search is case-insensitive.
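
The pattern semantics described above can be tried out in plain R before running the (possibly slow) search. The sketch below translates a LIKE pattern into a regular expression and checks which titles it would match. The helper like_to_regex and the sample titles are hypothetical and not part of CITAN; the package presumably evaluates the patterns in the database, so treat this only as an approximation of the matching rules.

```r
# Hypothetical helper (not part of CITAN): translate an SQL LIKE pattern
# into an anchored regular expression.
like_to_regex <- function(pattern) {
  meta <- c(".", "\\", "|", "(", ")", "[", "]", "{", "}", "^", "$", "*", "+", "?")
  chars <- strsplit(pattern, "", fixed = TRUE)[[1]]
  parts <- vapply(chars, function(ch) {
    if (ch == "%") {
      ".*"                 # % matches any sequence of characters
    } else if (ch == "_") {
      "."                  # _ matches exactly one character
    } else if (ch %in% meta) {
      paste0("\\", ch)     # escape regex metacharacters
    } else {
      ch
    }
  }, character(1))
  paste0("^", paste(parts, collapse = ""), "$")
}

# Made-up titles for illustration.
titles <- c("Editorial", "Guest editorial", "Editorial board changes",
            "Letter to the editor", "A new h-index variant")
patterns <- c("%Editorial", "Letter to %")

# A title is ignored if at least one pattern matches it (case-insensitively).
ignored <- Reduce(`|`, lapply(patterns, function(p) {
  grepl(like_to_regex(p), titles, ignore.case = TRUE)
}))

print(data.frame(title = titles, ignored = ignored))
```

Note that "%Editorial" has no trailing percent sign, so it matches only titles that end with "Editorial"; a title such as "Editorial board changes" would need "%Editorial%".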

Examples


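library(CITAN)

conn <- lbsConnect("Bibliometrics.db")
## ...
# Ask for merge suggestions, skipping titles that match the given patterns.
listdoc <- lbsFindDuplicateTitles(conn,
    ignoreTitles.like = c("%In this issue%", "%Editorial", "%Introduction",
                          "Letter to %", "%Preface"),
    aggressiveness = 2)
# Delete the documents selected in the dialog box and commit the change.
lbsDeleteDocuments(conn, listdoc)
dbCommit(conn)
## ...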
