lbsFindDuplicateAuthors: Suggest groups of authors to be merged (EXPERIMENTAL)

Description

This function suggests groups of authors that possibly should be merged. The suggestions are based on similarity comparisons of the authors' names.

Usage

lbsFindDuplicateAuthors(conn, names.like, ignoreWords=c("van", "von",
    "der", "no", "author", "name", "available"), minWordLength=4,
    orderResultsBy=c("citations", "ndocuments", "name"),
    aggressiveness=0)

Arguments

conn

a connection object as produced by lbsConnect.

names.like

a character vector of SQL LIKE patterns that restrict the search to authors whose names match them.

ignoreWords

character vector; words to be ignored.

minWordLength

numeric; minimal word length to be considered.

orderResultsBy

determines the order in which the results are presented; one of citations, ndocuments, or name.

aggressiveness

nonnegative integer; controls the search depth.
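
The sketch below illustrates one plausible reading of how ignoreWords and minWordLength might interact when a name is split into words. It is not the package's actual algorithm: the helper significantWords is hypothetical, and both the tokenization rule and the order of the two filters are assumptions made for illustration only.

```r
# Hypothetical helper, not part of the package: keep only the words of a
# name that are long enough and are not on the ignore list.
significantWords <- function(name, ignoreWords, minWordLength) {
  # Assumption: words are maximal runs of letters, compared in lower case
  words <- tolower(strsplit(name, "[^[:alpha:]]+")[[1]])
  # Drop words shorter than minWordLength (this also drops empty strings)
  words <- words[nchar(words) >= minWordLength]
  # Drop ignored words
  setdiff(words, tolower(ignoreWords))
}

ignore <- c("van", "von", "der", "no", "author", "name", "available")

significantWords("van der Berg, Johannes", ignore, 4)
# "berg" "johannes"

significantWords("No author name available", ignore, 4)
# character(0)
```

Under this reading, a placeholder entry such as "No author name available" would contribute no words at all to the comparison, which may be why those words appear in the default ignoreWords.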

Value

A list of vectors of authors' identifiers to be merged. The first element of each vector is the author marked by the user as Parent, and the remaining elements are the Children.
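
As a minimal sketch of this structure, the snippet below builds a made-up result by hand and separates Parents from Children. The identifiers are invented for illustration, and storing them as integers is an assumption; a real result comes from the dialog box, not from code like this.

```r
# Made-up example of the returned structure: one vector per group,
# Parent first, Children after (identifier values are invented)
listauth <- list(
  c(12L, 87L, 301L),  # merge authors 87 and 301 into author 12
  c(45L, 46L)         # merge author 46 into author 45
)

# First element of each group: the Parent
parents <- vapply(listauth, function(ids) ids[1], integer(1))

# Remaining elements of each group: the Children
children <- lapply(listauth, function(ids) ids[-1])

parents    # 12 45
children   # list(c(87, 301), 46)
```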

Details

The function uses a heuristic **EXPERIMENTAL** algorithm, whose behavior is controlled by the aggressiveness parameter.

The search results are presented in a convenient-to-use graphical dialog box. Note that the calculation may take a few minutes!

names.like is a set of search patterns in the SQL LIKE format, i.e. an underscore _ matches exactly one character and a percent sign % matches any sequence of characters, including an empty one. The search is case-insensitive.
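
To get a feel for these patterns without touching the database, the following sketch emulates LIKE matching in base R. The helper likeMatch is hypothetical and not part of the package. It maps % and _ to glob wildcards and relies on utils::glob2rx, so a literal * or ? in a pattern would also act as a wildcard. The author names are invented.

```r
# Hypothetical helper, not part of the package: approximate SQL LIKE
# matching by translating the pattern to a glob, then to a regex.
likeMatch <- function(pattern, x) {
  # % -> * (any sequence of characters), _ -> ? (exactly one character)
  rx <- utils::glob2rx(chartr("%_", "*?", pattern))
  # Case-insensitive, as described for names.like
  grepl(rx, x, ignore.case = TRUE)
}

authors <- c("Smith J.", "SMITH JOHN", "Smyth J.", "Goldsmith A.")

likeMatch("Smith%", authors)    # TRUE TRUE FALSE FALSE
likeMatch("Sm_th%", authors)    # TRUE TRUE TRUE FALSE
likeMatch("%smith%", authors)   # TRUE TRUE FALSE TRUE
```

A pattern without a leading % is anchored at the start of the name, which is why "Smith%" does not match "Goldsmith A.".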

Examples


conn <- lbsConnect("Bibliometrics.db")
## ...
# Search for candidate duplicates; the results are shown in a dialog box
listauth <- lbsFindDuplicateAuthors(conn,
    ignoreWords=c("van", "von", "der", "no", "author", "name", "available"),
    minWordLength=4,
    orderResultsBy="citations",
    aggressiveness=1)
# Merge the groups returned from the dialog and commit the changes
lbsMergeAuthors(conn, listauth)
dbCommit(conn)
## ...
