CITAN (version 2011.08-1)

lbsFindDuplicateAuthors: Suggest groups of authors to be merged (**EXPERIMENTAL**)

Description

This function suggests groups of authors that possibly should be merged. The suggestions are based on comparing the similarity of authors' names.

Usage

lbsFindDuplicateAuthors(conn, names.like, ignoreWords=c("van", "von",
    "der", "no", "author", "name", "available"), minWordLength=4,
    orderResultsBy=c("citations", "ndocuments", "name"),
    aggressiveness=0)

Arguments

conn
a connection object as produced by lbsConnect.
names.like
a character vector of SQL-LIKE patterns that restrict the search to the given authors' names.
ignoreWords
character vector; words to be ignored.
minWordLength
numeric; minimal word length to be considered.
orderResultsBy
determines the order in which results are presented; one of citations, ndocuments, name.
aggressiveness
nonnegative integer; controls the search depth.

Value

  • A list of vectors of authors' identifiers to be merged. The first element of each vector is the one marked by the user as Parent, and the rest are the Children.
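
The shape of the returned value can be sketched as follows. This is only an illustration: the identifiers are made up, and a real result comes from the dialog box shown by lbsFindDuplicateAuthors. It assumes the identifiers are numeric, which the source does not state.

```r
# Made-up identifiers, for illustration only; a real list is returned
# by lbsFindDuplicateAuthors() after the user has picked the groups.
listauth <- list(c(101, 245, 316), c(12, 87))

# The first element of each vector is the Parent ...
parents <- vapply(listauth, function(ids) ids[1], numeric(1))

# ... and the remaining elements are its Children.
children <- lapply(listauth, function(ids) ids[-1])

print(parents)
print(children)
```

Inspecting the groups this way before passing the list to lbsMergeAuthors may help catch an unintended merge.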

Details

The function uses a heuristic **EXPERIMENTAL** algorithm whose behavior is controlled by the aggressiveness parameter.

The search results are presented in a convenient-to-use graphical dialog box. Note that the calculation may take a few minutes!

names.like is a set of search patterns in SQL LIKE format, i.e., an underscore _ matches exactly one character and a percent sign % matches any sequence of zero or more characters. The search is case-insensitive.
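
To get a feel for which names a pattern would select, the LIKE rules described above can be imitated in plain R. The sketch below is an approximation for experimenting only: likeToRegex and matchesLike are hypothetical helpers, not part of CITAN, and the actual matching is performed by the database, not by this code. The sample names are invented.

```r
# Hypothetical helper (not part of CITAN): translate an SQL LIKE pattern
# into a regular expression anchored at both ends.
likeToRegex <- function(pattern) {
  chars <- strsplit(pattern, "")[[1]]
  parts <- vapply(chars, function(ch) {
    if (ch == "%") {
      ".*"                      # percent sign: zero or more characters
    } else if (ch == "_") {
      "."                       # underscore: exactly one character
    } else if (grepl("[[:alnum:][:space:]]", ch)) {
      ch                        # letters, digits and spaces as they are
    } else {
      paste0("\\", ch)          # escape everything else literally
    }
  }, character(1))
  paste0("^", paste(parts, collapse = ""), "$")
}

# Hypothetical helper: case-insensitive LIKE-style match on a vector.
matchesLike <- function(names, pattern) {
  grepl(likeToRegex(pattern), names, ignore.case = TRUE, perl = TRUE)
}

# Invented sample names.
authors <- c("Smith J.", "SMITH JOHN", "Smyth J.", "Kowalski A.")
print(authors[matchesLike(authors, "sm_th%")])
```

Here the pattern "sm_th%" selects the three Smith/Smyth variants and leaves "Kowalski A." out, because the underscore stands for exactly one character and the percent sign for the rest of the name.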

See Also

lbsMergeAuthors, lbsFindDuplicateTitles, lbsGetInfoAuthors

Examples

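# Open a connection to the local bibliometric storage.
conn <- lbsConnect("Bibliometrics.db")
## ...
# Search for groups of possibly duplicated authors; the groups to merge
# are chosen interactively in the dialog box.
listauth <- lbsFindDuplicateAuthors(conn,
    ignoreWords=c("van", "von", "der", "no", "author", "name", "available"),
    minWordLength=4,
    orderResultsBy=c("citations"),
    aggressiveness=1)
# Merge the selected groups and commit the changes.
lbsMergeAuthors(conn, listauth)
dbCommit(conn)
## ...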
