lbsFindDuplicateAuthors: Find groups of authors to be merged (EXPERIMENTAL)

Description

Finds groups of authors that possibly should be merged, based on similarities between the authors' names.

Usage

lbsFindDuplicateAuthors(conn, names.like = NULL, ignoreWords = c("van",
  "von", "der", "no", "author", "name", "available"), minWordLength = 4,
  orderResultsBy = c("citations", "ndocuments", "name"), aggressiveness = 0)

Arguments

conn

connection object, see lbsConnect.

names.like

character vector of SQL LIKE patterns that restrict the search to authors whose names match them.

ignoreWords

character vector; words to be ignored.

minWordLength

numeric; minimal word length to be considered.

orderResultsBy

determines the order in which results are presented; one of citations, ndocuments, or name.

aggressiveness

nonnegative integer; controls the search depth.
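
The roles of ignoreWords and minWordLength can be pictured with a small sketch. This is only an illustration of what ignoring words and enforcing a minimal word length could mean for a single name; the helper significant_words is hypothetical, is not part of CITAN, and is not claimed to reproduce the package's actual heuristic (for example, how names are split into words is an assumption here).

```r
# Hypothetical helper, NOT part of CITAN: an assumed reading of what
# "ignored words" and "minimal word length" mean for one author name.
significant_words <- function(name, ignoreWords, minWordLength) {
  # Assumption: words are runs of letters, compared case-insensitively
  words <- tolower(unlist(strsplit(name, "[^[:alpha:]]+")))
  # Drop words shorter than the minimal length
  words <- words[nchar(words) >= minWordLength]
  # Drop words listed in ignoreWords
  words[!(words %in% tolower(ignoreWords))]
}

ignore <- c("van", "von", "der", "no", "author", "name", "available")

significant_words("van der Berg, Johannes A.", ignore, 4)
significant_words("No author name available", ignore, 4)
```

Under these assumptions the first call keeps only "berg" and "johannes", while the placeholder name in the second call leaves no words at all, which suggests why the default ignoreWords contains "no", "author", "name", and "available".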

Value

List of vectors of authors' identifiers to be merged. The first element of each vector is the author marked by the user as Parent, and the remaining elements are its Children.
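
A result of this shape can be inspected with base R before it is passed on. The sketch below uses made-up identifiers (they do not come from any real database) and assumes the identifiers are numeric; it only shows how the Parent and the Children of each group might be separated.

```r
# Made-up identifiers, for illustration only; a real result comes from
# lbsFindDuplicateAuthors(). Numeric identifiers are an assumption.
listauth <- list(c(101, 205, 317), c(42, 58))

# First element of each vector: the Parent
parents <- vapply(listauth, function(ids) ids[1], numeric(1))

# Remaining elements of each vector: the Children
children <- lapply(listauth, function(ids) ids[-1])

parents
children
```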

Details

The function uses a heuristic **EXPERIMENTAL** algorithm. Its behavior is controlled by the aggressiveness parameter.

Search results are presented in a convenient-to-use graphical dialog box. Note that the calculation often takes a few minutes!

The names.like parameter specifies search patterns in the SQL LIKE format, i.e. an underscore _ matches exactly one character and a percent sign % matches any sequence of characters (including an empty one). The search is case-insensitive.
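
To get a feel for the pattern rules described above without touching a database, the matching can be imitated on a plain character vector. The sketch below is an approximation written for this page: like_to_regex and like_match are hypothetical helpers, not CITAN functions, and the author names are invented.

```r
# Hypothetical helpers, NOT part of CITAN: imitate SQL LIKE matching in R.
like_to_regex <- function(pattern) {
  chars <- strsplit(pattern, "", fixed = TRUE)[[1]]
  parts <- vapply(chars, function(ch) {
    if (ch == "%") ".*"                      # % : any sequence of characters
    else if (ch == "_") "."                  # _ : exactly one character
    else if (grepl("[[:alnum:][:space:]]", ch)) ch
    else paste0("\\", ch)                    # escape other punctuation
  }, character(1))
  # Anchor the pattern so that it must match the whole name
  paste0("^", paste(parts, collapse = ""), "$")
}

like_match <- function(pattern, x) {
  # Case-insensitive, as stated for names.like
  grepl(like_to_regex(pattern), x, ignore.case = TRUE, perl = TRUE)
}

# Invented names, for illustration only
authors <- c("Smith, John", "SMITH J.", "Smyth, Joan", "Kowalski, Jan")

authors[like_match("smith%", authors)]
authors[like_match("sm_th%", authors)]
```

With these invented names, "smith%" selects the first two entries and "sm_th%" additionally selects "Smyth, Joan". In an actual call the same patterns would be supplied directly, e.g. names.like = c("smith%", "kowalsk_").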

Examples


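library(CITAN)

conn <- lbsConnect("Bibliometrics.db")
## ...
# Search for candidate duplicates; the groups to merge are chosen in a dialog box
listauth <- lbsFindDuplicateAuthors(conn,
   ignoreWords = c("van", "von", "der", "no", "author", "name", "available"),
   minWordLength = 4,
   orderResultsBy = "citations",
   aggressiveness = 1)
# Merge the Children of each group into its Parent
lbsMergeAuthors(conn, listauth)
# Commit the changes to the database
dbCommit(conn)
## ...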
