RecordLinkage (version 0.3-2)

getPairs: Extract Record Pairs

Description

Extracts record pairs from data and result objects.

Usage

## S3 method for class 'RecLinkData':
getPairs(object, max.weight = Inf, min.weight = -Inf,
         single.rows = FALSE, show = "all", sort = !is.null(object$Wdata))

## S3 method for class 'RLBigData':
getPairs(object, max.weight = Inf, min.weight = -Inf,
         filter.match = c("match", "unknown", "nonmatch"),
         withWeight = dbExistsTable(object@con, "Wdata"),
         withMatch = TRUE, single.rows = FALSE, sort = TRUE)

## S3 method for class 'RLResult':
getPairs(object, filter.match = c("match", "unknown", "nonmatch"),
         filter.link = c("nonlink", "possible", "link"),
         max.weight = Inf, min.weight = -Inf,
         withMatch = TRUE, withClass = TRUE,
         withWeight = dbExistsTable(object@data@con, "Wdata"),
         single.rows = FALSE, sort = withWeight)

getFalsePos(object, single.rows = FALSE)
getFalseNeg(object, single.rows = FALSE)
getFalse(object, single.rows = FALSE)

Arguments

object
The data or result object from which to extract record pairs.
max.weight, min.weight
Real numbers. Upper and lower weight threshold.
filter.match
Character vector, a nonempty subset of c("match", "nonmatch", "unknown") denoting which pairs to allow in the output.
filter.link
Character vector, a nonempty subset of c("link", "nonlink", "possible") denoting which pairs to allow in the output.
withWeight
Logical. Whether to include linkage weights in the output.
withMatch
Logical. Whether to include matching status in the output.
withClass
Logical. Whether to include classification result in the output.
single.rows
Logical. Whether to print record pairs in one row instead of two consecutive rows.
show
Character. Selects which records to show, one of "links", "nonlinks", "possible", "all".
sort
Logical. Whether to sort descending by weight.

Value

A data frame. If single.rows is TRUE, each row holds, in this order, the ID and data fields of the first record, the ID and data fields of the second record, and possibly matching status, classification result and/or weight.

If single.rows is not TRUE, each resulting record pair occupies consecutive rows of the following format:

  1. ID and data fields of the first record, followed by as many empty fields as needed to match the length of the following line.
  2. ID and data fields of the second record, possibly followed by matching status, classification result and/or weight.
  3. A blank line to separate record pairs.

Details

These methods extract record pairs from "RecLinkData", "RecLinkResult", "RLBigData" and "RLResult" objects. Possible applications are retrieving a linkage result for further processing, conducting a manual review in order to determine classification thresholds, or inspecting misclassified pairs. The various arguments can be grouped by the following purposes:
  1. Controlling which record pairs are included in the output: min.weight and max.weight, filter.match, filter.link, show.
  2. Controlling which information is shown: withWeight, withMatch, withClass.
  3. Controlling the overall structure of the result: sort, single.rows.
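
The grouping of arguments described above can be illustrated with a minimal sketch for the "RecLinkData" method, which the Examples section does not cover. It assumes that compare.dedup is used to build the in-memory pairs object and that a weight threshold of 0.7 is a reasonable cutoff for RLdata500; both choices are illustrative only and are not taken from this page.

```r
library(RecordLinkage)
data(RLdata500)

# Assumption: build an in-memory "RecLinkData" object with compare.dedup,
# instead of the "RLBigData" object used in the Examples section.
rpairs <- compare.dedup(RLdata500, identity = identity.RLdata500,
                        blockfld = list(1, 3, 5, 6, 7))
rpairs <- epiWeights(rpairs)
result <- epiClassify(rpairs, 0.5)

# Which pairs are included: only pairs classified as links
# whose weight is at least 0.7 (illustrative threshold).
# Overall structure: one row per pair, sorted by descending weight.
links_wide <- getPairs(result, min.weight = 0.7, show = "links",
                       single.rows = TRUE, sort = TRUE)

# Same selection in the default layout, where each pair occupies
# consecutive rows.
links_long <- getPairs(result, min.weight = 0.7, show = "links")

head(links_wide)
head(links_long)
```

The withWeight, withMatch and withClass arguments do not appear here because, according to the Usage section, they belong to the "RLBigData" and "RLResult" methods only.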

Examples

library(RecordLinkage)
data(RLdata500)

# create record pairs and calculate epilink weights
rpairs <- RLBigDataDedup(RLdata500, identity = identity.RLdata500,
  blockfld=list(1,3,5,6,7))
rpairs <- epiWeights(rpairs)

# show all record pairs with weights between 0.5 and 0.6
getPairs(rpairs, min.weight=0.5, max.weight=0.6)

# show only matches with weight <= 0.5
getPairs(rpairs, max.weight=0.5, filter.match="match")

# classify with one threshold
result <- epiClassify(rpairs, 0.5)

# show all links, do not show classification in the output
getPairs(result, filter.link="link", withClass = FALSE)

# see wrongly classified pairs
getFalsePos(result)
getFalseNeg(result)