
RecordLinkage (version 0.3-2)

epiClassify: Classify record pairs with EpiLink weights

Description

Classifies record pairs as link, non-link or possible link based on weights computed by epiWeights and the thresholds passed as arguments.

Usage

epiClassify(rpairs, threshold.upper, threshold.lower = threshold.upper,
  ...)

## S3 method for class 'RecLinkData':
epiClassify(rpairs, threshold.upper, threshold.lower = threshold.upper)

## S3 method for class 'RLBigData':
epiClassify(rpairs, threshold.upper, threshold.lower = threshold.upper,
  e = 0.01, f = getFrequencies(rpairs), withProgressBar = (sink.number()==0))

Arguments

rpairs
RecLinkData or RLBigData object. Record pairs to be classified.
threshold.upper
A numeric value between 0 and 1. Record pairs with weights greater than or equal to this value are classified as links.
threshold.lower
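A numeric value between 0 and 1, not greater than threshold.upper. Record pairs with weights below threshold.upper but greater than or equal to this value are classified as possible links.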
e
Numeric vector. Estimated error rate(s).
f
Numeric vector. Average frequency of attribute values.
withProgressBar
Logical. Whether to display a progress bar.
...
Placeholder for optional arguments

Value

For the "RecLinkData" method, an S3 object of class "RecLinkResult" that is a copy of rpairs with the additional element prediction, which stores the classification result. For the "RLBigData" method, an S4 object of class "RLResult".

Details

All record pairs with weights greater than or equal to threshold.upper are classified as links. Record pairs with weights smaller than threshold.upper and greater than or equal to threshold.lower are classified as possible links. All remaining record pairs are classified as non-links.

For the "RecLinkData" method, weights must have been calculated for rpairs using epiWeights.

The "RLBigData" method checks whether weights are present in the underlying database. If they are, classification is based on the existing weights. If not, weights are calculated on the fly during classification but not stored. The latter behaviour might be preferable when a very large dataset is to be classified or disk space is limited (see also the notes to epiWeights).

The "RLBigData" method displays a progress bar only when weights are calculated on the fly. By default, the bar is suppressed when output is diverted by sink (e.g. in a Sweave script).
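The threshold rule described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper name classify_by_threshold, the sample weights and the labels "N", "P" and "L" (non-link, possible link, link) are assumptions chosen for the example.

```r
# Hypothetical helper (not part of RecordLinkage): applies the two-threshold
# rule to a plain numeric vector of weights.
classify_by_threshold <- function(weights, threshold.upper,
                                  threshold.lower = threshold.upper) {
  stopifnot(threshold.lower <= threshold.upper)
  pred <- rep("N", length(weights))          # default: non-link
  pred[weights >= threshold.lower] <- "P"    # at or above lower: possible link
  pred[weights >= threshold.upper] <- "L"    # at or above upper: link
  factor(pred, levels = c("N", "P", "L"))
}

# Made-up weights for illustration only
w <- c(0.2, 0.45, 0.6, 0.9)
res <- classify_by_threshold(w, threshold.upper = 0.6, threshold.lower = 0.4)
print(res)
```

When threshold.lower is left at its default (equal to threshold.upper), no pair can fall between the two thresholds, so the result contains only links and non-links.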

See Also

epiWeights

Examples

library(RecordLinkage)

# generate record pairs
data(RLdata500)
p <- compare.dedup(RLdata500, strcmp = TRUE, strcmpfun = levenshteinSim,
  identity = identity.RLdata500)

# calculate weights
p <- epiWeights(p)

# classify with a single threshold of 0.6 and show results
summary(epiClassify(p, 0.6))
