Classifies data pairs to which weights were assigned by emWeights, based on user-defined thresholds or predefined error rates.
emClassify(rpairs, threshold.upper = Inf,
  threshold.lower = threshold.upper, my = Inf, ny = Inf, ...)

# S4 method for RecLinkData,ANY,ANY
emClassify(rpairs, threshold.upper = Inf,
  threshold.lower = threshold.upper, my = Inf, ny = Inf)

# S4 method for RLBigData,ANY,ANY
emClassify(rpairs, threshold.upper = Inf,
  threshold.lower = threshold.upper, my = Inf, ny = Inf,
  withProgressBar = (sink.number()==0))
For the "RecLinkData" method, an S3 object of class "RecLinkResult" that represents a copy of rpairs with the additional element rpairs$prediction, which stores the classification result.
For the "RLBigData" method, an S4 object of class "RLResult".
rpairs: A RecLinkData or RLBigData object with weight information.
my: A probability. Error bound for false positives.
ny: A probability. Error bound for false negatives.
threshold.upper: A numeric value. Threshold for links.
threshold.lower: A numeric value. Threshold for possible links.
withProgressBar: Whether to display a progress bar.
...: Placeholder for method-specific arguments.
Andreas Borg, Murat Sariyar
Two general approaches are implemented. The classical procedure by Fellegi and Sunter (see references) minimizes the number of possible links with given error levels for false links (my) and false non-links (ny).
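The textbook form of the Fellegi-Sunter decision rule can be sketched in base R as follows. This is a simplified, deterministic illustration and not the package's implementation: the function name fs_classify, the m- and u-probabilities, and the level labels "N", "P" and "L" are all assumptions made for the example, and the randomized decisions of the original theory are left out.

```r
# Hypothetical m- and u-probabilities for four comparison patterns
m <- c(0.70, 0.20, 0.08, 0.02)  # P(pattern | pair is a match)
u <- c(0.01, 0.04, 0.15, 0.80)  # P(pattern | pair is a non-match)

# fs_classify is a hypothetical helper, not part of any package
fs_classify <- function(m, u, my, ny) {
  w <- log2(m / u)                     # matching weight per pattern
  ord <- order(w, decreasing = TRUE)   # highest weight first

  # Links: take patterns from the top while the accumulated
  # false-positive probability stays within the bound my
  is_link <- logical(length(w))
  is_link[ord] <- cumsum(u[ord]) <= my

  # Non-links: take patterns from the bottom while the accumulated
  # false-negative probability stays within the bound ny
  rev_ord <- rev(ord)
  is_nonlink <- logical(length(w))
  is_nonlink[rev_ord] <- cumsum(m[rev_ord]) <= ny

  # Everything in between remains a possible link
  pred <- rep("P", length(w))
  pred[is_link] <- "L"
  pred[is_nonlink & !is_link] <- "N"
  factor(pred, levels = c("N", "P", "L"))
}

pred_fs <- fs_classify(m, u, my = 0.06, ny = 0.05)
print(data.frame(m, u, weight = round(log2(m / u), 2), pred_fs))
```

Tightening my or ny moves more patterns into the possible-link region, which is the trade-off the procedure minimizes for the given error levels.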
The second approach requires thresholds for links and possible links to be set by the user. A pair with weight \(w\) is classified as a link if \(w\geq \textit{threshold.upper}\), as a possible link if \(\textit{threshold.upper}> w\geq \textit{threshold.lower}\), and as a non-link if \(w<\textit{threshold.lower}\).
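The threshold rule can be illustrated with a few lines of base R. This is a minimal sketch, not the package's code: classify_by_threshold is a hypothetical helper, the weights are made up, and the labels "N", "P" and "L" (non-link, possible link, link) are assumed for the example. A weight exactly equal to threshold.upper is treated as a link.

```r
# Hypothetical helper illustrating the threshold-based rule
classify_by_threshold <- function(w, threshold.upper,
                                  threshold.lower = threshold.upper) {
  stopifnot(threshold.lower <= threshold.upper)
  pred <- ifelse(w >= threshold.upper, "L",        # link
          ifelse(w >= threshold.lower, "P", "N"))  # possible link / non-link
  factor(pred, levels = c("N", "P", "L"))
}

# Made-up weights for four record pairs
w <- c(12.5, 3.0, 0.0, -4.2)
pred_thr <- classify_by_threshold(w, threshold.upper = 3, threshold.lower = 0)
print(table(pred_thr))
```

With threshold.lower left at its default (equal to threshold.upper), no pair can fall into the possible-link category.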
If threshold.upper or threshold.lower is given, the threshold-based approach is used. Otherwise, if one of the error bounds is given, the Fellegi-Sunter model is used. If only my is supplied, links are chosen to meet the error bound and all other pairs are classified as non-links (the equivalent case holds if only ny is specified). If no arguments other than rpairs are given, a single threshold of 0 is used.
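The precedence described above can be summarized in a short sketch. It assumes that "given" means the argument was passed explicitly in the call; the package may detect this differently, and choose_approach is a hypothetical helper written only for illustration.

```r
# Hypothetical helper: reports which approach the rules above would select
choose_approach <- function(threshold.upper = Inf,
                            threshold.lower = threshold.upper,
                            my = Inf, ny = Inf) {
  if (!missing(threshold.upper) || !missing(threshold.lower)) {
    "threshold"               # thresholds take precedence over error bounds
  } else if (!missing(my) || !missing(ny)) {
    "fellegi-sunter"          # at least one error bound was supplied
  } else {
    "single threshold of 0"   # nothing beyond rpairs was supplied
  }
}

print(choose_approach())
print(choose_approach(my = 0.05))
print(choose_approach(threshold.upper = 5, my = 0.05))
```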
Ivan P. Fellegi, Alan B. Sunter: A Theory for Record Linkage. Journal of the American Statistical Association, Vol. 64, No. 328 (Dec. 1969), pp. 1183-1210.
See also getPairs to produce output from which thresholds can be determined conveniently.