Classifies data pairs to which weights were assigned by emWeights, based on user-defined thresholds or predefined error rates.
emClassify(rpairs, threshold.upper = Inf,
  threshold.lower = threshold.upper, my = Inf, ny = Inf, ...)
# S4 method for RecLinkData,ANY,ANY
emClassify(rpairs, threshold.upper = Inf,
  threshold.lower = threshold.upper, my = Inf, ny = Inf)
# S4 method for RLBigData,ANY,ANY
emClassify(rpairs, threshold.upper = Inf,
  threshold.lower = threshold.upper, my = Inf, ny = Inf,
  withProgressBar = (sink.number()==0))
rpairs: RecLinkData object with weight information.
my: A probability. Error bound for false positives.
ny: A probability. Error bound for false negatives.
threshold.upper: A numeric value. Threshold for links.
threshold.lower: A numeric value. Threshold for possible links.
withProgressBar: Whether to display a progress bar.
...: Placeholder for method-specific arguments.
For the "RecLinkData" method, an S3 object of class "RecLinkResult" that represents a copy of rpairs with the additional element prediction, which stores the classification result.
For the "RLBigData" method, an S4 object of class "RLResult".
Two general approaches are implemented. The classical procedure by Fellegi and Sunter (see references) minimizes the number of possible links for given error levels for false links (my) and false non-links (ny).
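The error-bound idea can be illustrated on a toy table of comparison patterns. The sketch below is a hypothetical helper, not the RecordLinkage implementation: it assumes that each pattern's probability among true matches (m) and true non-matches (u) is known, sorts patterns by weight log2(m/u), marks top patterns as links while their summed u-probability stays within my, and marks bottom patterns as non-links while their summed m-probability stays within ny. It omits the boundary randomization that the original paper uses to hit the error levels exactly. The labels "L", "P" and "N" are chosen here for illustration.

```r
# Hypothetical sketch of the Fellegi-Sunter rule on comparison patterns.
# m[i]: P(pattern i | true match); u[i]: P(pattern i | true non-match).
fs_classify <- function(m, u, my = Inf, ny = Inf) {
  stopifnot(length(m) == length(u), is.finite(my) || is.finite(ny))
  ord <- order(log2(m / u), decreasing = TRUE)  # highest weight first
  # Patterns claimed by neither bound: possible links if both bounds are
  # given, otherwise non-links (only my) or links (only ny).
  default <- if (is.finite(my) && is.finite(ny)) {
    "P"
  } else if (is.finite(my)) {
    "N"
  } else {
    "L"
  }
  cls <- rep(default, length(m))
  is_link <- rep(FALSE, length(m))
  if (is.finite(my)) {
    # Links: top patterns whose summed u-probability (expected share of
    # non-matches wrongly linked) does not exceed my.
    is_link[ord[cumsum(u[ord]) <= my]] <- TRUE
    cls[is_link] <- "L"
  }
  if (is.finite(ny)) {
    # Non-links: bottom patterns whose summed m-probability (expected
    # share of matches wrongly rejected) does not exceed ny.
    rev_ord <- rev(ord)
    take <- cumsum(m[rev_ord]) <= ny & !is_link[rev_ord]
    cls[rev_ord[take]] <- "N"
  }
  cls
}

m <- c(0.70, 0.20, 0.08, 0.02)
u <- c(0.001, 0.010, 0.100, 0.889)
fs_classify(m, u, my = 0.02, ny = 0.05)  # "L" "L" "P" "N"
```

Supplying only my (or only ny) changes the default label, which mirrors the one-sided behaviour described further below.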
The second approach requires thresholds for links and possible links to be set by the user. A pair with weight \(w\) is classified as a link if \(w\geq \textit{threshold.upper}\), as a possible link if \(\textit{threshold.upper}> w\geq \textit{threshold.lower}\), and as a non-link if \(w<\textit{threshold.lower}\).
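The threshold rule can be written as a small vectorized function. This is a minimal sketch with a hypothetical helper name, classify_by_threshold, operating on a plain numeric vector of weights rather than on a RecLinkData object; the labels "L" (link), "P" (possible link) and "N" (non-link) are chosen for illustration.

```r
# Hypothetical helper: three-way classification of weights by thresholds.
classify_by_threshold <- function(w, threshold.upper = Inf,
                                  threshold.lower = threshold.upper) {
  stopifnot(threshold.lower <= threshold.upper)
  ifelse(w >= threshold.upper, "L",              # link
         ifelse(w >= threshold.lower, "P", "N")) # possible link / non-link
}

w <- c(12.5, 7.0, 3.2, -1.4)
classify_by_threshold(w, threshold.upper = 10, threshold.lower = 3)
# "L" "P" "P" "N"
```

When only threshold.upper is given, threshold.lower defaults to the same value, so no pair ends up as a possible link.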
If threshold.upper or threshold.lower is given, the threshold-based approach is used; otherwise, if one of the error bounds is given, the Fellegi-Sunter model is used. If only my is supplied, links are chosen to meet the error bound and all other pairs are classified as non-links (the equivalent case holds if only ny is specified). If no arguments other than rpairs are given, a single threshold of 0 is used.
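The selection rules above can be mimicked with base R's missing(), which reports whether a formal argument was supplied by the caller even when it has a default. The function choose_method below is a hypothetical sketch of these rules only; it does not claim to reproduce how emClassify dispatches internally.

```r
# Hypothetical sketch of which approach is chosen from the supplied arguments.
choose_method <- function(threshold.upper = Inf,
                          threshold.lower = threshold.upper,
                          my = Inf, ny = Inf) {
  if (!missing(threshold.upper) || !missing(threshold.lower)) {
    # Any threshold given: threshold-based classification.
    list(method = "threshold", upper = threshold.upper, lower = threshold.lower)
  } else if (!missing(my) || !missing(ny)) {
    # Only error bounds given: Fellegi-Sunter model.
    list(method = "fellegi-sunter", my = my, ny = ny)
  } else {
    # Nothing besides the data: a single threshold of 0.
    list(method = "threshold", upper = 0, lower = 0)
  }
}

choose_method(threshold.lower = 2, my = 0.1)$method  # "threshold"
```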
Ivan P. Fellegi, Alan B. Sunter: A Theory for Record Linkage, in: Journal of the American Statistical Association Vol. 64, No. 328 (Dec., 1969), pp. 1183–1210.
See getPairs to produce output from which thresholds can be determined conveniently.