Weights must have been calculated for rpairs, for example by emWeights or
epiWeights.
The true match result must be known for rpairs; it is usually provided through
the identity argument of the compare.* functions.
For the following, it is assumed that all records with weights greater than or
equal to the threshold are classified as links, the remaining as non-links.
If no further arguments are given, the threshold that minimizes the absolute
number of misclassified record pairs is returned. If my is supplied (ny is
ignored in this case), the threshold is chosen to maximize the number of
correctly classified links while keeping the ratio of false links to the total
number of links at or below my.
If ny is supplied, the number of correct non-links is maximized under the
condition that the ratio of falsely classified non-links to the total number of
non-links does not exceed ny.
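
The selection rules above can be illustrated with a small brute-force sketch.
It is written in Python purely for illustration and is not the package's
implementation. The function name pick_threshold, the toy data and the
tie-breaking rule (lowest qualifying threshold) are hypothetical. It also
assumes that "total number of links" and "total number of non-links" refer to
the pairs classified as such at the candidate threshold.

```python
def pick_threshold(weights, is_match, my=None, ny=None):
    """Pick a threshold under the rule 'weight >= threshold means link'.

    Illustrative only: tries every observed weight as a candidate.
    """
    pairs = list(zip(weights, is_match))
    # Candidates: every observed weight, plus one value above the maximum
    # so that "classify nothing as a link" is also considered.
    candidates = sorted(set(weights)) + [max(weights) + 1.0]

    best, best_score = None, None
    for t in candidates:
        tp = sum(1 for w, m in pairs if w >= t and m)      # correct links
        fp = sum(1 for w, m in pairs if w >= t and not m)  # false links
        fn = sum(1 for w, m in pairs if w < t and m)       # false non-links
        tn = sum(1 for w, m in pairs if w < t and not m)   # correct non-links

        if my is not None:
            # Assumed reading: false links / pairs classified as links <= my.
            feasible = (tp + fp == 0) or (fp / (tp + fp) <= my)
            score = tp
        elif ny is not None:
            # Assumed reading: false non-links / pairs classified as
            # non-links <= ny. Only used when my is not supplied.
            feasible = (tn + fn == 0) or (fn / (tn + fn) <= ny)
            score = tn
        else:
            # Default: minimize the absolute number of misclassified pairs.
            feasible = True
            score = -(fp + fn)

        # Strict '>' keeps the lowest threshold among equally good ones.
        if feasible and (best_score is None or score > best_score):
            best, best_score = t, score
    return best


# Toy data: eight record pairs with weights and known match status.
weights = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
is_match = [True, True, False, True, True, False, False, False]

print(pick_threshold(weights, is_match))          # 0.5
print(pick_threshold(weights, is_match, my=0.0))  # 0.8
print(pick_threshold(weights, is_match, ny=0.0))  # 0.5
```

On this toy data the default run accepts one false link at 0.5, while my = 0
forces the threshold up to 0.8, where no false links remain.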
Two separate runs of optimalThreshold, one with a value for my and one with a
value for ny, yield a lower and an upper threshold for a three-way
classification approach (producing links, non-links and possible links).
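
As a rough sketch of how two thresholds produce three classes, the following
Python snippet applies the rule "weights greater than or equal to a threshold
are links" to both bounds. The function name classify_three_way, the label
strings and the example values are hypothetical and chosen only for
illustration; within the package itself the thresholds would be passed to a
classification function such as epiClassify or emClassify.

```python
def classify_three_way(weights, upper, lower):
    """Label each weight as 'link', 'possible link' or 'non-link'.

    Illustrative only: weights >= upper are links, weights below lower are
    non-links, and everything in between is a possible link.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")

    labels = []
    for w in weights:
        if w >= upper:
            labels.append("link")
        elif w >= lower:
            labels.append("possible link")
        else:
            labels.append("non-link")
    return labels


# Example thresholds and weights (made-up values).
labels = classify_three_way([0.9, 0.7, 0.5, 0.3], upper=0.8, lower=0.5)
print(labels)  # ['link', 'possible link', 'possible link', 'non-link']
```

If both thresholds coincide, the middle class is empty and the result reduces
to the two-way classification described earlier.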