## S3 method for class 'RecLinkData':
fsWeights(rpairs, m = 0.95, u = rpairs$frequencies, cutoff = 1)
## S3 method for class 'RLBigData':
fsWeights(rpairs, m = 0.95, u = getFrequencies(rpairs),
  cutoff = 1, withProgressBar = (sink.number() == 0))
## S3 method for class 'RecLinkData':
fsClassify(rpairs, ...)
## S3 method for class 'RLBigData':
fsClassify(rpairs, threshold.upper,
  threshold.lower = threshold.upper, m = 0.95,
  u = getFrequencies(rpairs), withProgressBar = (sink.number() == 0),
  cutoff = 1)
fsWeights returns a copy of the object with the calculated weights added. Note that "RLBigData" objects have some reference-style semantics, see clone for more information.
For the "RecLinkData" method, fsClassify returns an S3 object of class "RecLinkResult" that represents a copy of rpairs with the added element rpairs$prediction, which stores the classification result.
For the "RLBigData" method, fsClassify returns an S4 object of class "RLResult".
fsWeights calculates matching weights on an object based on the specified m- and u-probabilities. Each of m and u can be a numeric vector or a single number in the range $[0, 1]$.
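To illustrate how m- and u-probabilities translate into a weight, here is a minimal base R sketch of the textbook Fellegi-Sunter weight for a single record pair. The helper `fs_weight` is hypothetical and not part of the package. It assumes binary agreement per attribute and ignores the `cutoff` argument and missing values, so the package's actual computation may differ in detail.

```r
# Hypothetical helper (not part of the package): textbook Fellegi-Sunter
# weight for one record pair, assuming binary agreement per attribute.
fs_weight <- function(agree, m, u) {
  # Recycle a single probability to one value per attribute
  m <- rep(m, length.out = length(agree))
  u <- rep(u, length.out = length(agree))
  # Agreement contributes log2(m / u); disagreement log2((1 - m) / (1 - u))
  sum(ifelse(agree, log2(m / u), log2((1 - m) / (1 - u))))
}

# Two attributes: the first agrees, the second disagrees
fs_weight(c(TRUE, FALSE), m = 0.5, u = c(0.25, 0.75))  # 2
```

A scalar `m` applies the same m-probability to every attribute, while a vector gives each attribute its own value, mirroring the two forms accepted for `m` and `u`.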
fsClassify performs classification based on the calculated weights. All record pairs with weights greater than or equal to threshold.upper are classified as links. Record pairs with weights smaller than threshold.upper and greater than or equal to threshold.lower are classified as possible links. All remaining record pairs are classified as non-links.
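The threshold rule can be sketched in base R as follows. The function `classify_by_threshold` and the labels "L", "P" and "N" (link, possible link, non-link) are assumptions made for illustration. The sketch only mirrors the rule described above, not the package's implementation.

```r
# Hypothetical helper (not part of the package): apply the two-threshold
# rule to a numeric vector of weights.
classify_by_threshold <- function(w, threshold.upper,
                                  threshold.lower = threshold.upper) {
  stopifnot(threshold.lower <= threshold.upper)
  ifelse(w >= threshold.upper, "L",          # link
         ifelse(w >= threshold.lower, "P",   # possible link
                "N"))                        # non-link
}

classify_by_threshold(c(-3, 0.5, 4), threshold.upper = 2, threshold.lower = 0)
# "N" "P" "L"
```

With the default `threshold.lower = threshold.upper`, no pair can fall between the two thresholds, so no possible links are produced.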
The "RecLinkData" method is a shortcut for emClassify.
The "RLBigData" method checks whether weights are present in the underlying database. If so, classification is based on the existing weights. If not, weights are calculated on the fly during classification but not stored. The latter behaviour might be preferable when a very large dataset is to be classified and disk space is limited. A progress bar is displayed only if weights are calculated on the fly and, by default, only if output is not diverted by sink (e.g. in a Sweave script).
For a general introduction to weight-based record linkage, see the vignette "Weight-based deduplication".
See also: epiWeights
# generate record pairs
data(RLdata500)
rpairs <- compare.dedup(RLdata500, blockfld=list(1,3,5,6,7), identity=identity.RLdata500)
# calculate weights
rpairs <- fsWeights(rpairs)
# classify and show results
summary(fsClassify(rpairs,0))