Calculate EM-estimates of m- and u-probabilities
problink_em(
  patterns,
  mprobs0 = list(0.95),
  uprobs0 = list(0.02),
  p0 = 0.05,
  tol = 1e-05
)
patterns: either a table of patterns (as output by tabulate_patterns) or pairs with comparison columns (as output by compare_pairs).
mprobs0, uprobs0: initial values of the m- and u-probabilities. These should be lists with numeric values. The names of the elements in the list should correspond to the names in by_x in compare_pairs.
p0: the initial estimate of the probability that a pair is a match.
tol: convergence tolerance. The algorithm stops when the change in the m- and u-probabilities is smaller than tol.
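To make the roles of the starting values and tol concrete, here is a minimal sketch of an EM iteration for a two-class model with conditionally independent binary comparisons. It is not the package's implementation: the function em_sketch, its max_iter argument, the 0/1 pattern matrix with a separate count vector, and the use of the largest absolute change as the stopping rule are all assumptions made for illustration.

```r
# Hypothetical helper, not part of the package.
# patterns: 0/1 matrix, one row per distinct comparison pattern (1 = agree)
# n:        number of pairs that show each pattern
em_sketch <- function(patterns, n, m = 0.95, u = 0.02, p = 0.05,
                      tol = 1e-05, max_iter = 1000) {
  k <- ncol(patterns)
  m <- rep(m, length.out = k)   # same starting value for every variable
  u <- rep(u, length.out = k)
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that a pattern belongs to a match
    pm <- p * apply(patterns, 1, function(g) prod(m^g * (1 - m)^(1 - g)))
    pu <- (1 - p) * apply(patterns, 1, function(g) prod(u^g * (1 - u)^(1 - g)))
    w <- pm / (pm + pu)
    # M-step: weighted agreement rates among matches and non-matches
    m_new <- colSums(patterns * w * n) / sum(w * n)
    u_new <- colSums(patterns * (1 - w) * n) / sum((1 - w) * n)
    p <- sum(w * n) / sum(n)
    # Assumed stopping rule: largest absolute change in m and u below tol
    change <- max(abs(c(m_new - m, u_new - u)))
    m <- m_new
    u <- u_new
    if (change < tol) break
  }
  list(mprobs = m, uprobs = u, p = p, iterations = iter)
}

# Made-up pattern counts for three comparison variables
patterns <- as.matrix(expand.grid(lastname = 0:1, firstname = 0:1, sex = 0:1))
n <- c(600, 40, 45, 10, 250, 20, 25, 110)
fit <- em_sketch(patterns, n)
```

The starting values only need to separate the two classes (m above u); a smaller tol requests more iterations before stopping.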
Returns an object of type problink_em. This is a list containing the estimated mprobs, uprobs and overall linkage probability p. It also contains the table of comparison patterns.
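As a sketch of how estimated m- and u-probabilities are commonly used in the Fellegi-Sunter framework, the snippet below turns them into agreement and disagreement weights and sums these for one comparison pattern. The numeric values are made up for illustration and are not output of problink_em; plain named vectors are used here, whereas the returned mprobs and uprobs are list elements.

```r
# Illustrative values only, not estimates produced by the package
mprobs <- c(lastname = 0.95, firstname = 0.90, sex = 0.99)
uprobs <- c(lastname = 0.01, firstname = 0.05, sex = 0.50)

# Log-likelihood-ratio weights for agreement and disagreement per variable
agree_weight    <- log(mprobs / uprobs)
disagree_weight <- log((1 - mprobs) / (1 - uprobs))

# Total weight of a pair agreeing on lastname and sex but not on firstname
pattern <- c(lastname = 1, firstname = 0, sex = 1)
total <- sum(ifelse(pattern == 1, agree_weight, disagree_weight))
```

A variable such as sex, which agrees often even among non-matches, contributes little when it agrees but a large negative weight when it disagrees.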
Fellegi, I. and A. Sunter (1969). "A Theory for Record Linkage", Journal of the American Statistical Association. 64 (328): pp. 1183-1210. doi:10.2307/2286061.
Herzog, T.N., F.J. Scheuren and W.E. Winkler (2007). Data Quality and Record Linkage Techniques, Springer.
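# Assumes the reclin package, which provides these functions and example data
library(reclin)
data("linkexample1", "linkexample2")
# Generate candidate pairs that share the same postcode
pairs <- pair_blocking(linkexample1, linkexample2, "postcode")
# Compare the candidate pairs on the linkage variables
pairs <- compare_pairs(pairs, c("lastname", "firstname", "address", "sex"))
# Estimate the m- and u-probabilities with the EM algorithm
model <- problink_em(pairs)
summary(model)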