fastLink (version 0.1.1)

emlinkMARmov: emlinkMARmov

Description

Expectation-Maximization algorithm for Record Linkage under the Missing at Random (MAR) assumption.
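
To make the idea concrete, here is a minimal sketch of an EM routine for a two-class (match / non-match) mixture over agreement patterns. It is not the fastLink implementation: `em_sketch` is a hypothetical helper, it assumes binary agreement values (0 = disagree, 1 = agree), assumes fields are conditionally independent given match status, and omits the priors. A missing comparison is coded `NA` and simply dropped from the likelihood, which is what the MAR assumption permits. The pattern counts are invented for illustration.

```r
# Hypothetical helper, not part of fastLink: EM for a two-class mixture over
# binary agreement patterns. G is a patterns-by-fields matrix of 0/1/NA and
# n holds the count of record pairs showing each pattern.
em_sketch <- function(G, n, lambda = 0.1, m = NULL, u = NULL,
                      iter.max = 5000, tol = 1e-05) {
  K <- ncol(G)
  if (is.null(m)) m <- rep(0.9, K)  # starting values: matches mostly agree
  if (is.null(u)) u <- rep(0.1, K)  # starting values: non-matches mostly disagree
  obs <- !is.na(G)                  # TRUE where the comparison was observed
  G0 <- ifelse(obs, G, 0)           # placeholder 0 for missing cells

  # Probability of each pattern given per-field agreement probabilities p;
  # missing fields contribute a factor of 1 (they are skipped)
  pattern_lik <- function(p) {
    sapply(seq_len(nrow(G)), function(j) {
      prod(ifelse(obs[j, ], p^G0[j, ] * (1 - p)^(1 - G0[j, ]), 1))
    })
  }

  for (iter in seq_len(iter.max)) {
    # E-step: posterior match probability for each pattern
    lm <- lambda * pattern_lik(m)
    lu <- (1 - lambda) * pattern_lik(u)
    zeta <- lm / (lm + lu)

    # M-step: count-weighted updates that use observed comparisons only
    lambda.new <- sum(n * zeta) / sum(n)
    m.new <- colSums(n * zeta * G0 * obs) / colSums(n * zeta * obs)
    u.new <- colSums(n * (1 - zeta) * G0 * obs) / colSums(n * (1 - zeta) * obs)

    delta <- max(abs(c(lambda.new - lambda, m.new - m, u.new - u)))
    lambda <- lambda.new; m <- m.new; u <- u.new
    if (delta < tol) break
  }
  list(zeta.j = zeta, p.m = lambda, p.u = 1 - lambda,
       p.gamma.k.m = m, p.gamma.k.u = u, iter.converge = iter)
}

# Invented counts for all 8 patterns of three fields, plus one pattern
# in which the third comparison is missing
G <- as.matrix(expand.grid(f1 = 0:1, f2 = 0:1, f3 = 0:1))
G <- rbind(G, c(1, 1, NA))
n <- c(656, 74, 74, 16, 74, 16, 16, 74, 10)
fit <- em_sketch(G, n)
round(fit$zeta.j, 3)
```

The names of the returned list mirror the components documented under Value, but only as a reading aid. The package itself may handle more than two agreement levels and does handle the priors described under Arguments.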

Usage

emlinkMARmov(patterns, nobs.a, nobs.b, p.m, iter.max,
tol, p.gamma.k.m, p.gamma.k.u, prior.lambda, w.lambda,
prior.pi, w.pi, address.field, gender.field)

Arguments

patterns

Table that holds the counts for each unique agreement pattern. This object is produced by the function tableCounts.

nobs.a

Number of observations in dataset A

nobs.b

Number of observations in dataset B

p.m

Probability of finding a match. Default is 0.1.

iter.max

Max number of iterations. Default is 5000

tol

Convergence tolerance. Default is 1e-05

p.gamma.k.m

Probability of observing a specific agreement value for field k, conditional on the pair being in the matched set.

p.gamma.k.u

Probability of observing a specific agreement value for field k, conditional on the pair being in the non-matched set.

prior.lambda

The prior probability of finding a match, derived from auxiliary data.

w.lambda

How much weight to give the prior on lambda versus the data. Must range between 0 (no weight on prior) and 1 (weight fully on prior)

prior.pi

The prior probability of the address field not matching, conditional on being in the matched set. To be used when the share of movers in the population is known with some certainty.

w.pi

How much weight to give the prior on pi versus the data. Must range between 0 (no weight on prior) and 1 (weight fully on prior)

address.field

Boolean indicators for whether a given field is an address field. Default is NULL (FALSE for all fields). Address fields should be set to TRUE while non-address fields are set to FALSE if provided.

gender.field

Boolean indicators for whether a given field is for gender. If so, exact match is conducted on gender. Default is NULL (FALSE for all fields). The one gender field should be set to TRUE while all other fields are set to FALSE if provided.
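
As a rough sketch of how the indicator and weight arguments might be assembled, the snippet below builds one logical entry per compared field and checks that a prior weight lies in [0, 1]. The field names, the fifth `streetname` comparison, the prior values, and the `check_weight` helper are all hypothetical. It also assumes, without confirmation from this page, that the indicators follow the same field order as the list passed to tableCounts.

```r
# Hypothetical field list: one entry per compared field, in the order the
# comparisons are assumed to be supplied to tableCounts
fields <- c("firstname", "middlename", "lastname", "birthyear", "streetname")

# TRUE only for the address field; no gender field in this illustration
address.field <- fields == "streetname"
gender.field <- rep(FALSE, length(fields))

# Hypothetical helper: a prior weight must be a single number in [0, 1]
check_weight <- function(w) {
  stopifnot(is.numeric(w), length(w) == 1, w >= 0, w <= 1)
  invisible(w)
}

prior.pi <- 0.05           # invented share of movers among true matches
w.pi <- check_weight(0.5)  # equal weight on the prior and the data

# The call itself would then look roughly like this (not run here):
# em <- emlinkMARmov(tc, nobs.a = nrow(dfA), nobs.b = nrow(dfB),
#                    prior.pi = prior.pi, w.pi = w.pi,
#                    address.field = address.field)
```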

Value

emlinkMARmov returns a list with the following components:

zeta.j

The posterior match probabilities for each unique pattern.

p.m

The posterior probability of a pair matching.

p.u

The posterior probability of a pair not matching.

p.gamma.k.m

The estimated probability of each agreement value for a given field, conditional on the pair being in the matched set.

p.gamma.k.u

The estimated probability of each agreement value for a given field, conditional on the pair being in the non-matched set.

p.gamma.j.m

The estimated probability of observing a particular agreement pattern, conditional on the pair being in the matched set.

p.gamma.j.u

The estimated probability of observing a particular agreement pattern, conditional on the pair being in the unmatched set.

patterns.w

Counts of the agreement patterns observed, along with the Fellegi-Sunter weights.

iter.converge

The number of iterations it took the EM algorithm to converge.

nobs.a

The number of observations in dataset A.

nobs.b

The number of observations in dataset B.
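
The relationship between several of these components can be sketched with a few lines of base R. The sketch assumes binary agreement values, conditional independence across fields, and natural-log weights, and it takes p.gamma.j.m and p.gamma.j.u to be the probability of a pattern given the matched and unmatched sets respectively. All parameter values are invented and nothing here calls fastLink.

```r
# Invented parameter values for three binary fields
p.m <- 0.1
p.gamma.k.m <- c(0.9, 0.9, 0.9)  # P(agree on field k | match)
p.gamma.k.u <- c(0.1, 0.1, 0.1)  # P(agree on field k | non-match)

# Four example agreement patterns, from no agreement to full agreement
patterns <- rbind(c(0, 0, 0), c(1, 0, 0), c(1, 1, 0), c(1, 1, 1))

# Probability of one pattern g given per-field agreement probabilities p
pattern_prob <- function(g, p) prod(p^g * (1 - p)^(1 - g))

p.gamma.j.m <- apply(patterns, 1, pattern_prob, p = p.gamma.k.m)
p.gamma.j.u <- apply(patterns, 1, pattern_prob, p = p.gamma.k.u)

# Fellegi-Sunter weight: log likelihood ratio of match versus non-match
weights <- log(p.gamma.j.m / p.gamma.j.u)

# Posterior match probability for each pattern, by Bayes' rule
zeta.j <- p.m * p.gamma.j.m /
  (p.m * p.gamma.j.m + (1 - p.m) * p.gamma.j.u)

cbind(patterns, weights, zeta.j)
```

Both the weight and the posterior rise with each additional agreeing field. The weight ignores the overall match rate p.m, while zeta.j incorporates it.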

Examples

# NOT RUN {
## Calculate gammas
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)
g2 <- gammaCKpar(dfA$middlename, dfB$middlename)
g3 <- gammaCKpar(dfA$lastname, dfB$lastname)
g4 <- gammaKpar(dfA$birthyear, dfB$birthyear)

## Run tableCounts
tc <- tableCounts(list(g1, g2, g3, g4), nobs.a = nrow(dfA), nobs.b = nrow(dfB))

## Run EM
em <- emlinkMARmov(tc, nobs.a = nrow(dfA), nobs.b = nrow(dfB))
# }