
fastLink (version 0.1.1)

dedupeMatches: dedupeMatches

Description

Dedupe matched data frames so that each observation appears in at most one matched pair.

Usage

dedupeMatches(matchesA, matchesB, EM,
matchesLink, varnames, stringdist.match, partial.match,
linprog, stringdist.method, cut.a = 0.92, cut.p = 0.88,
jw.weight)

Arguments

matchesA

A data frame of the matched observations in dataset A, with all variables used to inform the match.

matchesB

A data frame of the matched observations in dataset B, with all variables used to inform the match.

EM

The EM object from emlinkMARmov().

matchesLink

The output from matchesLink().

varnames

A vector of variable names to use for matching. Must be present in both matchesA and matchesB.

stringdist.match

A logical vector indicating whether to use string-distance matching when determining the matching pattern for each variable. Must be the same length as varnames.

partial.match

A logical vector indicating whether to include a partial-match category for the string distances. Must be the same length as varnames. Default is FALSE for all variables.
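The constraints on varnames, stringdist.match, and partial.match can be checked before dedupeMatches() is called. The sketch below uses a hypothetical helper, check_dedupe_args(), which is not part of fastLink; the toy data frames A and B are likewise illustrative assumptions.

```r
# Hypothetical helper, NOT part of fastLink: checks the argument
# constraints documented above before dedupeMatches() is called.
check_dedupe_args <- function(matchesA, matchesB, varnames,
                              stringdist.match,
                              partial.match = rep(FALSE, length(varnames))) {
  # Every matching variable must be a column of both data frames
  missingA <- setdiff(varnames, names(matchesA))
  missingB <- setdiff(varnames, names(matchesB))
  if (length(missingA) > 0)
    stop("not in matchesA: ", paste(missingA, collapse = ", "))
  if (length(missingB) > 0)
    stop("not in matchesB: ", paste(missingB, collapse = ", "))
  # One logical flag per matching variable
  if (!is.logical(stringdist.match) ||
      length(stringdist.match) != length(varnames))
    stop("stringdist.match must be a logical vector the same length as varnames")
  if (!is.logical(partial.match) ||
      length(partial.match) != length(varnames))
    stop("partial.match must be a logical vector the same length as varnames")
  invisible(TRUE)
}

A <- data.frame(first = c("john", "mary"), last = c("smith", "jones"),
                age = c(40, 35))
B <- data.frame(first = c("jon", "mary"), last = c("smith", "jones"),
                age = c(40, 36))

# Passes: all three variables exist in both data frames
check_dedupe_args(A, B, varnames = c("first", "last", "age"),
                  stringdist.match = c(TRUE, TRUE, FALSE),
                  partial.match = c(TRUE, FALSE, FALSE))

# Fails: stringdist.match has two flags for three variables
try(check_dedupe_args(A, B, varnames = c("first", "last", "age"),
                      stringdist.match = c(TRUE, TRUE)))
```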

linprog

Whether to implement Winkler's linear programming solution to the deduplication problem. Default is FALSE.
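The deduplication problem asks for one-to-one matches: each observation in dataset A should be linked to at most one observation in dataset B, and vice versa. The sketch below illustrates that constraint with a simple greedy rule (accept the most probable pairs first, skipping any pair whose records are already used). It is a swapped-in heuristic, not Winkler's linear programming solution, and not necessarily what dedupeMatches() does when linprog = FALSE; the function greedy_dedupe() and the columns a, b, and posterior are hypothetical.

```r
# Greedy one-to-one deduplication (a swapped-in heuristic, NOT Winkler's
# linear-programming solution). Each row of `pairs` is a candidate match:
# hypothetical columns a (row in dataset A), b (row in dataset B), and
# posterior (estimated match probability).
greedy_dedupe <- function(pairs) {
  ord <- order(pairs$posterior, decreasing = TRUE)  # most probable first
  usedA <- numeric(0)
  usedB <- numeric(0)
  keep <- logical(nrow(pairs))
  for (i in ord) {
    # Accept a pair only if neither of its records is already matched
    if (!(pairs$a[i] %in% usedA) && !(pairs$b[i] %in% usedB)) {
      keep[i] <- TRUE
      usedA <- c(usedA, pairs$a[i])
      usedB <- c(usedB, pairs$b[i])
    }
  }
  pairs[keep, , drop = FALSE]
}

# Record 1 in A and record 2 in B each appear in two candidate pairs
pairs <- data.frame(a         = c(1, 1, 2, 3),
                    b         = c(1, 2, 2, 3),
                    posterior = c(0.95, 0.90, 0.97, 0.85))
greedy_dedupe(pairs)  # keeps (1,1), (2,2), (3,3); drops (1,2)
```

Because it commits to the best remaining pair at each step, a greedy pass can miss the assignment with the highest total probability; solving the assignment exactly is what a linear programming formulation such as Winkler's is for.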

stringdist.method

String distance method for calculating similarity. Options are "jw" (Jaro-Winkler, the default), "jaro" (Jaro), and "lv" (Levenshtein edit distance).

cut.a

Lower bound for a full string-distance match, between 0 and 1. Default is 0.92.

cut.p

Lower bound for a partial string-distance match, between 0 and 1. Default is 0.88.

jw.weight

Weight given to agreement in the first characters of the strings, i.e. the Winkler prefix scale (only used if stringdist.method = "jw"). Default is 0.10.
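The thresholds cut.a and cut.p turn a string similarity into discrete agreement levels. The sketch below assumes similarity is the Jaro-Winkler score, with jw.weight acting as the prefix scale p, and that values at or above cut.a count as full agreement while values in [cut.p, cut.a) count as partial agreement. The functions jaro(), jaro_winkler(), and classify() are hypothetical hand-written stand-ins, not fastLink functions; fastLink relies on a library implementation, so edge cases may differ.

```r
# Hypothetical stand-ins (NOT fastLink functions) for the string comparison
# controlled by stringdist.method = "jw", jw.weight, cut.a, and cut.p.

# Jaro similarity between two strings (1 = identical, 0 = nothing in common)
jaro <- function(s1, s2) {
  if (nchar(s1) == 0 || nchar(s2) == 0) return(as.numeric(s1 == s2))
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  n1 <- length(a)
  n2 <- length(b)
  # Characters count as matching only within this distance of each other
  window <- max(0, floor(max(n1, n2) / 2) - 1)
  usedA <- logical(n1)
  usedB <- logical(n2)
  for (i in seq_len(n1)) {
    lo <- max(1, i - window)
    hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!usedB[j] && a[i] == b[j]) {
        usedA[i] <- TRUE
        usedB[j] <- TRUE
        break
      }
    }
  }
  m <- sum(usedA)
  if (m == 0) return(0)
  # Half the number of matched characters that appear in a different order
  transp <- sum(a[usedA] != b[usedB]) / 2
  (m / n1 + m / n2 + (m - transp) / m) / 3
}

# Jaro-Winkler: boost the Jaro score for a shared prefix of up to 4
# characters; p plays the role of jw.weight (0.10 by default)
jaro_winkler <- function(s1, s2, p = 0.10) {
  sim <- jaro(s1, s2)
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  prefix <- 0
  for (k in seq_len(min(4, length(a), length(b)))) {
    if (a[k] != b[k]) break
    prefix <- prefix + 1
  }
  sim + prefix * p * (1 - sim)
}

# Map similarities to agreement levels, assuming inclusive lower bounds
classify <- function(sim, cut.a = 0.92, cut.p = 0.88) {
  ifelse(sim >= cut.a, "full", ifelse(sim >= cut.p, "partial", "none"))
}

sims <- c(jaro_winkler("MARTHA", "MARHTA"),        # about 0.961
          jaro_winkler("DUNNINGHAM", "CUNNIGHAM"), # about 0.896
          jaro_winkler("DWAYNE", "DUANE"))         # about 0.840
classify(sims)  # "full" "partial" "none"
```

When partial.match is FALSE for a variable, presumably only cut.a applies and every similarity below it counts as disagreement.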

Value

dedupeMatches() returns a list containing the following elements:

matchesA

A deduped version of matchesA.

matchesB

A deduped version of matchesB.

EM

A deduped version of the EM object.