Description
Field comparisons for string variables. Three possible agreement patterns are considered:
0 = total disagreement, 1 = partial agreement, 2 = agreement.
By default, the distance between strings is calculated using the Jaro-Winkler distance.
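As a rough illustration of how a similarity score could map onto the three agreement patterns, the sketch below uses a hypothetical helper, classify_agreement, which is not part of the package. It assumes similarities scaled to [0, 1], the default cutoffs documented under Arguments (0.92 and 0.88), and inclusive lower bounds; the package's own handling of boundary values may differ.

```r
# Hypothetical helper, not part of the package: map similarity scores in
# [0, 1] to agreement levels using two lower bounds (assumed inclusive).
classify_agreement <- function(sim, cut.a = 0.92, cut.p = 0.88) {
  stopifnot(cut.p <= cut.a)
  # 2 = agreement, 1 = partial agreement, 0 = total disagreement
  ifelse(sim >= cut.a, 2L, ifelse(sim >= cut.p, 1L, 0L))
}

sim <- c(1.00, 0.95, 0.90, 0.50)
agreement <- classify_agreement(sim)
print(agreement)  # 2 2 1 0
```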
Usage
gammaCKpar(matAp, matBp, n.cores, cut.a, cut.p, method, w)
Arguments
matAp
vector storing the comparison field in data set 1
matBp
vector storing the comparison field in data set 2
n.cores
Number of cores to parallelize over. Default is NULL.
cut.a
Lower bound for full match, ranging between 0 and 1. Default is 0.92
cut.p
Lower bound for partial match, ranging between 0 and 1. Default is 0.88
method
String distance method. Options are "jw" (Jaro-Winkler, the default), "jaro" (Jaro), and "lv" (Levenshtein edit distance).
w
Weight given to agreement in the first characters of the strings (only used if method = "jw"). Default is 0.10.
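To make the role of w more concrete, here is a minimal sketch of the standard Winkler prefix adjustment applied to a Jaro similarity that has already been computed. The helper winkler_adjust is hypothetical and not part of the package. It assumes the usual formulation in which the common prefix is capped at four characters; the package's internal computation may differ in detail.

```r
# Hypothetical helper, not part of the package: apply the Winkler prefix
# adjustment to an already computed Jaro similarity.
winkler_adjust <- function(jaro_sim, a, b, w = 0.10) {
  # Length of the common prefix, capped at 4 characters (assumed convention)
  max_len <- min(nchar(a), nchar(b), 4L)
  prefix <- 0L
  while (prefix < max_len &&
         substr(a, prefix + 1L, prefix + 1L) == substr(b, prefix + 1L, prefix + 1L)) {
    prefix <- prefix + 1L
  }
  # A larger w pulls the score closer to 1 when the strings share a prefix
  jaro_sim + prefix * w * (1 - jaro_sim)
}

# Textbook pair: the Jaro similarity of "martha" and "marhta" is 17/18
adj <- winkler_adjust(17 / 18, "martha", "marhta", w = 0.10)
print(round(adj, 3))  # 0.961
```

With w = 0 the adjustment vanishes and the score equals the plain Jaro similarity.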
Value
gammaCKpar returns a list with the indices corresponding to each matching pattern, which can be fed directly into tableCounts and matchesLink.
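The idea of storing indices per matching pattern can be sketched in base R as follows. This is an illustration only: the element names agree and partial, the toy data, and the use of a normalized edit similarity from utils::adist as a stand-in for the string distance step are all assumptions, and the actual layout of the returned list is not shown on this page.

```r
# Hypothetical illustration; names and layout are assumptions, not the
# actual structure returned by gammaCKpar.
a <- c("christopher", "john", "peter")   # comparison field, data set 1
b <- c("christophor", "john", "susan")   # comparison field, data set 2

# Normalized edit similarity: 1 - edit distance / length of longer string
len <- outer(nchar(a), nchar(b), pmax)
sim <- 1 - utils::adist(a, b) / len

# Row = record in data set 1, col = record in data set 2, per pattern
idx <- list(
  agree   = which(sim >= 0.92, arr.ind = TRUE),
  partial = which(sim >= 0.88 & sim < 0.92, arr.ind = TRUE)
)
print(idx)
```

Pairs that appear in neither element are treated as total disagreement, so only the comparatively rare agreeing pairs need to be stored.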
Examples
# NOT RUN {
g1 <- gammaCKpar(dfA$firstname, dfB$firstname)
# }