Ecfun (version 0.1-3)

matchName: Match surname and givenName in a table

Description

Use parseName to split a name into surname and givenName, then look for matches in a table.

Usage

matchName(x, data, Names=1:2, 
          nicknames=matrix(character(0), 0, 2), ...)
matchName1(x1, data, name=1,         
          nicknames=matrix(character(0), 0, 2), ...)

Arguments

x
One of the following:
  • A character matrix or data.frame with the same number of rows as data. The best partial match is sought in Names. The algorithm stops when a uniqu
data
a character matrix or a data.frame. If surname and givenName are character vectors of names, their length must match the number of rows of data.
Names
One of the following in which matches for x will be sought:
  • A character matrix or data.frame with the same number of rows as data.
  • A character vector whose length matches the num
nicknames
a character matrix with two columns, each row giving a pair of names like "Pete" and "Peter" that should be regarded as equivalent if no exact match(es) is(are) found.
...
optional arguments passed to subNonStandardNames
x1
a character vector of names to match against name
name
Either a character vector of names whose length matches nrow(data) or something identifying a column of data to use in matching x1. NOTE: If name is a character vector of names,

Value

  • matchName returns a list of the same length as x, each of whose components is an object obtained as a subset of rows of data, or NULL if no acceptable matches are found. matchName1 returns a list of vectors of integers for subsets of data matching x1.
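
Because unmatched entries come back as NULL components, it can help to separate them from the matched ones before further processing. The following is a minimal sketch in base R; the list res is a hand-built, hypothetical stand-in shaped like the return value described above, not actual output from matchName.

```r
# Hypothetical result shaped like a matchName return value:
# each component is either a row of data or NULL (no acceptable match).
res <- list("Young, William X." = c("Young", "Bill", "369"),
            ", Zebra" = NULL)

# Flag the components for which no match was found.
unmatched <- vapply(res, is.null, logical(1))

# Names that still need attention.
noMatch <- names(res)[unmatched]
print(noMatch)

# Keep only the matched components.
matched <- res[!unmatched]
```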

Details

*** 1. matchName(x, data, Names, nicknames, ...):
1.1. if(length(dim(x))<2) x <- parseName(x, ...)
1.2. x1 <- matchName1(x[, 1], data, Names[1], ...)
1.3. For any component i of x1 with multiple rows, let x1i <- matchName1(x[i, 2], x1[[i]], Names[-1], nicknames=nicknames, ...). If nrow(x1i)>0, x1[[i]] <- x1i; else leave unchanged.
1.4. return x1

===========

*** 2. matchName1(x1, data, name, nicknames, ...):
2.1. If name indicates a column of data, replace with data[, name].
2.2. xsplit <- strsplit(x1, ' ')
2.3. nx <- length(x1); xlist <- vector(nx, mode='list')
2.4. for(j in 1:nx):
2.5. xj <- xsplit[[j]]
2.6. let jd = the subset of names that match xj or subNonStandardNames(xj) or nicknames of xj; xlist[[j]] <- jd.
2.7. return xlist
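
The matching loop in step 2 can be illustrated with a short base-R sketch. This is not Ecfun's implementation: matchSketch is a hypothetical helper that assumes exact string comparison, falls back to the nicknames table only when no exact match is found, and omits the strsplit and subNonStandardNames steps entirely.

```r
# Hypothetical helper: a simplified sketch of the loop in step 2,
# NOT the actual matchName1 from Ecfun.
matchSketch <- function(x1, name,
                        nicknames = matrix(character(0), 0, 2)) {
  nx <- length(x1)
  xlist <- vector("list", nx)
  names(xlist) <- x1
  for (j in seq_len(nx)) {
    xj <- x1[j]
    # Try an exact match first.
    hits <- which(name == xj)
    if (length(hits) == 0) {
      # Fall back to any nickname paired with xj in either column.
      alt <- c(nicknames[nicknames[, 1] == xj, 2],
               nicknames[nicknames[, 2] == xj, 1])
      hits <- which(name %in% alt)
    }
    # Assign only when something matched, so the component stays NULL otherwise.
    if (length(hits) > 0) xlist[[j]] <- hits
  }
  xlist
}

given <- c("Andre", "Don", "Don", "Bill")
nick <- matrix(c("William", "Bill"), 1, byrow = TRUE)
res <- matchSketch(c("Don", "William", "Zebra"), given, nick)
str(res)
```

Here "Don" matches two rows exactly, "William" matches only through the nickname pair, and "Zebra" is left as NULL, mirroring the 0, 1 and more-than-1 match cases the examples below exercise.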

See Also

parseName subNonStandardNames

Examples

##
## 1.  Names to match exercising many possible combinations 
##     of surname with 0, 1, >1 matches possibly after 
##     replacing with subNonStandardNames 
##     combined with possibly multiple givenName combinations 
##     with 0, 1, >1 matches possibly requiring replacing with 
##     subNonStandardNames or nicknames 
##
# NOTE:  "_" stands in for an accented character such as "e" with an accent;  
#    not included with this documentation, because 
#    non-English characters generate warnings in standard tests.  
Names2mtch <- c("Andr_ Bruce C_rdenas", "Dolores Ella Feinstein",
           "George Homer", "Inez Jane Kappa", "Luke Michael Noel", 
           "Oscar Papa", "Quincy Ra_l Stevens", 
           "Thomas U. Vel_zquez", "William X. Young", 
           "Zebra")
##
## 2.  Data = matrix(..., byrow=TRUE) to exercise 
##     the combinations from 1 
##
Data1 <- matrix(c("C_rdenas", "Andre B.", "123", 
                  "C_rdenas", "Don", "456", 
                  "Feld", "Don", "789", "Young", "Bill", "369"), 
                4, byrow=TRUE)
##
## 3.  matchName1
##        
parceNm1 <- parseName(Names2mtch)
match1.1 <- matchName1(parceNm1[, 'surname'], Data1)

# check
match1.1s <- vector(10, mode='list')
match1.1s[[1]] <- 1:2
match1.1s[[9]] <- 4
names(match1.1s) <- parceNm1[, 'surname'] 
stopifnot(
all.equal(match1.1, match1.1s)
)

##
## 4.  matchName
##
nickNames <- matrix(c("William", "Bill"), 1, byrow=TRUE)

match1 <- matchName(Names2mtch, Data1, nicknames=nickNames)
                  
# check 
match1a <- list("Cardenas, Andre Bruce"=Data1[1, ], 
                "Feinstein, Dolores Ella"=NULL, 
                "Homer, George"=NULL, "Kappa, Inez Jane"=NULL, 
                "Noel, Luke Michael"=NULL, "Papa, Oscar"=NULL, 
                "Stevens, Quincy Raul"=NULL, 
                "Velazquez, Thomas U."=NULL, 
                "Young, William X."=Data1[4,], ", Zebra"=NULL)
stopifnot(
all.equal(match1, match1a)
)
