Preprocessing (cleaning) of strings prior to linkage.
StandardizeString(strings)
strings: A character vector of strings to be standardized.
Returns a character vector with standardized strings.
Strings are converted to upper case, and letters are substituted as listed below. Leading and trailing blanks are removed. Any other non-ASCII characters are deleted.
Replace "Æ" with "AE"
Replace "æ" with "AE"
Replace "Ä" with "AE"
Replace "ä" with "AE"
Replace "Å" with "A"
Replace "å" with "A"
Replace "Â" with "A"
Replace "â" with "A"
Replace "À" with "A"
Replace "à" with "A"
Replace "Á" with "A"
Replace "á" with "A"
Replace "Ç" with "C"
Replace "ç" with "C"
Replace "Ê" with "E"
Replace "ê" with "E"
Replace "È" with "E"
Replace "è" with "E"
Replace "É" with "E"
Replace "é" with "E"
Replace "Ï" with "I"
Replace "ï" with "I"
Replace "Î" with "I"
Replace "î" with "I"
Replace "Ì" with "I"
Replace "ì" with "I"
Replace "Í" with "I"
Replace "í" with "I"
Replace "Ö" with "OE"
Replace "ö" with "OE"
Replace "Ø" with "O"
Replace "ø" with "O"
Replace "Ô" with "O"
Replace "ô" with "O"
Replace "Ò" with "O"
Replace "ò" with "O"
Replace "Ó" with "O"
Replace "ó" with "O"
Replace "ß" with "SS"
Replace "Ş" with "S"
Replace "ş" with "S"
Replace "ü" with "UE"
Replace "Ü" with "UE"
Replace "Ů" with "U"
Replace "Û" with "U"
Replace "û" with "U"
Replace "Ù" with "U"
Replace "ù" with "U"
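The rules above can be approximated in base R. The following is a minimal sketch for illustration only, not the package's implementation: the name standardize_sketch is hypothetical, the substitution table covers only a subset of the list, and the order of the steps is an assumption.

```r
# Illustrative sketch only: a hypothetical helper, not the package's code.
standardize_sketch <- function(strings) {
  # Subset of the substitution table above, written with \u escapes so the
  # source stays ASCII. Both cases are listed because toupper() on non-ASCII
  # letters depends on the locale.
  from <- c("\u00C6", "\u00E6", "\u00C4", "\u00E4", "\u00C5", "\u00E5",
            "\u00D6", "\u00F6", "\u00D8", "\u00F8", "\u00DF",
            "\u00DC", "\u00FC", "\u00C9", "\u00E9")
  to   <- c("AE", "AE", "AE", "AE", "A", "A",
            "OE", "OE", "O", "O", "SS",
            "UE", "UE", "E", "E")

  out <- enc2utf8(strings)
  for (i in seq_along(from)) {
    # fixed = TRUE: treat the pattern as a literal string, not a regex
    out <- gsub(from[i], to[i], out, fixed = TRUE)
  }
  # Delete any remaining non-ASCII characters.
  out <- iconv(out, from = "UTF-8", to = "ASCII", sub = "")
  # Remove leading and trailing blanks, then convert to upper case.
  toupper(trimws(out))
}

standardize_sketch(c("P\u00E4ter", " J\u00FCrgen", " Ro\u00DF"))
```

Under these assumptions the call returns "PAETER", "JUERGEN" and "ROSS". Substituting before calling toupper() avoids relying on locale-dependent case conversion of accented letters.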
# NOT RUN {
strings = c("Päter", " Jürgen", " Roß")
StandardizeString(strings)
# }