PPRL (version 0.3.5)

StandardizeString: Standardize String

Description

Preprocessing (cleaning) of strings prior to linkage.

Usage

StandardizeString(strings)

Arguments

strings

A character vector of strings to be standardized.

Value

Returns a character vector with standardized strings.

Details

Strings are converted to upper case, and the letters listed below are substituted. Leading and trailing blanks are removed. All other non-ASCII characters are deleted.

  • Replace "Æ" with "AE"

  • Replace "æ" with "AE"

  • Replace "Ä" with "AE"

  • Replace "ä" with "AE"

  • Replace "Å" with "A"

  • Replace "å" with "A"

  • Replace "Â" with "A"

  • Replace "â" with "A"

  • Replace "À" with "A"

  • Replace "à" with "A"

  • Replace "Á" with "A"

  • Replace "á" with "A"

  • Replace "Ç" with "C"

  • Replace "ç" with "C"

  • Replace "Ê" with "E"

  • Replace "ê" with "E"

  • Replace "È" with "E"

  • Replace "è" with "E"

  • Replace "É" with "E"

  • Replace "é" with "E"

  • Replace "Ï" with "I"

  • Replace "ï" with "I"

  • Replace "Î" with "I"

  • Replace "î" with "I"

  • Replace "Ì" with "I"

  • Replace "ì" with "I"

  • Replace "Í" with "I"

  • Replace "í" with "I"

  • Replace "Ö" with "OE"

  • Replace "ö" with "OE"

  • Replace "Ø" with "O"

  • Replace "ø" with "O"

  • Replace "Ô" with "O"

  • Replace "ô" with "O"

  • Replace "Ò" with "O"

  • Replace "ò" with "O"

  • Replace "Ó" with "O"

  • Replace "ó" with "O"

  • Replace "ß" with "SS"

  • Replace "Ş" with "S"

  • Replace "ş" with "S"

  • Replace "ü" with "UE"

  • Replace "Ü" with "UE"

  • Replace "Ů" with "U"

  • Replace "Û" with "U"

  • Replace "û" with "U"

  • Replace "Ù" with "U"

  • Replace "ù" with "U"
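
The rules above can be approximated in base R. The following is a minimal sketch, not the PPRL implementation: `standardize_sketch` is a hypothetical helper name, the substitution table covers only a few of the listed letters, and the order of the steps (substitute, upper-case, delete remaining non-ASCII, trim) is an assumption. Code point escapes are used so the snippet does not depend on the file encoding.

```r
# Hypothetical helper: approximates the documented rules with base R only.
# It is NOT the PPRL implementation and covers only part of the table.
standardize_sketch <- function(strings) {
  # Subset of the substitution table: umlauts, sharp s, E with acute
  from <- c("\u00C4", "\u00E4", "\u00D6", "\u00F6",
            "\u00DC", "\u00FC", "\u00DF", "\u00C9", "\u00E9")
  to   <- c("AE", "AE", "OE", "OE",
            "UE", "UE", "SS", "E", "E")

  out <- enc2utf8(strings)
  # Apply each literal substitution in turn
  for (i in seq_along(from)) {
    out <- gsub(from[i], to[i], out, fixed = TRUE)
  }
  # Upper-case the remaining letters
  out <- toupper(out)
  # Delete anything that still cannot be represented in ASCII
  out <- iconv(out, from = "UTF-8", to = "ASCII", sub = "")
  # Remove leading and trailing blanks
  trimws(out)
}

standardize_sketch(c("P\u00E4ter", " J\u00FCrgen", " Ro\u00DF"))
```

For actual linkage work, StandardizeString itself should be used, since it applies the full table; the sketch only illustrates how the steps fit together.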

See Also

PPRL

Examples

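library(PPRL)

# Names containing an umlaut or sharp s, two of them with a leading blank
strings <- c("Päter", " Jürgen", " Roß")

# Returns the upper-cased, substituted, and trimmed strings
StandardizeString(strings)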
