RecordLinkage (version 0.4-11)

phonetics: Phonetic Code

Description

Interface to phonetic coding functions.

Usage

pho_h(str)
soundex(str)

Arguments

str

A character vector or matrix. Factors are converted to character.

Details

Translates its argument to a phonetic code. pho_h, by Jörg Michael (see references), is intended for German and normalizes umlauts and accented characters. soundex is a widely used algorithm for English names; this implementation handles only common characters. Both algorithms strip non-alphabetic characters, except that pho_h leaves digits unchanged.

The C code for soundex was taken from PostgreSQL 8.3.6.
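To make the idea of a phonetic code concrete, the following is a minimal base-R sketch of a table-driven Soundex variant. The function name soundex_sketch is hypothetical and is not part of RecordLinkage; it is not the package's C code and may differ from it in edge cases. It treats H and W like vowels, which simplifies the textbook rule, and it ignores matrix dimensions.

```r
# Simplified Soundex sketch in base R (hypothetical helper, not the package code).
soundex_sketch <- function(x) {
  # Digit for each letter A..Z; "0" marks letters that produce no digit.
  codes <- strsplit("01230120022455012623010202", "")[[1]]
  names(codes) <- LETTERS

  encode_one <- function(s) {
    if (is.na(s)) return(NA_character_)
    # Keep ASCII letters only and work in upper case.
    clean <- toupper(gsub("[^A-Za-z]", "", s, perl = TRUE))
    if (nchar(clean) == 0) return("")
    chars <- strsplit(clean, "")[[1]]
    digits <- unname(codes[chars])
    out <- chars[1]  # the first letter is kept as is
    for (i in seq_along(chars)[-1]) {
      # Emit a digit only if it differs from the previous letter's code
      # and is not the "no digit" marker.
      if (digits[i] != digits[i - 1] && digits[i] != "0") {
        out <- c(out, digits[i])
      }
    }
    # Pad with zeros and truncate to four characters.
    substr(paste0(paste(out, collapse = ""), "000"), 1, 4)
  }

  vapply(as.character(x), encode_one, character(1), USE.NAMES = FALSE)
}

# Names that sound alike map to the same code.
soundex_sketch(c("Robert", "Rupert"))
```

Because similar-sounding names share a code, comparing codes rather than raw strings tolerates many spelling variations.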

Value

A character vector or matrix with the same size and dimensions as str, containing its phonetic encoding.

References

Jörg Michael, Doppelgänger gesucht -- Ein Programm für [...], in: c't 1999, No. 25, pp. 252--261. The source code is published (under GPL) at http://www.heise.de/ct/ftp/99/25/252/.

Author(s)

Andreas Borg (R interface only)

See Also

jarowinkler and levenshteinSim for string comparison.
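Examples

A short usage sketch, assuming the RecordLinkage package is installed. The surnames are made up for illustration, and only ASCII input is used because soundex handles only common characters. No output is shown, since the exact codes depend on the implementation.

```r
# Sketch: runs only if the RecordLinkage package is available.
if (requireNamespace("RecordLinkage", quietly = TRUE)) {
  # Illustrative names (made up for this example).
  surnames <- c("Meyer", "Maier", "Mueller", "Miller")

  # Both functions take a character vector and return one code per element.
  sdx <- RecordLinkage::soundex(surnames)
  pho <- RecordLinkage::pho_h(surnames)
  print(data.frame(name = surnames, soundex = sdx, pho_h = pho))

  # A matrix argument should yield a matrix of codes with the same dimensions.
  name_mat <- matrix(surnames, nrow = 2)
  pho_mat <- RecordLinkage::pho_h(name_mat)
  print(dim(pho_mat))
}
```

Equal codes in a column indicate names that the respective algorithm considers phonetically equivalent.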

Keywords

misc