
stringdist (version 0.8.2)

phonetic: Phonetic algorithms

Description

Translate strings to phonetic codes. Similar-sounding strings should get similar or equal codes.

Usage

phonetic(x, method = c("soundex"), useBytes = FALSE)

Arguments

x
a character vector whose elements are phonetically encoded.
method
name of the algorithm used. The default is "soundex".
useBytes
Perform byte-wise comparison. useBytes=TRUE is faster but may yield different results depending on character encoding. For more information see the documentation of stringdist.

Value

  • The return value depends on the method used. However, all currently implemented methods return a character vector of the same length as the input vector. Output characters are in the system's native encoding.

Citation

If you would like to cite this package, please cite the R-journal paper:
  • M.P.J. van der Loo (2014). The stringdist package for approximate string matching. R Journal 6(1) pp. 111-122

code

citation('stringdist')

Details

Currently, only the soundex algorithm is implemented. Note that soundex coding is only meaningful for characters in the ranges a-z and A-Z. Soundex coding of strings containing non-printable ASCII or non-ASCII characters may be system-dependent and should not be trusted. If non-ASCII or non-printable ASCII characters are encountered, a warning is emitted.
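
To make the coding rules concrete, here is a minimal base-R sketch of the Soundex variant described by the National Archives. It is an illustration only and not the package's implementation: the function name soundex_sketch is hypothetical, and unlike phonetic() it silently drops characters outside a-z and A-Z instead of emitting a warning.

```r
# Illustrative sketch of National Archives style Soundex (not stringdist's code).
soundex_sketch <- function(x) {
  # Digit assigned to each coded consonant; vowels, H, W and Y have no code.
  codes <- c(B = "1", F = "1", P = "1", V = "1",
             C = "2", G = "2", J = "2", K = "2",
             Q = "2", S = "2", X = "2", Z = "2",
             D = "3", T = "3", L = "4",
             M = "5", N = "5", R = "6")

  encode_one <- function(s) {
    if (is.na(s)) return(NA_character_)
    ch <- strsplit(toupper(s), "")[[1]]
    ch <- ch[ch %in% LETTERS]               # assumption: ignore non-letters
    if (length(ch) == 0) return("")
    # H and W do not separate equal codes, so drop them (except a leading one).
    ch <- ch[seq_along(ch) == 1 | !(ch %in% c("H", "W"))]
    digit <- unname(codes[ch])
    digit[is.na(digit)] <- "0"              # "0" marks uncoded letters (vowels, Y)
    # Collapse runs of equal digits; vowels ("0") still break a run.
    keep <- c(TRUE, digit[-1] != digit[-length(digit)])
    tail_digits <- digit[keep][-1]          # the first letter is kept as a letter
    tail_digits <- tail_digits[tail_digits != "0"]
    # Pad with zeros and truncate to one letter plus three digits.
    code <- paste0(c(ch[1], tail_digits, "0", "0", "0"), collapse = "")
    substr(code, 1, 4)
  }

  vapply(x, encode_one, character(1), USE.NAMES = FALSE)
}

soundex_sketch(c("Euler", "Gauss", "Hilbert", "Knuth", "Lloyd", "Lukasiewicz", "Wachs"))
# "E460" "G200" "H416" "K530" "L300" "L222" "W200"
```

In this variant, letters with the same code that are separated only by H or W are coded once, so soundex_sketch("Ashcraft") gives "A261"; a variant that treats H and W like vowels would give "A226" instead.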

References

  • The Soundex algorithm implemented is the one used by the National Archives (http://www.archives.gov/research/census/soundex.html). This algorithm differs slightly from the original algorithm patented by R.C. Russell (US patents 1261167 (1918) and 1435663 (1922)).

See Also

printable_ascii

Examples

# The following examples are from The Art of Computer Programming (part III, p. 395)
# (Note that our algorithm is specified differently from the one in TAOCP; see References.)
phonetic(c('Euler','Gauss','Hilbert','Knuth','Lloyd','Lukasiewicz','Wachs'),method='soundex')
