stringdist-package: A package for string distance calculation and approximate string matching.
Description
A package for string distance calculation and approximate string matching.
Introduction
The stringdist package offers fast and platform-independent string metrics.
Its main purpose is to compute various string distances and to do approximate text matching between character vectors.
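As a minimal sketch of those two uses, the snippet below calls the package's stringdist() and amatch() functions. The input strings and the maxDist value are illustrative choices only, not recommendations from the package authors.

```r
library(stringdist)

# String distance: Levenshtein distance between two strings.
# "kitten" -> "sitting" needs three single-character edits.
d <- stringdist("kitten", "sitting", method = "lv")

# Approximate matching: index of the closest entry in the lookup table
# that lies within maxDist, or NA if no entry qualifies.
idx <- amatch("leia", c("uhura", "leela"), maxDist = 5)

print(d)
print(idx)
```

Both functions are vectorized over character vectors, so the same calls work on whole columns of strings.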
Besides documentation for each function, the main topics documented are:
- stringdist-encoding -- how encoding is handled by the package
- stringdist-parallelization -- on multithreading
Acknowledgements
- The code for the full Damerau-Levenshtein distance was adapted from Nick Logan's public GitHub repository (https://github.com/ugexe/Text--Levenshtein--Damerau--XS/blob/master/damerau-int.c).
- C code for converting UTF-8 to integer was copied from the R core for performance reasons.
- The code for soundex conversion was kindly contributed by Jan van der Laan.
Citation
If you would like to cite this package, please cite the R Journal paper (http://journal.r-project.org/archive/2014-1/loo.pdf):
- M.P.J. van der Loo (2014). The stringdist package for approximate string matching. R Journal 6(1), pp. 111-122.
From within R, the citation can be retrieved with:
citation('stringdist')