vwr (version 0.3.0)

ald: Compute average Levenshtein distances

Description

Compute the average Levenshtein distances between a word and its n nearest neighbors in a lexicon.
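As an illustration of what this measure computes, here is a minimal sketch in base R. It is not the vwr implementation: ald_sketch is a hypothetical helper, it uses utils::adist for plain Levenshtein distances, and it assumes that a word is not counted as its own neighbor.

```r
# Hypothetical helper, not part of vwr: mean Levenshtein distance
# from each source word to its n nearest neighbors in targets.
ald_sketch <- function(sources, targets, n) {
  sources <- as.character(sources)
  targets <- as.character(targets)
  # Distance matrix: one row per source word, one column per target word
  d <- utils::adist(sources, targets)
  out <- vapply(seq_along(sources), function(i) {
    # Assumption: the word itself is excluded from its own neighbors
    row <- d[i, targets != sources[i]]
    # Average over the n smallest distances (or fewer if the lexicon is small)
    mean(sort(row)[seq_len(min(n, length(row)))])
  }, numeric(1))
  names(out) <- sources
  out
}

lexicon <- c("cat", "cot", "coat", "dog", "cart")
res <- ald_sketch(c("cat", "dog"), lexicon, 2)
print(res)
```

The result is a named numeric vector with one entry per source word, mirroring the shape of the value described for ald.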

Usage

ald(sources, targets, n, method = "levenshtein", parallel = FALSE)
old20(sources, targets, method = "levenshtein", parallel = FALSE)

Arguments

sources
a list of words for which the average Levenshtein distance should be computed. Must be of type character, or convertible to type character with as.character.
targets
a list of words containing possible neighbors. Must be of type character, or convertible to type character with as.character.
method
specifies the distance function. With "levenshtein", levenshtein.distance is used; with "levenshtein.damerau", levenshtein.damerau is used.
n
specifies the number of nearest neighbors on which the average should be based. The variant old20 does not take the n argument (it is fixed to 20).
parallel
with parallel=TRUE, ald will run in parallel on multiple cores. The number of parallel processes is given by detectCores(logical = FALSE).
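To see how many processes parallel=TRUE would likely use on a given machine, the core count can be queried directly. This is a small sketch using detectCores from the base parallel package; the variable names are illustrative only.

```r
library(parallel)

# Physical cores: the count the parallel argument is documented to use
n_physical <- detectCores(logical = FALSE)
# Logical cores (including hyperthreads), shown for comparison
n_logical <- detectCores(logical = TRUE)

# detectCores may return NA on platforms where the count is unknown
print(c(physical = n_physical, logical = n_logical))
```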

Value

A vector of average Levenshtein distances with names corresponding to sources.

Details

The OLD20 measure was originally proposed by Yarkoni et al. (2008). This implementation is orders of magnitude faster than Tal Yarkoni's LDcalc program (see http://talyarkoni.com/materials.php). Do not use parallel=TRUE in a GUI environment, as it will most likely crash your R session.

References

Yarkoni, T., Balota, D., & Yap, M. (2008). Moving beyond Coltheart’s N: A new measure of orthographic similarity. Psychonomic Bulletin & Review, 15(5), 971–979.

See Also

levenshtein.distance, levenshtein.neighbors

Examples

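library(vwr)
data(basque.words)
# Average distance to the 20 nearest neighbors for the first 10 words
ald(basque.words[1:10], basque.words, 20)
# Same computation with n fixed to 20
old20(basque.words[1:10], basque.words)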
