A typical use is to match strings that are not precisely the same. For example
amatch(c("hello","g'day"),c("hi","hallo","ola"),maxDist=2)
returns c(2,NA): "hello" is closest to "hallo", at an (optimal string
alignment) distance of 1, which is within the maximum distance of 2.
The second element, "g'day", is closest to "ola", but since that
distance equals 4, which exceeds maxDist, no match is reported.
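To see why the second element goes unmatched, it can help to inspect the full distance matrix and then relax maxDist. The following is a minimal sketch; the variable names x and candidates are illustrative only, and the default method is assumed to be optimal string alignment ("osa").

```r
library(stringdist)

x <- c("hello", "g'day")
candidates <- c("hi", "hallo", "ola")

# All pairwise distances: one row per element of x,
# one column per element of candidates.
stringdistmatrix(x, candidates)

# Raising maxDist to 4 should allow "g'day" to be matched with "ola",
# since their distance (4) no longer exceeds the maximum.
amatch(x, candidates, maxDist = 4)
```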
A second typical use is to compute string distances. For example
stringdist(c("g'day"),c("hi","hallo","ola"))
returns c(5,5,4), since these are the distances between "g'day"
and "hi", "hallo", and "ola", respectively.
A third typical use is to compute a dist object. The command
stringdistmatrix(c("foo","bar","boo","baz"))
returns an object of class dist that can be used by clustering
algorithms such as stats::hclust.
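As a rough sketch of how such a dist object might be passed on to clustering, the example below feeds it to stats::hclust and cuts the tree into two groups. The variable names are illustrative, and useNames = "strings" is assumed to label the dist object with the input strings.

```r
library(stringdist)

words <- c("foo", "bar", "boo", "baz")

# Lower-triangular distance object of class "dist".
# useNames = "strings" is assumed to attach the strings as labels.
d <- stringdistmatrix(words, useNames = "strings")

# Hierarchical clustering (complete linkage is the hclust default).
hc <- stats::hclust(d)

# Cluster membership when the tree is cut into two groups.
groups <- stats::cutree(hc, k = 2)
groups
```

With these four words, "foo" and "boo" differ by one substitution, as do "bar" and "baz", so each pair would be expected to end up in the same group.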
A fourth use is to compute string distances between general sequences,
represented as integer vectors (which must be stored in a list):
seq_dist( list(c(1L,1L,2L)), list(c(1L,2L,1L),c(2L,3L,1L,2L)) )
The above code yields the vector c(1,2) (the shorter first
argument is recycled over the longer second argument).
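To illustrate how integer sequences relate to ordinary strings, the sketch below converts two strings to their code points with base R's utf8ToInt and compares the result of seq_dist with that of stringdist. The variable names are illustrative, and both calls rely on the default method.

```r
library(stringdist)

# Represent each string as an integer vector of code points.
a <- list(utf8ToInt("foo"))
b <- list(utf8ToInt("boo"))

# Distance between the integer sequences ...
d_seq <- seq_dist(a, b)

# ... and between the original strings, for comparison.
d_str <- stringdist("foo", "boo")

c(d_seq, d_str)
```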
Besides documentation for each function, the main topics documented are:
stringdist-metrics -- string metrics supported by the package
stringdist-encoding -- how encoding is handled by the package
stringdist-parallelization -- on multithreading
The stringdist package for approximate string matching.
R Journal 6(1) pp 111-122
citation('stringdist')