
stringdist (version 0.9.1)

stringsim: Compute similarity scores between strings

Description

stringsim computes pairwise string similarities between the elements of character vectors a and b, where the vector with fewer elements is recycled.
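To illustrate the recycling rule, the sketch below pairs the elements of a and b as described above: the shorter vector is repeated to the length of the longer one with base R's rep_len(). recycle_pair is a hypothetical helper, not a stringdist function. It assumes that an empty input simply yields an empty result.

```r
# Hypothetical helper (not part of stringdist): recycle a and b to a common
# length so that element i of one vector is compared with element i of the other.
recycle_pair <- function(a, b) {
  a <- as.character(a)
  b <- as.character(b)
  # Assumption: if either input is empty, the result is empty as well.
  n <- if (length(a) == 0 || length(b) == 0) 0 else max(length(a), length(b))
  list(a = rep_len(a, n), b = rep_len(b, n))
}

p <- recycle_pair(c("foo", "bar", "baz"), "bar")
p$b  # "bar" "bar" "bar": the single "bar" is compared with every element of a
```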

Usage

stringsim(a, b, method = c("osa", "lv", "dl", "hamming", "lcs", "qgram",
  "cosine", "jaccard", "jw", "soundex"), q = 1, ...)
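The method argument lists every metric in a character vector. A common R idiom for arguments like this is match.arg(). When the argument is not supplied, match.arg() returns the first element, which would explain why "osa" is the default. It also accepts unambiguous abbreviations. We assume, without having checked the source, that stringsim resolves method this way. pick_method is a hypothetical helper that only demonstrates the idiom.

```r
# Hypothetical helper illustrating the match.arg() idiom for `method`;
# it is not part of stringdist.
pick_method <- function(method = c("osa", "lv", "dl", "hamming", "lcs", "qgram",
                                   "cosine", "jaccard", "jw", "soundex")) {
  # With `method` missing, match.arg() returns the first choice ("osa");
  # otherwise it partially matches the supplied value against the choices.
  match.arg(method)
}

pick_method()       # "osa"
pick_method("jw")   # "jw"
pick_method("jac")  # "jaccard" (unique partial match)
```

An ambiguous abbreviation such as "j", which matches both "jaccard" and "jw", makes match.arg() signal an error.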

Arguments

a
R object (target); it is converted with as.character.
b
R object (source); it is converted with as.character.
method
Method used for the distance calculation. The default is "osa"; see stringdist-metrics for the available metrics.
q
Size of the q-gram; must be nonnegative. Only applies to method = 'qgram', 'jaccard' or 'cosine'.
...
Additional arguments are passed on to stringdist.
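The q argument sets the length of the substrings (q-grams) that the 'qgram', 'jaccard' and 'cosine' methods compare. The sketch below lists the q-grams of a string and computes a Jaccard index on the sets of q-grams. qgrams_of and jaccard_sim are hypothetical helpers, not stringdist functions. Because the Jaccard distance already lies between 0 and 1, we assume that stringsim with method = 'jaccard' amounts to this index, but the sketch does not reproduce stringdist's treatment of q = 0 or of strings shorter than q.

```r
# Hypothetical helper: the q-grams (substrings of length q) of one string.
# Only q >= 1 is handled; stringdist's special case q = 0 is not reproduced.
qgrams_of <- function(s, q) {
  n <- nchar(s)
  if (q < 1 || n < q) return(character(0))
  substring(s, 1:(n - q + 1), q:n)
}

# Hypothetical helper: Jaccard index of the q-gram sets, |A n B| / |A u B|.
jaccard_sim <- function(a, b, q = 1) {
  A <- unique(qgrams_of(a, q))
  B <- unique(qgrams_of(b, q))
  u <- length(union(A, B))
  # Assumption: two strings without any q-grams are treated as identical.
  if (u == 0) return(1)
  length(intersect(A, B)) / u
}

qgrams_of("abcde", 2)                 # "ab" "bc" "cd" "de"
jaccard_sim("abcde", "abcdf", q = 2)  # 3 shared of 5 distinct q-grams: 0.6
```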

Value

  • Returns a vector of similarities: values between 0 and 1, where 1 corresponds to perfect similarity (distance 0) and 0 to complete dissimilarity. NA is returned wherever stringdist returns NA, and distances equal to Inf are truncated to a similarity of 0.
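The rules for NA and Inf can be expressed as a small conversion from distances to similarities. dist_to_sim is a hypothetical helper, not part of stringdist. The caller supplies the maximum possible distances d_max, because these depend on the method.

```r
# Hypothetical helper: convert distances d into similarities, given the
# maximum possible distances d_max (supplied by the caller).
dist_to_sim <- function(d, d_max) {
  s <- 1 - d / d_max        # NA distances propagate to NA similarities
  s[is.infinite(d)] <- 0    # Inf distances are truncated to similarity 0
  s
}

dist_to_sim(c(0, 1, Inf, NA), d_max = 3)  # 1, 2/3, 0 and NA
```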

Details

The similarity is calculated by first computing the distance with stringdist and then dividing that distance by the maximum possible distance for the chosen method. This gives a normalised distance between 0 and 1, which is subtracted from 1, so that 1 corresponds to perfect similarity and 0 to complete dissimilarity.
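Written out, the score is s(a, b) = 1 - d(a, b) / d_max(a, b). For the first example below, the OSA distance between "ca" and "abc" is 3 and the longer string has 3 characters, so s = 1 - 3/3 = 0. The sketch below applies the same normalisation to the Levenshtein distance computed by base R's utils::adist(). It assumes that for method = "lv" the maximum possible distance is the number of characters in the longer string: substitute every character of the shorter string, then insert or delete the rest. lv_sim is a hypothetical helper, not the stringdist implementation.

```r
# Hypothetical helper (not part of stringdist): Levenshtein similarity
# 1 - d / d_max, with d from base R's utils::adist() and d_max assumed to be
# the number of characters in the longer string.
lv_sim <- function(a, b) {
  a <- as.character(a)
  b <- as.character(b)
  # mapply() recycles the shorter vector and compares elements pairwise
  d <- mapply(function(x, y) utils::adist(x, y)[1, 1], a, b, USE.NAMES = FALSE)
  d_max <- pmax(nchar(a), nchar(b))
  # Assumption: two empty strings have distance 0 and similarity 1
  d_max[d_max == 0] <- 1
  1 - d / d_max
}

lv_sim("ca", "abc")                  # 1 - 3/3 = 0
lv_sim(c("foo", "bar", "baz"), "bar") # 0, 1 and 2/3
```

Levenshtein distance does not count transpositions, so its scores can differ from the default "osa" method. For "ab" and "ba", the Levenshtein distance is 2 while the OSA distance is 1.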

Examples

library(stringdist)

# Calculate the similarity using the default method, optimal string alignment (OSA).
# The OSA distance is 3, the length of the longer string, so the result is 0.
stringsim("ca", "abc")

# Calculate the similarity using the Jaro-Winkler method.
# The p argument (the Jaro-Winkler prefix factor) is passed on to stringdist.
stringsim("MARTHA", "MATHRA", method = "jw", p = 0.1)
