# stringdist v0.9.6

## Approximate String Matching, Fuzzy Text Search, and String Distance Functions

Implements an approximate string matching version of R's native
'match' function. Also offers fuzzy text search based on various string
distance measures. Can calculate various string distances based on edits
(Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), q-grams
(q-gram, cosine, Jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An
implementation of soundex is provided as well. Distances can be computed between
character vectors while taking proper care of encoding or between integer
vectors representing generic sequences. This package is built for speed and
runs in parallel by using 'OpenMP'. An API for C or C++ is exposed as well.

## Readme

## stringdist

- Approximate matching and string distance calculations for R.
- All distance and matching operations are system- and encoding-independent.
- Built for speed, using OpenMP for parallel computing.

The package offers the following main functions:

- `stringdist` computes pairwise distances between two input character vectors (the shorter one is recycled).
- `stringdistmatrix` computes the distance matrix for one or two vectors.
- `stringsim` computes a string similarity between 0 and 1, based on `stringdist`.
- `amatch` is a fuzzy matching equivalent of R's native `match` function.
- `ain` is a fuzzy matching equivalent of R's native `%in%` operator.
- `seq_dist`, `seq_distmatrix`, `seq_amatch` and `seq_ain` compute distances between, and perform matching of, integer sequences.
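As a minimal sketch of how the main functions fit together, the snippet below calls each of them on small toy inputs. The input strings and the `maxDist = 2` threshold are illustrative choices, not recommendations; all calls use the package's default distance method.

```r
library(stringdist)

# Pairwise distances; the single string "foo" is recycled against the longer vector.
d <- stringdist("foo", c("boo", "food", "bar"))

# Distance matrix: rows follow the first vector, columns the second.
m <- stringdistmatrix(c("foo", "bar"), c("boo", "baz"))

# Similarity between 0 and 1 (1 means identical strings).
s <- stringsim("foo", "boo")

# Fuzzy versions of match() and %in%; maxDist is the largest accepted distance.
idx   <- amatch("leia", c("uhura", "leela"), maxDist = 2)
found <- ain("leia", c("uhura", "leela"), maxDist = 2)

# The seq_* functions treat integer vectors as generic sequences.
d_seq <- seq_dist(c(1L, 2L, 3L), c(1L, 2L, 4L))
```

Here `idx` is the position of the closest match in the lookup table, or `NA` when no entry lies within `maxDist`.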

These functions are built upon `C` code that re-implements some common (weighted) string
distance functions. Distance functions include:

- Hamming distance
- Levenshtein distance (weighted)
- Restricted Damerau-Levenshtein distance (weighted, a.k.a. Optimal String Alignment)
- Full Damerau-Levenshtein distance
- Longest Common Substring distance
- Q-gram distance
- cosine distance for q-gram count vectors (= 1-cosine similarity)
- Jaccard distance for q-gram count vectors (= 1-Jaccard similarity)
- Jaro and Jaro-Winkler distance
- Soundex-based string distance
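The distances listed above are selected through the `method` argument of `stringdist`. The sketch below compares several of them on the same pair of strings; the example strings, the q-gram size `q = 1`, the Winkler prefix factor `p = 0.1` and the weight vector are illustrative assumptions.

```r
library(stringdist)

a <- "ca"
b <- "abc"

# Edit-based distances
d_lv  <- stringdist(a, b, method = "lv")   # Levenshtein
d_osa <- stringdist(a, b, method = "osa")  # restricted Damerau-Levenshtein
d_dl  <- stringdist(a, b, method = "dl")   # full Damerau-Levenshtein
d_lcs <- stringdist(a, b, method = "lcs")  # longest common substring
d_ham <- stringdist("abc", "abd", method = "hamming")  # needs equal lengths

# Weighted variant: penalties for deletion, insertion, substitution, transposition
d_w <- stringdist(a, b, method = "osa", weight = c(1, 1, 1, 0.5))

# q-gram based distances
d_qg  <- stringdist(a, b, method = "qgram",   q = 1)
d_cos <- stringdist(a, b, method = "cosine",  q = 1)
d_jac <- stringdist(a, b, method = "jaccard", q = 1)

# Heuristic distances: Jaro (p = 0) and Jaro-Winkler (p > 0)
d_jaro <- stringdist(a, b, method = "jw")
d_jw   <- stringdist(a, b, method = "jw", p = 0.1)

# Soundex-based distance: 0 when the phonetic codes agree, 1 otherwise
d_sx <- stringdist("Robert", "Rupert", method = "soundex")
```

The pair `"ca"` and `"abc"` separates the restricted and full Damerau-Levenshtein distances: the restricted version does not edit a substring again after transposing it, so it needs three edits where the full version needs two.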

Also, there are some utility functions:

- `qgrams()` tabulates the q-grams in one or more `character` vectors.
- `seq_qgrams()` tabulates the q-grams (sometimes called n-grams) in one or more `integer` vectors.
- `phonetic()` computes phonetic codes of strings (currently only soundex).
- `printable_ascii()` is a utility function that detects non-printable ASCII or non-ASCII characters.
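A short sketch of the utility functions on toy inputs follows; the strings and the choice `q = 2` are only for illustration.

```r
library(stringdist)

# Table of 2-gram counts for a character string (one row per input argument).
qg <- qgrams("hello", q = 2)

# The same idea for an integer sequence.
sqg <- seq_qgrams(c(1L, 2L, 3L, 2L, 3L), q = 2)

# Soundex codes; names that sound alike map to the same code.
codes <- phonetic(c("Robert", "Rupert"))

# TRUE where a string consists only of printable ASCII characters.
ok <- printable_ascii(c("hello", "caf\u00e9"))
```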

#### C API

Some of `stringdist`'s underlying `C` functions can be called directly from
`C` code in other packages. The description of the API can be found either by
typing `?stringdist_api` in the R console or by opening the vignette directly as follows:

```
vignette("stringdist_C-Cpp_api", package="stringdist")
```

Examples of packages that link to `stringdist` can be found here and here.

## Functions in stringdist

| Name | Description |
| --- | --- |
| phonetic | Phonetic algorithms |
| stringdist_api | Calling stringdist from C or C++ |
| printable_ascii | Detect the presence of non-printable or non-ascii characters |
| seq_sim | Compute similarity scores between sequences of integers |
| seq_dist | Compute distance metrics between integer sequences |
| seq_amatch | Approximate matching for integer sequences. |
| afind | Stringdist-based fuzzy text search |
| qgrams | Get a table of qgram counts from one or more character vectors. |
| stringdist | Compute distance metrics between strings |
| stringdist-encoding | String metrics in stringdist |
| stringdist-package | A package for string distance calculation and approximate string matching. |
| stringdist-metrics | String metrics in stringdist |
| seq_qgrams | Get a table of qgram counts for integer sequences |
| stringdist-parallelization | Multithreading and parallelization in stringdist |
| stringsim | Compute similarity scores between strings |
| amatch | Approximate string matching |

## Vignettes of stringdist

| Name |
| --- |
| RJournal_6_111-122-2014.Rnw |
| loo2014stringdist.pdf |
| stringdist_C-Cpp_api.Rnw |
| stringdist_api.pdf |

## Details

| Field | Value |
| --- | --- |
| License | GPL-3 |
| LazyData | no |
| Type | Package |
| LazyLoad | yes |
| URL | https://github.com/markvanderloo/stringdist |
| BugReports | https://github.com/markvanderloo/stringdist/issues |
| Encoding | UTF-8 |
| RoxygenNote | 7.1.0 |
| NeedsCompilation | yes |
| Packaged | 2020-07-16 13:32:28 UTC; mark |
| Repository | CRAN |
| Date/Publication | 2020-07-16 14:00:02 UTC |
| Imports | parallel |
| Depends | R (>= 2.15.3) |
| Suggests | tinytest |
| Contributors | Nick Logan, R Core team, Jan van der Laan, Chris Muir, Johannes Gruber |

#### Include our badge in your README

```
[![Rdoc](http://www.rdocumentation.org/badges/version/stringdist)](http://www.rdocumentation.org/packages/stringdist)
```