tcR (version 1.1)

find.similar.sequences: Find similar sequences.

Description

Returns a matrix M with two columns. Each row [i, j] of M is a pair of indices such that the distance between sequence i of .patterns and sequence j of .data is less than or equal to .max.errors. The function converts .data to upper case and removes all strings that contain any characters other than the letters A-Z.
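The cleaning step can be approximated in base R as below. This is a minimal sketch, not tcR's own code: `clean.sequences` is a hypothetical helper, and it assumes that empty strings are dropped as well.

```r
# Hypothetical helper, not part of tcR: approximates the described cleaning step.
clean.sequences <- function(.data) {
  .data <- toupper(.data)
  # Keep only strings that consist solely of the letters A-Z
  # (assumption: empty strings are removed too).
  .data[grepl("^[A-Z]+$", .data)]
}

clean.sequences(c("cassl", "CASS*L", "CA5SL", "CASRL"))
# [1] "CASSL" "CASRL"
```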

Usage

find.similar.sequences(.data, .patterns = c(), .method = c('exact', 'hamm', 'lev'),
                       .max.errors = 1, .verbose = T, .clear = F)

exact.match(.data, .patterns = c(), .verbose = T)

hamming.match(.data, .patterns = c(), .max.errors = 1, .verbose = T)

levenshtein.match(.data, .patterns = c(), .max.errors = 1, .verbose = T)

Arguments

.data
Vector of strings.
.patterns
Character vector of query sequences whose neighbours are searched for in .data.
.method
Which method to use: 'exact' for exact matching, 'hamm' for Hamming distance, 'lev' for Levenshtein distance.
.max.errors
Maximum Hamming or Levenshtein distance between strings. Ignored when .method = 'exact'.
.verbose
Whether the function should print its progress. Not recommended for use.
.clear
If T, remove all sequences containing the character "*" or "~".
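To make the three .method options concrete, the sketch below computes a single pairwise distance in base R. It is an illustration under assumptions, not tcR's implementation: `seq.distance` is hypothetical, the Levenshtein distance comes from `utils::adist`, and strings of unequal length are treated as non-matching under Hamming distance, because that distance is only defined for strings of equal length.

```r
# Hypothetical sketch, not tcR's code: one distance per .method option.
seq.distance <- function(a, b, .method = c("exact", "hamm", "lev")) {
  .method <- match.arg(.method)
  switch(.method,
    # Exact matching: distance 0 for identical strings, otherwise no match.
    exact = if (a == b) 0 else Inf,
    hamm = {
      # Assumption: unequal lengths can never match under Hamming distance.
      if (nchar(a) != nchar(b)) return(Inf)
      # Count positions where the characters differ.
      sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
    },
    # Levenshtein distance (insertions, deletions, substitutions).
    lev = drop(utils::adist(a, b))
  )
}

seq.distance("CASSL", "CASRL", "hamm")  # 1
seq.distance("CASSL", "CASL", "lev")    # 1
seq.distance("CASSL", "CASL", "hamm")   # Inf
```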

Value

  • Matrix with two columns [i, j], where dist(data(i), data(j)) <= .max.errors.
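The two-column result can be sketched for the Levenshtein case by thresholding a full distance matrix. Again this uses a hypothetical helper (`lev.pairs`) built on `utils::adist`, not the package's code; it compares every pattern against every data sequence, so when the patterns equal the data each sequence also matches itself at distance 0.

```r
# Hypothetical sketch, not tcR's code: Levenshtein-based index pairs.
lev.pairs <- function(.data, .patterns = .data, .max.errors = 1) {
  # Rows of d correspond to patterns, columns to data sequences.
  d <- utils::adist(.patterns, .data)
  # Row/column indices of every entry within the error threshold.
  hits <- which(d <= .max.errors, arr.ind = TRUE)
  colnames(hits) <- c("i", "j")
  hits
}

seqs <- c("CASSL", "CASRL", "CATTTL")
lev.pairs(seqs)
```

Because the full matrix grows quadratically with the input size, this approach is only practical for small examples.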