srdistance: Edit distances between reads and a small number of short references

Description

srdistance calculates the edit distance from each read in pattern to each sequence in subject. The underlying algorithm, pairwiseAlignment, is efficient only when both the reads and the subject sequences are short and the number of subject sequences is small.

Usage

srdistance(pattern, subject, ...)

Arguments

pattern

An object of class DNAStringSet containing reads whose edit distance is desired.

subject

A short character vector, DNAString or (small) DNAStringSet to serve as reference.

...

additional arguments, unused.

Value

A list of length equal to that of subject. Each element is a numeric vector whose length equals that of pattern, with values giving the minimum edit distance between the corresponding pattern and subject sequences.
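The shape of the return value can be easier to see with a small mock-up. The sketch below does not call srdistance; the list d and its distance values are made up for illustration (three reads in pattern, two sequences in subject), and the variable names are hypothetical.

```r
# Made-up stand-in for a srdistance() result: one numeric vector per
# subject sequence, each with one distance per read in pattern.
d <- list(c(0, 3, 35), c(35, 32, 0))

# Arrange as a matrix: rows are reads, columns are subject sequences.
m <- do.call(cbind, d)

# Smallest distance from each read to any subject sequence.
nearest <- do.call(pmin, d)

# Index of the closest subject sequence for each read.
which_subject <- apply(m, 1, which.min)
```

Keeping the result as a list is convenient for a single reference (d[[1]]); the matrix form is handy when comparing several references per read.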

Details

The underlying algorithm performs pairwise alignment from each read in pattern to each sequence in subject. The return value is a list of numeric vectors of distances, one list element for each sequence in subject. The vector in each list element contains, for each read in pattern, the edit distance from the read to the corresponding subject. The weight matrix and gap penalties used to calculate the distance are structured to weight base substitutions and single-base insertions/deletions equally. Edit distances between known and ambiguous (e.g., N) nucleotides, or between ambiguous nucleotides, are weighted as though each possible nucleotide in the ambiguity were equally likely.
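As a rough illustration of what "weighting substitutions and single-base insertions/deletions equally" means, the sketch below uses base R's adist, which by default assigns a cost of 1 to each insertion, deletion and substitution. This is a stand-in and not the code path srdistance uses (that is pairwiseAlignment), and it does not reproduce the handling of ambiguous nucleotides; the sequences are invented for the example.

```r
# Illustration only: adist() from base R (utils), not srdistance().
read <- "ACGT"
refs <- c(same = "ACGT", substitution = "AGGT", deletion = "ACT")

# One row per read, one column per reference.
d <- adist(read, refs)

# A single substitution and a single deletion both count as one edit.
d[1, ]
```

Under equal weighting, a read that differs from the reference by one mismatch is just as distant as a read that differs by one missing or extra base.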

Examples

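library(ShortRead)

# Read aligned reads from the example data shipped with ShortRead
sp <- SolexaPath(system.file("extdata", package="ShortRead"))
aln <- readAligned(sp, "s_2_export.txt")
# 35-base homopolymer references
polyA <- polyn("A", 35)
polyT <- polyn("T", 35)

# Distance to polyA after clean() drops reads with ambiguous bases
d1 <- srdistance(clean(sread(aln)), polyA)
# Distance to polyA for all reads
d2 <- srdistance(sread(aln), polyA)
# Two references: the result is a list with one vector per reference
d3 <- srdistance(sread(aln), c(polyA, polyT))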