NameNeedle (version 1.2.6)

needles: Needleman-Wunsch global alignment algorithm

Description

This package contains a simple implementation of the Needleman-Wunsch global alignment algorithm.

Usage

needles(pattern, subject, params=defaultNeedleParams)
needleScores(pattern, subjects, params=defaultNeedleParams)
defaultNeedleParams

Value

The needles function returns a list with five components:

score

The raw alignment score.

align1

The final (optimal) alignment for the pattern.

align2

The final (optimal) alignment for the subject.

sm

The score matrix.

dm

The backtrace matrix.

The needleScores function returns a numeric vector of the same length as the subjects argument; each entry is the raw alignment score of the pattern against the corresponding subject.

Arguments

pattern

character string to be matched

subject

character string to be matched against

subjects

character vector where matches are sought

params

list containing four required components. The default values are specified by the object defaultNeedleParams, which contains the following values:

   
    $ MATCH   : num 1
    $ MISMATCH: num -1
    $ GAP     : num -1
    $ GAPCHAR : chr "*"
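
The sketch below shows one way to derive a custom parameter list while keeping all four required components. It is self-contained and does not load the package: baseParams is a hypothetical stand-in that copies the default values listed above, and myParams is an illustrative name.

```r
# Hypothetical stand-in for defaultNeedleParams, copied from the values above.
baseParams <- list(MATCH = 1, MISMATCH = -1, GAP = -1, GAPCHAR = "*")

# Override only the components that should change; the rest are kept.
myParams <- modifyList(baseParams, list(MATCH = 2, MISMATCH = -2))

# Confirm that all four required components are still present.
required <- c("MATCH", "MISMATCH", "GAP", "GAPCHAR")
hasAll <- all(required %in% names(myParams))
str(myParams)
```

Starting from the defaults and overriding selected entries avoids accidentally dropping a required component.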

Author

Kevin R. Coombes <krc@silicovore.com>, P. Roebuck <proebuck@mdanderson.org>

Details

The Needleman-Wunsch global alignment algorithm was one of the first algorithms used to align DNA, RNA, or protein sequences. The basic algorithm uses dynamic programming to find an optimal alignment between two sequences, with parameters that specify a reward for exact matches and penalties for mismatches and gaps. More elaborate algorithms (not implemented here) use substitution matrices that assign different penalties to different kinds of mismatches. The version implemented here is based on the Perl implementation in the first section of Chapter 3 of the book BLAST.
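
The dynamic programming step described above can be sketched in a few lines of R. This is a minimal illustration of the standard recurrence with a linear gap penalty, not the package's own code: nwScore is a hypothetical helper that returns only the optimal score and omits the backtrace needed to recover the alignments.

```r
# Hypothetical helper, not part of NameNeedle: optimal global alignment score.
nwScore <- function(pattern, subject, MATCH = 1, MISMATCH = -1, GAP = -1) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  np <- length(p)
  ns <- length(s)
  # Score matrix with one extra leading row and column for the empty prefix.
  sm <- matrix(0, nrow = np + 1, ncol = ns + 1)
  sm[, 1] <- GAP * (0:np)   # aligning a prefix of the pattern against gaps
  sm[1, ] <- GAP * (0:ns)   # aligning a prefix of the subject against gaps
  for (i in seq_len(np)) {
    for (j in seq_len(ns)) {
      step <- if (p[i] == s[j]) MATCH else MISMATCH
      diagScore <- sm[i, j] + step       # align p[i] with s[j]
      upScore <- sm[i, j + 1] + GAP      # align p[i] with a gap
      leftScore <- sm[i + 1, j] + GAP    # align s[j] with a gap
      sm[i + 1, j + 1] <- max(diagScore, upScore, leftScore)
    }
  }
  sm[np + 1, ns + 1]
}

nwScore("abc", "abc")  # three matches
nwScore("abc", "ac")   # two matches and one gap
```

Each cell holds the best score for aligning the first i characters of the pattern with the first j characters of the subject, so the bottom-right cell is the score of the full global alignment.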

References

Needleman SB, Wunsch CD.
A general method applicable to the search for similarities in the amino acid sequence of two proteins.
J Mol Biol 1970; 48(3):443-453.

Korf I, Yandell M, Bedell J.
BLAST.
O'Reilly Media, 2003.

Wang J, Byers LA, Yordy JS, Liu W, Shen L, Baggerly KA, Giri U, Myers JN, Ang KK, Story MD, Girard L, Minna JD, Heymach JV, Coombes KR.
Blasted cell line names.
Cancer Inform. 2010; 9:251-5.

See Also

The Biostrings package from Bioconductor used to contain a function called needwunQS that provided a simple-gap implementation of Needleman-Wunsch, similar to the one presented here. That function has been deprecated in favor of a more elaborate interface called pairwiseAlignment, which also incorporates a variety of other alignment methods. While pairwiseAlignment is much more useful for applications to biological sequences, it is serious overkill for the application we have in mind: matching cell line or other sample names.

Examples

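library(NameNeedle)
data(cellLineNames)
# Start from the defaults and strengthen the match reward and mismatch penalty.
myParam <- defaultNeedleParams
myParam$MATCH <- 2
myParam$MISMATCH <- -2
# Align one pair of names and inspect the full result.
needles(sf2Names[2], illuNames[1], myParam)
# Score one name against every candidate and find the best-scoring ones.
scores <- needleScores(sf2Names[6], illuNames, myParam)
w <- which(scores == max(scores))
w
sf2Names[6]

# Show the alignment against the best-scoring candidate.
needles(sf2Names[6], illuNames[w], myParam)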
