AlignTranslation: Align Sequences By Their Amino Acid Translation

Description

Performs alignment of a set of DNA or RNA sequences by aligning their corresponding amino acid sequences.

Usage

AlignTranslation(myXStringSet, sense = "+", direction = "5' to 3'", readingFrame = NA, asAAStringSet = FALSE, ...)

Arguments

myXStringSet

A DNAStringSet or RNAStringSet object of unaligned sequences.

sense

Single character specifying the sense of the input sequences: either the positive ("+") coding strand or the negative ("-") non-coding strand.

direction

Direction of the input sequences, either "5' to 3'" or "3' to 5'".

readingFrame

Numeric vector giving either a single reading frame for all of the sequences or an individual reading frame for each sequence in myXStringSet. Each value can be 1, 2, or 3 to begin translating at the first, second, or third nucleotide position, respectively, or NA (the default) to guess the reading frame. (See the Details section below.)

asAAStringSet

Logical specifying whether to return the aligned translation as an AAStringSet rather than an XStringSet of the input type.

...

Further arguments to be passed directly to AlignSeqs, including perfectMatch, misMatch, gapOpening, gapExtension, terminalGap, restrict, anchor, substitutionMatrix, doNotAlign, guideTree, processors, and verbose.

Value

An XStringSet matching the input type, or an AAStringSet of the aligned translation if asAAStringSet is TRUE.

Details

Alignment of proteins is often more accurate than alignment of their coding nucleic acid sequences. This function aligns the input nucleic acid sequences by aligning their translated amino acid sequences. First, the input sequences are translated according to the specified sense, direction, and readingFrame. The resulting amino acid sequences are then aligned using AlignSeqs, and this alignment is reverse translated into the original sequence type, sense, and direction.
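
The reverse translation step can be pictured as threading each sequence's original codons back onto its aligned protein, with every amino acid gap becoming three nucleotide gaps. The following base R sketch illustrates the idea only; reverse_translate is a hypothetical helper and not part of DECIPHER, and it assumes the sequence is already in frame 1 with a length that is a multiple of three.

```r
# Hypothetical helper (not a DECIPHER function): map an aligned protein
# back onto the nucleotide sequence it was translated from.
reverse_translate <- function(nt, aligned_aa) {
  # Split the nucleotide sequence into consecutive codons (frame 1 assumed)
  codons <- substring(nt,
                      seq(1, nchar(nt) - 2, by = 3),
                      seq(3, nchar(nt), by = 3))
  aa <- strsplit(aligned_aa, "")[[1]]
  is_gap <- aa == "-"
  # Every non-gap residue must correspond to exactly one codon
  stopifnot(sum(!is_gap) == length(codons))
  out <- character(length(aa))
  out[is_gap] <- "---"      # one amino acid gap = three nucleotide gaps
  out[!is_gap] <- codons    # residues are replaced by their original codons
  paste(out, collapse = "")
}

# The last sequence from the Examples section, aligned as MF-TP*
reverse_translate("AUGUUUACCCCAUAA", "MF-TP*")
```

Because gaps are only ever inserted in multiples of three, the resulting nucleotide alignment keeps every codon intact.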

If readingFrame is NA (the default), an attempt is made to guess the reading frame of each sequence based on the number of stop codons in its translated amino acids. For each sequence, the first reading frame (1, 2, or 3) that contains no stop codons other than in the last position is chosen. If the number of stop codons is inconclusive for a sequence, the reading frame defaults to 1. The entire length of each sequence is translated regardless of any stop codons identified. Note that this method is only useful when there is a substantially long coding sequence with at most a single stop codon, expected in the final position. It is therefore preferable to specify the reading frame of each sequence if it is known.
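
The heuristic described above can be approximated in a few lines of base R. This is a minimal sketch written from the description in this section, not DECIPHER's actual implementation; codon_table, translate_frame, and guess_frame are hypothetical names, and the sketch assumes unambiguous nucleotides and the standard genetic code.

```r
# Standard genetic code, with codons ordered so that the third base
# varies fastest and bases run T, C, A, G at each position.
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codon_table <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
  "")[[1]]
names(codon_table) <- paste0(grid$first, grid$second, grid$third)

# Translate a DNA or RNA string starting at the given frame (1, 2, or 3)
translate_frame <- function(nt, frame) {
  nt <- gsub("U", "T", toupper(nt))
  n_codons <- (nchar(nt) - frame + 1) %/% 3
  if (n_codons < 1) return(character(0))
  starts <- frame + 3 * (seq_len(n_codons) - 1)
  unname(codon_table[substring(nt, starts, starts + 2)])
}

# Pick the first frame with no stop codon before the final position;
# fall back to frame 1 when every frame contains an internal stop.
guess_frame <- function(nt) {
  for (frame in 1:3) {
    aa <- translate_frame(nt, frame)
    internal <- aa[-length(aa)]  # a stop in the last position is allowed
    if (!any(internal == "*", na.rm = TRUE)) return(frame)
  }
  1L
}

guess_frame("AUGUUCAUCACCCCCUAA")  # already in frame
guess_frame("CAUGAUUGACCCCUAA")    # one extra leading nucleotide
```

On very short sequences an out-of-frame translation often contains no stop codon by chance, so the first frame tested is accepted even when it is wrong. This is why the heuristic is only dependable for substantially long coding sequences.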

Examples

library(DECIPHER)

# The first three sequences translate to MFITP*; the last sequence is MF-TP*
rna <- RNAStringSet(c("AUGUUCAUCACCCCCUAA", "AUGUUCAUAACUCCUUGA",
	"AUGUUCAUUACACCGUAG", "AUGUUUACCCCAUAA"))
RNA <- AlignSeqs(rna)
BrowseSequences(RNA) # incorrect gap position

RNA <- AlignTranslation(rna)
BrowseSequences(RNA) # correct gap position
