AdjustAlignment: Improve An Existing Alignment By Adjusting Gap Placements

Description

Makes small adjustments by shifting groups of gaps left and right to find their optimal positioning in a multiple sequence alignment.

Usage

AdjustAlignment(myXStringSet, perfectMatch = 5, misMatch = 0, gapLetter = -3, gapOpening = -0.1, gapExtension = 0, substitutionMatrix = NULL, shiftPenalty = -0.2, threshold = 0.1, weight = 1, processors = 1)

Arguments

myXStringSet

An AAStringSet, DNAStringSet, or RNAStringSet object of aligned sequences.

perfectMatch

Numeric giving the reward for aligning two matching nucleotides in the alignment. Only used for DNAStringSet or RNAStringSet inputs.

misMatch

Numeric giving the cost for aligning two mismatched nucleotides in the alignment. Only used for DNAStringSet or RNAStringSet inputs.

gapLetter

Numeric giving the cost for aligning gaps to letters. A lower value (more negative) encourages the overlapping of gaps across different sequences in the alignment.

gapOpening

Numeric giving the cost for opening or closing a gap in the alignment.

gapExtension

Numeric giving the cost for extending an open gap in the alignment.

substitutionMatrix

Either a substitution matrix representing the substitution scores for an alignment or the name of the amino acid substitution matrix to use in alignment. The latter may be one of the following: "BLOSUM45", "BLOSUM50", "BLOSUM62", "BLOSUM80", "BLOSUM100", "PAM30", "PAM40", "PAM70", "PAM120", "PAM250", or "MIQS". The default (NULL) will use the perfectMatch and misMatch penalties for DNA/RNA or "MIQS" for AA. (See examples section below.)

shiftPenalty

Numeric giving the cost for every additional position that a group of gaps is shifted.

threshold

Numeric specifying the improvement in score required to permanently apply an adjustment to the alignment.

weight

A numeric vector of weights for each sequence, or a single number implying equal weights.

processors

The number of processors to use, or NULL to automatically detect and use all available processors.

Value

An XStringSet of aligned sequences.

Details

The process of multiple sequence alignment often results in the integration of small imperfections into the final alignment. Some of these errors are obvious by eye, which encourages manual refinement of automatically generated alignments. However, the manual refinement process is inherently subjective and time-consuming. AdjustAlignment refines an existing alignment in a process similar to that which might be applied manually, but in a repeatable and much faster fashion. This function shifts all of the gaps in an alignment to the left and right to find their optimal positioning. The optimal position is defined as the position that maximizes the alignment "score", which is determined by the input parameters. The resulting alignment will be similar to the input alignment but with many imperfections eliminated. Note that the affine gap penalties here are different from the more flexible penalties used in AlignProfiles, and have been optimized independently.
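
The idea of shifting a gap and keeping the position with the highest score can be illustrated with a small base R sketch. This is a toy under stated assumptions, not DECIPHER's implementation: the helper names scoreAlignment and shiftGap are hypothetical, the score is a plain sum over pairs of letters in each column using the default perfectMatch, misMatch, and gapLetter values from the Usage section, and gap opening, gap extension, shiftPenalty, and weight are ignored.

```r
# Toy sum-of-pairs column score (hypothetical helper, not part of DECIPHER).
# Assumes at least two sequences of equal width.
scoreAlignment <- function(seqs, perfectMatch = 5, misMatch = 0, gapLetter = -3) {
  chars <- do.call(rbind, strsplit(seqs, ""))  # one row per sequence
  total <- 0
  for (j in seq_len(ncol(chars))) {
    column <- chars[, j]
    pairs <- combn(length(column), 2)  # every pair of sequences
    for (k in seq_len(ncol(pairs))) {
      a <- column[pairs[1, k]]
      b <- column[pairs[2, k]]
      if (a == "-" && b == "-") next  # gap opposite gap contributes nothing
      s <- if (a == "-" || b == "-") {
        gapLetter
      } else if (a == b) {
        perfectMatch
      } else {
        misMatch
      }
      total <- total + s
    }
  }
  total
}

# Move the gap at position `pos` by `by` columns (negative = left).
# Hypothetical helper for a single gap character only.
shiftGap <- function(s, pos, by) {
  chars <- strsplit(s, "")[[1]]
  stopifnot(chars[pos] == "-")
  chars <- append(chars[-pos], "-", after = pos + by - 1)
  paste(chars, collapse = "")
}

aln <- c("ACGTCA", "ACG-TA")

# try the gap one column to the left, in place, and one column to the right
candidates <- vapply(c(-1, 0, 1), function(by) shiftGap(aln[2], pos = 4, by = by),
                     character(1))
scores <- vapply(candidates, function(s) scoreAlignment(c(aln[1], s)), numeric(1))

# accept the best shift only if it beats the current score by more than a threshold
threshold <- 0.1
current <- scoreAlignment(aln)
best <- candidates[which.max(scores)]
if (max(scores) - current > threshold) aln[2] <- best
aln
```

In this toy case the three candidates score 12, 17, and 22, so the gap moves one column to the right and the second sequence becomes "ACGT-A".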

References

ES Wright (2015) "DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment". BMC Bioinformatics, doi:10.1186/s12859-015-0749-z.

Examples

library(DECIPHER)

# a trivial example
aa <- AAStringSet(c("ARN-PK", "ARRP-K"))
aa
AdjustAlignment(aa)

# a real example
fas <- system.file("extdata", "Streptomyces_ITS_aligned.fas", package="DECIPHER")
dna <- readDNAStringSet(fas)
adjustedDNA <- AdjustAlignment(dna)
BrowseSeqs(adjustedDNA, highlight=1)
adjustedDNA==dna # most sequences were adjusted
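
The arguments described above can also be set explicitly. The following is a minimal sketch that uses only arguments listed in the Usage section; the particular values (BLOSUM62, a threshold of 1) are arbitrary choices for illustration rather than recommendations, and no particular output is implied.

```r
library(DECIPHER)

# choose a named amino acid substitution matrix instead of the default ("MIQS")
aa <- AAStringSet(c("ARN-PK", "ARRP-K"))
adjustedAA <- AdjustAlignment(aa, substitutionMatrix = "BLOSUM62")
adjustedAA

# require a larger score improvement before an adjustment is kept,
# and let the function detect the number of available processors
fas <- system.file("extdata", "Streptomyces_ITS_aligned.fas", package = "DECIPHER")
dna <- readDNAStringSet(fas)
conservativeDNA <- AdjustAlignment(dna, threshold = 1, processors = NULL)
sum(conservativeDNA != dna) # number of sequences that changed
```

Raising threshold should make the function keep fewer adjustments, which can be checked by comparing the count on the last line against the result obtained with the default settings.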
