An AAStringSet, DNAStringSet, or RNAStringSet object of unaligned sequences.
guideTree
Either NULL or a data.frame giving the ordered tree structure in which to align profiles. If NULL then a guide tree will be constructed.
orient
Logical specifying whether some sequences may need to be reoriented before alignment. If TRUE, the best orientation (reverse and/or complement) of each sequence is determined, and sequences are reoriented as necessary to match the orientation of the longest sequence. Not applicable to an AAStringSet input.
processors
The number of processors to use, or NULL (the default) for all available processors.
verbose
Logical indicating whether to display progress.
...
Further arguments to be passed directly to AlignProfiles, including perfectMatch, misMatch, gapOpening, gapExtension, terminalGap, restrict, anchor, and substitutionMatrix.
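To make the "reverse and/or complement" wording of the orient argument concrete, here is a minimal base-R sketch that enumerates the four candidate orientations of a nucleotide sequence. The helper names reverse_seq and complement_seq are hypothetical, and the sketch only shows the candidates; how the function scores them against the longest sequence is not described on this page.

```r
# Hypothetical helpers (not part of the documented package).
reverse_seq <- function(s) {
  paste(rev(strsplit(s, "")[[1]]), collapse = "")
}
complement_seq <- function(s) {
  # Swap each base for its Watson-Crick partner (DNA alphabet only).
  chartr("ACGT", "TGCA", s)
}

s <- "AACGT"
orientations <- c(
  original           = s,
  reverse            = reverse_seq(s),
  complement         = complement_seq(s),
  reverse_complement = reverse_seq(complement_seq(s))
)
print(orientations)
```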
Value
An XStringSet of aligned sequences.
Details
The profile-to-profile method aligns a sequence set by merging profiles along a guide tree until all sequences are aligned. If guideTree=NULL, an initial UPGMA guide tree is constructed from a distance matrix of shared k-mers. A second guide tree is then built from the initial alignment, and the alignment is refined using this tree. If a guideTree is provided, the sequences are aligned only once. The guideTree should be in the format output by IdClusters, with ascending levels of cutoff.
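The idea of an initial UPGMA guide tree built from shared k-mers can be illustrated with base R alone. This is a rough sketch under stated assumptions, not the package's implementation: the helpers kmer_counts and kmer_dist are hypothetical, the distance (one minus the fraction of shared k-mers) is an assumed stand-in for whatever measure the function uses internally, and hclust with average linkage plays the role of UPGMA.

```r
# Hypothetical helper: count overlapping k-mers in one sequence.
kmer_counts <- function(s, k) {
  starts <- seq_len(nchar(s) - k + 1)
  tab <- table(substring(s, starts, starts + k - 1))
  setNames(as.vector(tab), names(tab))
}

# Hypothetical helper: pairwise distance = 1 - fraction of shared k-mers.
kmer_dist <- function(seqs, k = 3) {
  counts <- lapply(seqs, kmer_counts, k = k)
  n <- length(seqs)
  d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      common <- intersect(names(counts[[i]]), names(counts[[j]]))
      shared <- sum(pmin(counts[[i]][common], counts[[j]][common]))
      total <- min(sum(counts[[i]]), sum(counts[[j]]))
      d[i, j] <- d[j, i] <- 1 - shared / total
    }
  }
  d
}

seqs <- c(a = "ACGTACGTTAGC", b = "ACGTACGTTAGG", c = "TTTTGGGGCCCC")
d <- kmer_dist(seqs, k = 3)

# Average-linkage hierarchical clustering is UPGMA.
tree <- hclust(as.dist(d), method = "average")
print(tree$merge)
```

In this toy input, sequences a and b share most of their 3-mers and are joined first, so their profiles would be merged before c is added.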
For an AAStringSet input, the substitutionMatrix, gapExtension, gapOpening, and terminalGap parameters are adjusted along the guideTree to maximize alignment quality. If a substitutionMatrix or guideTree is provided, the default parameters of AlignProfiles are used instead, unless other values are explicitly specified.
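A minimal usage sketch follows, assuming the function documented here is AlignSeqs from the DECIPHER package (the page itself does not name it) and that the package is installed. The toy sequences are made up for illustration, and only the processors and verbose arguments described above are set.

```r
# Hypothetical toy input; any set of unaligned sequences would do.
seqs <- c(seq1 = "ACGTACGTTAGC",
          seq2 = "ACGTCGTTAGC",
          seq3 = "ACGTACGTAGC")

if (requireNamespace("DECIPHER", quietly = TRUE)) {
  library(DECIPHER)  # also attaches Biostrings
  dna <- DNAStringSet(seqs)

  # guideTree is left as NULL, so a guide tree is constructed automatically.
  aligned <- AlignSeqs(dna, processors = 1, verbose = FALSE)

  # Aligned sequences are padded with gaps to a common width.
  print(aligned)
  print(unique(width(aligned)))
}
```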