ConsensusSequence(myXStringSet, threshold = 0.05, ambiguity = TRUE, noConsensusChar = "+", minInformation = 0.75, ignoreNonBases = FALSE, includeTerminalGaps = FALSE, verbose = TRUE)
An AAStringSet, DNAStringSet, or RNAStringSet object of aligned sequences.
IUPAC_CODE_MAP
.
An XStringSet of the same type as the input, containing a single consensus sequence.
The default threshold (0.05) requires that at least 95% of sequence information be represented by the consensus sequence. The default minInformation (0.75) specifies that at least 75% of sequences must contain the information in the consensus; otherwise the noConsensusChar is used.
If ambiguity = TRUE (the default), degeneracy codes are split between their respective bases according to the IUPAC_CODE_MAP for DNA/RNA, or AMINO_ACID_CODE for AA. For example, an "R" in a DNAStringSet would count as half an "A" and half a "G". If ambiguity = FALSE, degeneracy codes are not considered in forming the consensus. For an AAStringSet input, the lack of degeneracy codes generally results in "X" at positions with mismatches, unless the threshold is set higher than 0.05 (the default).
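As a rough illustration of the counting rule described for ambiguity = TRUE, the following base R sketch tallies one alignment column so that each degeneracy code contributes equal fractions to the bases it represents. The helper count_column and the hand-written codes table (a small subset of the IUPAC nucleotide codes) are hypothetical and are not part of DECIPHER; the package's internal implementation may differ.

```r
# Hypothetical subset of the IUPAC nucleotide codes (not DECIPHER's own table).
codes <- list(A = "A", C = "C", G = "G", T = "T",
              R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"),
              N = c("A", "C", "G", "T"))

# Tally one alignment column, splitting each degeneracy code evenly
# between the bases it represents.
count_column <- function(column) {
  counts <- c(A = 0, C = 0, G = 0, T = 0)
  for (ch in column) {
    bases <- codes[[ch]]
    counts[bases] <- counts[bases] + 1 / length(bases)
  }
  counts
}

col_counts <- count_column(c("R", "A", "G"))
print(col_counts)  # A and G each receive 1.5; C and T receive 0
```

Here the "R" adds 0.5 to both "A" and "G", so the column is tallied as if it held one and a half of each.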
If ignoreNonBases = FALSE (the default), gap ("-"), mask ("+"), and unknown (".") characters are counted towards the consensus; otherwise they are omitted from the calculation of the consensus. Note that gap ("-") and unknown (".") characters are treated interchangeably as gaps when forming the consensus sequence. For this reason, the consensus of a position with all unknown (".") characters will be a gap ("-").
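The handling of non-base characters can be pictured with a small base R sketch. The function tally_column and its ignore_non_bases argument are hypothetical stand-ins used only for illustration; they mimic the behavior described above and are not DECIPHER code.

```r
# Tally one alignment column; "." is folded into "-" before counting.
# With ignore_non_bases = TRUE, gap and mask characters are dropped
# from the tally (an assumed reading of the behavior described above).
tally_column <- function(column, ignore_non_bases = FALSE) {
  column[column == "."] <- "-"  # unknown characters are treated as gaps
  if (ignore_non_bases) {
    column <- column[!column %in% c("-", "+")]
  }
  table(column)
}

all_unknown <- tally_column(c(".", ".", "."))
print(names(all_unknown))  # "-"

mixed <- tally_column(c("A", "-", ".", "A"), ignore_non_bases = TRUE)
print(mixed)  # only the two "A" characters remain
```

Because every "." is counted as "-", a column made up entirely of unknown characters can only yield a gap.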
IdConsensus, Seqs2DB
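library(DECIPHER)  # provides ConsensusSequence; loads Biostrings for DNAStringSet
# two aligned sequences of equal width
dna <- DNAStringSet(c("ANGCT-", "-ACCT-"))
ConsensusSequence(dna)
# returns "ANSCT-"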