ConsensusSequence(myXStringSet, threshold = 0.05, ambiguity = TRUE, noConsensusChar = "+", minInformation = 0.75, ignoreNonBases = FALSE, includeTerminalGaps = FALSE)
myXStringSet: An AAStringSet, DNAStringSet, or RNAStringSet object of aligned sequences.
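ambiguity: Logical specifying whether to split degeneracy codes between their respective bases. Degeneracy codes are specified in the IUPAC_CODE_MAP.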
Returns an XStringSet with a single consensus sequence matching the input type.
The degree of agreement required to form the consensus is controlled by two parameters, threshold and minInformation. The default threshold (0.05) means that at most 5% of sequences will not be represented by the consensus sequence at any given position. The default minInformation (0.75) specifies that at least 75% of sequences must contain the information in the consensus; otherwise, the noConsensusChar is used. If the specified threshold results in the choice of an ambiguity code that does not represent a minInformation fraction of the sequences, then the noConsensusChar is used.
If ambiguity = TRUE (the default), then degeneracy codes are split between their respective bases according to the IUPAC_CODE_MAP for DNA/RNA, or AMINO_ACID_CODE for AA. For example, an "R" in a DNAStringSet would count as half an "A" and half a "G". If ambiguity = FALSE, then degeneracy codes are not considered in forming the consensus. For an AAStringSet input, the small number of amino acid degeneracy codes generally results in "X" at positions with mismatches, unless the threshold is set higher than 0.05 (the default).
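The threshold and minInformation rules can be illustrated with a simplified base-R sketch that reduces one DNA alignment column to a single consensus character. This is an interpretation of the rules described above, not DECIPHER's implementation: column_consensus, iupac, and code_of are hypothetical names, only ambiguity = TRUE is modeled (with a hard-coded copy of the IUPAC DNA codes rather than IUPAC_CODE_MAP), gaps are tallied as their own symbol, and terminal gaps receive no special treatment, which is why the usage lines skip column 1 of the example.

```r
# Simplified sketch of one consensus column (hypothetical helper, NOT DECIPHER's code).
# Assumptions: DNA only, ambiguity = TRUE, gaps ("-") tallied as their own symbol,
# and no special handling of terminal gaps.
iupac <- c(A = "A", C = "C", G = "G", T = "T", R = "AG", Y = "CT",
           S = "CG", W = "AT", K = "GT", M = "AC", B = "CGT", D = "AGT",
           H = "ACT", V = "ACG", N = "ACGT")
# Reverse lookup: sorted base set (e.g. "CG") -> IUPAC code (e.g. "S")
code_of <- setNames(names(iupac), iupac)

column_consensus <- function(column, threshold = 0.05, minInformation = 0.75,
                             noConsensusChar = "+") {
  freq <- c(A = 0, C = 0, G = 0, T = 0, "-" = 0)
  for (ch in column) {
    # Split each degeneracy code evenly between its bases ("R" = 0.5 A + 0.5 G)
    parts <- if (ch == "-") "-" else strsplit(iupac[[ch]], "")[[1]]
    freq[parts] <- freq[parts] + 1 / length(parts)
  }
  freq <- sort(freq / length(column), decreasing = TRUE)
  # Fewest symbols (most frequent first) leaving out at most `threshold` of sequences
  k <- which(cumsum(freq) >= 1 - threshold - 1e-9)[1]
  chosen <- names(freq)[seq_len(k)]
  if (identical(chosen, "-")) return("-")
  # No IUPAC code mixes gaps and bases, so only the bases form the code
  bases <- sort(setdiff(chosen, "-"))
  # Fall back to noConsensusChar if the code represents too few sequences
  if (sum(freq[bases]) < minInformation - 1e-9) return(noConsensusChar)
  unname(code_of[paste(bases, collapse = "")])
}

# Columns 2-6 of the "ANGCT-" / "-ACCT-" example
seqs <- strsplit(c("ANGCT-", "-ACCT-"), "")
example <- paste(sapply(2:6, function(i) column_consensus(sapply(seqs, `[`, i))),
                 collapse = "")
print(example)  # "NSCT-"
```

Symbols are added in order of decreasing frequency until no more than threshold of the column is left out, and the chosen bases are mapped back to a single IUPAC code. In this sketch, 19 "A" and 1 "G" give "A" at the default threshold but "R" at threshold = 0.01.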
If ignoreNonBases = FALSE (the default), then gap ("-"), mask ("+"), and unknown (".") characters are counted towards the consensus; otherwise, they are omitted from the calculation of the consensus. Note that gap ("-") and unknown (".") characters are treated interchangeably as gaps when forming the consensus sequence, so the consensus of a position containing only unknown (".") characters will be a gap ("-"). Also note that if the consensus is formed from sequences of different lengths, then positions beyond the end of the shorter sequences will represent only the longer sequences. For this reason, the consensus sequence is generally computed from a sequence alignment, in which all of the sequences have equal length.
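The gap rules in the paragraph above can be sketched as a preprocessing step on one alignment column. prepare_column is a hypothetical base-R helper, not part of DECIPHER, and it assumes each column is a character vector holding one character per sequence.

```r
# Hypothetical helper (not a DECIPHER function): apply the gap rules described
# above to one alignment column, given as one character per sequence.
prepare_column <- function(column, ignoreNonBases = FALSE) {
  # Unknown (".") characters are treated the same as gaps ("-")
  column[column == "."] <- "-"
  if (ignoreNonBases) {
    # Omit gaps (including former ".") and masks ("+") from the calculation
    column <- column[!column %in% c("-", "+")]
  }
  column
}

prepare_column(c("A", ".", "-", "+"))                         # "A" "-" "-" "+"
prepare_column(c("A", ".", "-", "+"), ignoreNonBases = TRUE)  # "A"
```

Because "." becomes "-", a column made up only of unknown characters reduces to gaps, matching the all-unknown behavior described above.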
See also: Disambiguate, IdConsensus, Seqs2DB.
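library(DECIPHER) # DECIPHER depends on Biostrings, which provides DNAStringSet and AAStringSet
dna <- DNAStringSet(c("ANGCT-", "-ACCT-"))
ConsensusSequence(dna)
# returns "ANSCT-" (N + A -> N, G + C -> S)
aa <- AAStringSet(c("ANQIH-", "ADELW."))
ConsensusSequence(aa)
# returns "ABZJX-" (N + D -> B, Q + E -> Z, I + L -> J; "." is treated as "-")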