ConsensusSequence(myXStringSet, threshold = 0.05, ambiguity = TRUE, noConsensusChar = "+", minInformation = 0.75, ignoreNonBases = FALSE, includeTerminalGaps = FALSE, verbose = TRUE)
myXStringSet: An AAStringSet, DNAStringSet, or RNAStringSet object of aligned sequences.
IUPAC_CODE_MAP.
An XStringSet of the same type as the input, containing a single consensus sequence.
The default threshold (0.05) requires that at least 95% of the sequence information be represented by the consensus sequence. The default minInformation (0.75) specifies that at least 75% of sequences must contain the information in the consensus; otherwise, the noConsensusChar is used.
If ambiguity = TRUE (the default), degeneracy codes are split between their respective bases according to the IUPAC_CODE_MAP for DNA/RNA, or AMINO_ACID_CODE for AA. For example, an "R" in a DNAStringSet counts as half an "A" and half a "G". If ambiguity = FALSE, degeneracy codes are not considered in forming the consensus. For an AAStringSet input, the lack of degeneracy codes generally results in "X" at positions with mismatches, unless the threshold is set higher than 0.05 (the default).
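To make the threshold and ambiguity rules concrete, here is a minimal base-R sketch for a single DNA alignment column. It is a simplified illustration under stated assumptions, not DECIPHER's implementation: the helper column_consensus() and the hand-written iupac vector are hypothetical (the vector mirrors the standard IUPAC nucleotide codes), the column is assumed to contain no gaps, and minInformation is not modeled. Each symbol's weight is split evenly among the bases it encodes, and the consensus is the smallest set of bases whose combined frequency reaches 1 - threshold.

```r
# Hypothetical, simplified sketch; NOT DECIPHER's actual implementation.
# Assumes the column holds only IUPAC nucleotide codes (no gaps or masks).
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT",
           V = "ACG", H = "ACT", D = "AGT", B = "CGT", N = "ACGT")

column_consensus <- function(column, threshold = 0.05) {
  freq <- c(A = 0, C = 0, G = 0, T = 0)
  for (ch in column) {
    bases <- strsplit(iupac[[ch]], "")[[1]]
    # split each symbol's weight evenly among the bases it encodes
    freq[bases] <- freq[bases] + 1 / length(bases)
  }
  freq <- freq / length(column)              # per-base fraction of information
  ord <- order(freq, decreasing = TRUE)      # most frequent bases first
  # smallest set of bases covering at least (1 - threshold) of the information
  k <- which(cumsum(freq[ord]) >= 1 - threshold)[1]
  chosen <- sort(names(freq)[ord[seq_len(k)]])
  # map the chosen base set back to its IUPAC code
  names(iupac)[match(paste(chosen, collapse = ""), iupac)]
}

column_consensus(c("N", "A"))                   # "N"
column_consensus(c("R", "A"))                   # "R"
column_consensus(c("R", "A"), threshold = 0.3)  # "A"
```

The first call reproduces the "N" at the second position of the documented example: "A" carries only 62.5% of the information, so all four bases are needed to reach 95%. In the last call, the looser threshold lets the 75% "A" signal stand on its own.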
If ignoreNonBases = FALSE (the default), gap ("-"), mask ("+"), and unknown (".") characters are counted towards the consensus; otherwise, they are omitted from the calculation. Note that gap ("-") and unknown (".") characters are treated interchangeably as gaps when forming the consensus sequence. For this reason, the consensus at a position where every character is unknown (".") will be a gap ("-").
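The non-base handling can be sketched as a preprocessing step applied to one alignment column before counting. prepare_column() is a hypothetical helper, not a DECIPHER function; it only mirrors the two documented rules: unknown (".") characters become gaps ("-"), and gaps and masks ("+") are dropped when ignoreNonBases = TRUE.

```r
# Hypothetical helper, NOT part of DECIPHER: applies the documented
# non-base rules to one alignment column (a character vector).
prepare_column <- function(column, ignoreNonBases = FALSE) {
  # unknown "." characters are treated interchangeably with gaps "-"
  column[column == "."] <- "-"
  if (ignoreNonBases) {
    # omit gaps and masks from the consensus calculation
    column <- column[!column %in% c("-", "+")]
  }
  column
}

prepare_column(c(".", ".", "."))                              # "-" "-" "-"
prepare_column(c("A", "-", "+", "."))                         # "A" "-" "+" "-"
prepare_column(c("A", "-", "+", "."), ignoreNonBases = TRUE)  # "A"
```

A column made entirely of "." therefore reduces to all gaps, which is why its consensus is "-".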
IdConsensus, Seqs2DB
library(DECIPHER) # also attaches Biostrings, which provides DNAStringSet
dna <- DNAStringSet(c("ANGCT-","-ACCT-"))
ConsensusSequence(dna)
# returns "ANSCT-"