seqinr (version 3.6-1)

consensus: Consensus and profiles for sequence alignments

Description

This function returns a consensus using variuous methods (see details) or a profile from a sequence alignment.

Usage

consensus(matali, method = c( "majority", "threshold", "IUPAC", "profile"), 
  threshold = 0.60, warn.non.IUPAC = FALSE, type = c("DNA", "RNA"))
con(matali, method = c( "majority", "threshold", "IUPAC", "profile"), 
  threshold = 0.60, warn.non.IUPAC = FALSE, type = c("DNA", "RNA"))

Arguments

matali

an object of class alignment as returned by read.alignment, or a matrix of characters.

method

select the method to use, see details.

threshold

for the threshold method, a numeric value beteen 0 and 1 indicating the minimum relative frequency for a character to be returned as the consensus character. If none, NA is returned.

warn.non.IUPAC

for the IUPAC method this argument is passed to bma with a default value set to FALSE to avoid warnings due to gap characters in the alignment.

type

for the IUPAC method this argument is passed to bma.

Value

Either a vector of single characters with possible NA or a matrix with the method profile.

Details

"majority"

The character with the higher frequency is returned as the consensus character.

"threshold"

As above but in addition the character relative frequency must be higher than the value controled by the threshold argument. If none, NA id returned.

"IUPAC"

Make sense only for nucleic acid sequences (DNA or RNA). The consensus character is defined if possible by an IUPAC symbol by function bma. If this is not possible, when there is a gap character for instance, NA is returned.

"profile"

With this method a matrix with the count of each possible character at each position is returned.

con is a short form for consensus.

References

citation("seqinr")

See Also

See read.alignment to import alignment from files.

Examples

Run this code
# NOT RUN {
#
# Read 5 aligned DNA sequences at 42 sites:
#
  phylip <- read.alignment(file = system.file("sequences/test.phylip", 
    package = "seqinr"), format = "phylip")
#
# Show data in a matrix form:
#
  (matali <- as.matrix(phylip))
#
# With the majority rule:
#
  res <- consensus(phylip)
  stopifnot(c2s(res) == "aaaccctggccgttcagggtaaaccgtggccgggcagggtat")
#
# With a threshold:
#
  res.thr <- consensus(phylip, method = "threshold")
  res.thr[is.na(res.thr)] <- "." # change NA into dots
# stopifnot(c2s(res.thr) == "aa.c..t.gc.gtt..g..t.a.cc..ggccg.......ta.")
  stopifnot(c2s(res.thr) == "aa.cc.tggccgttcagggtaaacc.tggccgg.cagggtat")
#
# With an IUPAC summary:
#
  res.iup <- consensus(phylip, method = "IUPAC")
  stopifnot(c2s(res.iup) == "amvsbnkkgcmkkkmmgsktrmrssndkgcmrkdmmvskyaw")
  # replace 3 and 4-fold symbols by dots:
  res.iup[match(res.iup, s2c("bdhvn"), nomatch = 0) > 0] <- "."
  stopifnot(c2s(res.iup) == "am.s..kkgcmkkkmmgsktrmrss..kgcmrk.mm.skyaw")
#
# With a profile method:
#
  (res <- consensus(phylip, method = "profile"))
#
# Show the connection between the profile and some consensus:
#
  bxc <- barplot(res, col = c("green", "blue", "orange", "white", "red"), border = NA,
  space = 0, las = 2, ylab = "Base count",
  main = "Profile of a DNA sequence alignment",
  xlab = "sequence position", xaxs = "i")
  
  text(x = bxc, y = par("usr")[4],lab = res.thr, pos = 3, xpd = NA)
  text(x = bxc, y = par("usr")[1],lab = res.iup, pos = 1, xpd = NA)
# }

Run the code above in your browser using DataCamp Workspace