util.seq: Functions to Work with Sequence Data

Description

Count amino acids in protein sequences, return one- or three-letter abbreviations of amino acids; count nucleotides in nucleic acid sequences, calculate DNA and RNA complements of nucleic acid sequences.

Usage

aminoacids(seq, nchar = 1)
nucleicacids(seq, type = "DNA", comp = NULL, comp2 = NULL)

Arguments

seq

character, amino acid sequence of a protein (aminoacids) or base sequence of a nucleic acid (nucleicacids).

nchar

numeric, 1 to return one-letter or 3 to return three-letter abbreviations of amino acids, or NA to return their full names (aminoacids).

type

character, type of nucleic acid sequence, DNA or RNA (nucleicacids).

comp

character, type of complement to compute, DNA or RNA (nucleicacids).

comp2

character, type of second complement to compute from the first complement, DNA or RNA (nucleicacids).

Value

An object of type character or dataframe.

Details

aminoacids takes a character argument containing a protein sequence and counts the occurrences of each amino acid. The output is a data frame with 20 columns, one for each amino acid, in the same order as the columns of thermo$protein. If the first argument is NULL, the function instead returns the abbreviations or names of the twenty amino acids in that same order: one-letter abbreviations if nchar is 1, three-letter abbreviations if nchar is 3, and full names if nchar is NA.
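
To illustrate the counting step, here is a minimal base R sketch. The helper count_aa is hypothetical and is not part of CHNOSZ; it assumes the 20 columns follow the alphabetical order of one-letter codes (ACDEFGHIKLMNPQRSTVWY), which may or may not match thermo$protein. As its own design choices, it converts the input to upper case and ignores any character that is not one of the 20 standard letters.

```r
# Hypothetical helper, not the CHNOSZ code: count the 20 standard amino
# acids in a one-letter sequence. The column order (alphabetical by
# one-letter code) is an assumption, not necessarily that of thermo$protein.
count_aa <- function(seq) {
  aa <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
          "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")
  # split into single characters, treating lower case as upper case
  chars <- strsplit(toupper(seq), "")[[1]]
  # fixed factor levels keep zero counts; other characters become NA
  # and are dropped by table()
  counts <- table(factor(chars, levels = aa))
  # one-row data frame with one column per amino acid
  out <- as.data.frame(matrix(as.integer(counts), nrow = 1))
  names(out) <- aa
  out
}

count_aa("GGSGG")  # G is 4, S is 1, all other columns 0
```

Counting through factor() with fixed levels keeps all 20 columns, including those with zero counts, so results for different sequences line up column by column.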

nucleicacids takes a DNA or RNA sequence and counts the number of bases of each type; type specifies whether the sequence is DNA or RNA. Setting comp to DNA or RNA makes the function compute the base composition of that type of complement of the sequence instead. If comp2 is also specified, the complement is taken a second time. The two rounds of complementing allow a single call to go, for example, from a sequence on the DNA minus strand (given in seq) to the plus strand (with comp="DNA"), and then from the DNA plus strand to RNA (with comp2="RNA"). The function returns a data frame of base composition, which can be passed back to nucleicacids to obtain the overall chemical formula of the bases.
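
The two rounds of complementing can be sketched in base R as follows. complement_seq and count_bases are hypothetical helpers, not the CHNOSZ implementation; the sketch assumes Watson-Crick pairing (A with T or U, C with G) and does not reverse the sequence, which does not matter here because only base composition is counted.

```r
# Hypothetical helper, not the CHNOSZ code: return the complement of a
# base sequence written as DNA (T) or RNA (U). The sequence is not reversed.
complement_seq <- function(seq, to = "DNA") {
  # pair A-T(U) and C-G; U in the input pairs with A just like T does
  out <- chartr("ACGTU", "TGCAA", toupper(seq))
  # write the partner of A as U when an RNA complement is requested
  if (to == "RNA") out <- chartr("T", "U", out)
  out
}

# Hypothetical helper: count the four bases of a DNA or RNA sequence,
# returning a named integer vector (zero counts included).
count_bases <- function(seq, type = "DNA") {
  bases <- if (type == "RNA") c("A", "C", "G", "U") else c("A", "C", "G", "T")
  counts <- table(factor(strsplit(toupper(seq), "")[[1]], levels = bases))
  setNames(as.integer(counts), bases)
}

# DNA minus strand -> DNA plus strand -> RNA, as described above
minus <- "ACCGGGTTT"
plus <- complement_seq(minus, to = "DNA")  # "TGGCCCAAA"
rna <- complement_seq(plus, to = "RNA")    # "ACCGGGUUU"
count_bases(rna, type = "RNA")             # A 1, C 2, G 3, U 3
```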

Examples

data(thermo)
## count amino acids in a sequence
aminoacids("GGSGG")
aminoacids("WhatAmIMadeOf?")

## count nucleobases in a sequence
nucleicacids("ACCGGGTTT")
# the DNA complement of that sequence
nucleicacids("ACCGGGTTT", comp = "DNA")
# the RNA complement of the DNA complement
n <- nucleicacids("ACCGGGTTT", comp = "DNA", comp2 = "RNA")
# the formula of the RNA complement
nucleicacids(n, type = "RNA")
