Digest: Predict Peptides Resulting from Enzymatic Digest

Description

Cleave an amino acid sequence (a protein or peptide) according to enzyme-specific rules and calculate the precursor ion m/z values.

Usage

Digest(sequence, enzyme = "trypsin", missed = 0, IAA = TRUE, 
       N15 = FALSE, custom = list())

Value

A data frame with the following column names.

peptide: resulting peptides.
start: beginning residue positions in the original sequence.
end: ending residue positions in the original sequence.
mc: number of missed cleavages.
mz1: monoisotopic m/z values for the \([M + H]^{1+}\) ions (where M is the precursor mass).
mz2: monoisotopic m/z values for the \([M + 2H]^{2+}\) ions.
mz3: monoisotopic m/z values for the \([M + 3H]^{3+}\) ions.
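
The three m/z columns follow from the neutral monoisotopic mass M by adding z protons and dividing by the charge z. A minimal base R sketch of that arithmetic is given here. The helper name mz_from_mass and the rounded proton mass are assumptions for illustration and are not taken from the package source.

```r
# Hypothetical helper, not part of the package: m/z of the [M + zH]^z+ ion.
proton <- 1.007276467  # proton mass in Da (rounded CODATA value, assumed here)

mz_from_mass <- function(M, z = 1:3) {
  # add z protons to the neutral mass, then divide by the charge
  (M + z * proton) / z
}

# m/z values for charge states 1, 2 and 3 of an arbitrary 1000 Da precursor
mz_from_mass(1000)
```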

Arguments

sequence: a character string representing the amino acid sequence to be cleaved by the enzyme.
enzyme: a character string specifying the rules for cleavage. Options are "trypsin" (default), "trypsin.strict" (see details), or "pepsin".
missed: the maximum number of missed cleavages. Must be an integer of 0 (default) or greater. An error will result if the specified number of missed cleavages is greater than the maximum possible number of missed cleavages.
IAA: logical. TRUE specifies iodoacetylated cysteine and FALSE specifies unmodified cysteine.
N15: logical indicating if the nitrogen-15 isotope should be used in place of the default nitrogen-14 isotope.
custom: a list specifying user-defined residues as custom = list(code, mass), where code is a vector of one-letter residue codes and mass is a vector of the corresponding monoisotopic masses. See Details and Examples.

Author

Nathan G. Dodder

Details

The amino acid residues must be specified by one letter codes. The predefined residues are:

A = alanine	L = leucine
R = arginine	K = lysine
N = asparagine	M = methionine
D = aspartic acid	F = phenylalanine
C = cysteine	P = proline
E = glutamic acid	S = serine
Q = glutamine	T = threonine
G = glycine	W = tryptophan
H = histidine	Y = tyrosine
I = isoleucine	V = valine

If "trypsin" is specified, the sequence is cleaved on the C-terminal side of K and R residues, except if K or R is followed by P. If "trypsin.strict" is specified, the sequence is cleaved on the C-terminal side of every K and R residue, including those followed by P. If "pepsin" is specified, the sequence is cleaved on the C-terminal side of F, L, W, Y, A, E, and Q residues. This rule is specific to pepsin at pH \(>\) 2, as used in hydrogen-deuterium exchange experiments.
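
The three rule sets can be sketched in base R as follows. This is an illustration of the rules as described, not the package's implementation. The function name cleave and the short test sequence are hypothetical.

```r
# Hypothetical helper illustrating the cleavage rules; not the package's code.
cleave <- function(sequence, enzyme = "trypsin") {
  residues <- strsplit(sequence, "")[[1]]
  n <- length(residues)
  nxt <- c(residues[-1], "")  # residue following each position ("" at the end)
  site <- switch(enzyme,
    trypsin        = residues %in% c("K", "R") & nxt != "P",
    trypsin.strict = residues %in% c("K", "R"),
    pepsin         = residues %in% c("F", "L", "W", "Y", "A", "E", "Q"),
    stop("unknown enzyme: ", enzyme))
  end <- unique(c(which(site), n))       # a peptide ends at each site and at the C-terminus
  start <- c(1, end[-length(end)] + 1)   # the next peptide starts one residue later
  data.frame(peptide = substring(sequence, start, end),
             start = start, end = end, stringsAsFactors = FALSE)
}

cleave("AKPGRCK")                             # KP is not cleaved
cleave("AKPGRCK", enzyme = "trypsin.strict")  # KP is cleaved
```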

When "trypsin" is specified, KP and RP are not considered missed cleavages when missed is \(>\) 0.
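
One way to picture missed cleavages is as the joining of adjacent fully cleaved peptides. The sketch below assumes the peptides have already been produced by the "trypsin" rule, so a peptide containing KP, such as AKPGR, keeps mc = 0. The helper join_missed is hypothetical, and unlike Digest it does not raise an error when missed exceeds the maximum possible value.

```r
# Hypothetical helper: join adjacent fully cleaved peptides to model missed cleavages.
join_missed <- function(peptides, missed = 1) {
  n <- length(peptides)
  out <- list()
  for (mc in 0:missed) {
    if (mc >= n) break  # no more peptides left to join
    for (i in seq_len(n - mc)) {
      out[[length(out) + 1]] <- data.frame(
        peptide = paste(peptides[i:(i + mc)], collapse = ""),
        mc = mc, stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, out)
}

# "AKPGR" contains KP but is reported with mc = 0
join_missed(c("AKPGR", "CK", "DE"), missed = 1)
```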

The argument IAA specifies treatment of the protein with iodoacetamide. This treatment produces iodoacetylated cysteine residues (elemental formula C5H8N2O2S).
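
The mass added per cysteine can be checked from the elemental formulas. The sketch below uses rounded NIST isotope masses and a hypothetical helper mono_mass. It assumes the unmodified cysteine residue formula C3H5NOS, which is not stated above.

```r
# Monoisotopic masses of the most abundant isotopes in Da (NIST values, rounded).
atom_mass <- c(C = 12, H = 1.0078250321, N = 14.0030740048,
               O = 15.9949146196, S = 31.97207100)

# Hypothetical helper: monoisotopic mass from a named vector of atom counts.
mono_mass <- function(formula) sum(atom_mass[names(formula)] * formula)

cys_iaa <- mono_mass(c(C = 5, H = 8, N = 2, O = 2, S = 1))  # iodoacetamide-treated residue
cys     <- mono_mass(c(C = 3, H = 5, N = 1, O = 1, S = 1))  # unmodified residue (assumed formula)

cys_iaa - cys  # mass added per cysteine, about 57.02 Da
```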

If TRUE, the argument N15 specifies 100% nitrogen-15 incorporation. It is intended for proteins grown with a nitrogen-15 labeled food source, although the incorporation actually achieved in an experiment may be less than 100%. Setting N15 = TRUE does not modify the mass of a custom residue, or the mass of the nitrogen(s) added if IAA = TRUE.
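
The size of the resulting mass shift can be estimated by counting the nitrogen atoms in the predefined residues. The sketch below is an illustration only. The helper n15_shift is hypothetical, the isotope masses are rounded NIST values, and the count excludes the nitrogen added by iodoacetamide, in line with the behavior described above.

```r
# Nitrogen atoms per predefined residue (backbone plus side chain).
n_count <- c(A = 1, R = 4, N = 2, D = 1, C = 1, E = 1, Q = 2, G = 1, H = 3, I = 1,
             L = 1, K = 2, M = 1, F = 1, P = 1, S = 1, T = 1, W = 2, Y = 1, V = 1)

# Mass difference between nitrogen-15 and nitrogen-14 in Da (rounded NIST values).
delta_n <- 15.0001088982 - 14.0030740048

# Hypothetical helper: mass shift of a peptide at 100% nitrogen-15 incorporation.
n15_shift <- function(peptide) {
  residues <- strsplit(peptide, "")[[1]]
  sum(n_count[residues]) * delta_n
}

n15_shift("AKPGR")  # 9 nitrogens, shift of about 8.97 Da
```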

If a custom residue code is identical to a predefined residue code, the custom residue mass will be used in place of the predefined mass.

The error message “object "mass" not found” indicates the input sequence contains an undefined residue(s).
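
To catch this situation before calling Digest, the sequence can be screened for codes that are neither predefined nor supplied through custom. A minimal sketch follows. The helper undefined_residues and the short test sequence are hypothetical.

```r
# The twenty predefined one-letter codes listed above.
predefined <- strsplit("ARNDCEQGHILKMFPSTWYV", "")[[1]]

# Hypothetical helper: return residue codes that are not predefined or custom.
undefined_residues <- function(sequence, custom_codes = character(0)) {
  residues <- strsplit(sequence, "")[[1]]
  unique(residues[!residues %in% c(predefined, custom_codes)])
}

undefined_residues("AKPGsRCK")                      # "s" is not defined
undefined_residues("AKPGsRCK", custom_codes = "s")  # nothing left undefined
```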

References

The relative atomic masses of the isotopes are from the NIST Physical Reference Data Website http://physics.nist.gov/PhysRefData/Compositions/. The molar mass of a proton (H+) is from the NIST CODATA Website http://physics.nist.gov/cuu/Constants/index.html.

Examples

## digest human serum albumin with 0 and 1 missed cleavages
Digest(example.sequence, missed = 1)

## digest human serum albumin with a phosphoserine at position 58
## and all methionines oxidized
modifiedHsaSequence <- strsplit(example.sequence, split = "")[[1]]
modifiedHsaSequence[58] <- "s"   # insert code for phosphoserine  
modifiedHsaSequence <- paste(modifiedHsaSequence, collapse = "")
Digest(modifiedHsaSequence, custom = list(code = c("s","M"), 
       mass = c(MonoisotopicMass(list(C=3, H=6, N=1, O=5, P=1)),
                MonoisotopicMass(list(C=5, H=9, N=1, O=2, S=1)))))
                
## digest human serum albumin with strict rules
Digest(example.sequence, enzyme = "trypsin.strict")