Calculate chemical metrics for proteins from their amino acid compositions.
Zc(AAcomp, ...)
nO2(AAcomp, basis = "QEC", ...)
nH2O(AAcomp, basis = "QEC", terminal_H2O = 0)
GRAVY(AAcomp, ...)
pI(AAcomp, terminal_H2O = 1, ...)
MW(AAcomp, terminal_H2O = 0, ...)
pMW(AAcomp, terminal_H2O = 1, ...)
V0(AAcomp, terminal_H2O = 0, ...)
pV0(AAcomp, terminal_H2O = 1, ...)
V0g(AAcomp, ...)
Density(AAcomp, ...)
S0(AAcomp, terminal_H2O = 0, ...)
pS0(AAcomp, terminal_H2O = 1, ...)
S0g(AAcomp, ...)
SV(AAcomp, ...)
Zcg(AAcomp, ...)
nH2Og(AAcomp, ...)
nO2g(AAcomp, ...)
HC(AAcomp, ...)
NC(AAcomp, ...)
OC(AAcomp, ...)
SC(AAcomp, ...)
nC(AAcomp, ...)
pnC(AAcomp, ...)
plength(AAcomp, ...)
Cost(AAcomp, ...)
RespiratoryCost(AAcomp, ...)
FermentativeCost(AAcomp, ...)
B20Cost(AAcomp, ...)
Y20Cost(AAcomp, ...)
H11Cost(AAcomp, ...)
cplab
AAcomp: data frame, amino acid compositions
...: ignored additional arguments
basis: character, set of basis species
terminal_H2O: numeric, number of pairs of terminal groups
Columns in AAcomp should be named with the three-letter abbreviations for the amino acids.
Case-insensitive matching of the abbreviations is used; e.g., Ala, ALA, and ala all refer to alanine.
Metrics are normalized per amino acid residue except for Zc, pI, Density, plength, and other functions starting with p (for protein).
The contribution of protein terminal groups (-H and -OH) to residue-normalized metrics is turned off by default.
Set terminal_H2O to 1 (or to the number of polypeptide chains, if greater than one) to include their contribution.
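The naming and normalization rules above can be tried directly. The following is a minimal sketch that assumes the canprot package is installed; it uses only functions and arguments listed in the usage section, and the object names aa_title and aa_upper are chosen here for illustration.

```r
library(canprot)

# The same tripeptide (Gly-Ala-Gly) with column names in different cases
aa_title <- data.frame(Ala = 1, Gly = 2)
aa_upper <- data.frame(ALA = 1, GLY = 2)

# Both spellings should be matched to alanine and glycine
Zc(aa_title)
Zc(aa_upper)

# Residue-normalized metrics without terminal groups (the default) ...
nH2O(aa_title)
MW(aa_title)

# ... and with one pair of terminal groups (-H and -OH) included
nH2O(aa_title, terminal_H2O = 1)
MW(aa_title, terminal_H2O = 1)
```

Because the terminal groups add up to one H2O per chain, including them is expected to raise both the per-residue nH2O and the per-residue MW.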
The metrics are described below:
Zc: Average oxidation state of carbon (ZC) (Dick, 2014). This metric is independent of the choice of basis species. Note that ZC is normalized by number of carbon atoms, not by number of residues.
nO2: Stoichiometric oxidation state (nO2 per residue).
The available basis species are:
QEC - glutamine, glutamic acid, cysteine, H2O, O2 (Dick et al., 2020)
QCa - glutamine, cysteine, acetic acid, H2O, O2
nH2O: Stoichiometric hydration state (nH2O per residue). The basis species also affect this calculation.
GRAVY: Grand average of hydropathy. Values of the hydropathy index for individual amino acids are from Kyte and Doolittle (1982).
pI: Isoelectric point.
The net charge for each ionizable group was pre-calculated from pH 0 to 14 at intervals of 0.01.
The isoelectric point is found as the pH where the sum of charges of all groups in the protein is closest to zero.
The pK values for the terminal groups and sidechains are taken from Bjellqvist et al. (1993) and Bjellqvist et al. (1994); note that the calculation does not implement position-specific adjustments described in the latter paper.
The number of N- and C-terminal groups is taken from terminal_H2O.
MW: Molecular weight.
pMW: Molecular weight per protein.
V0: Standard molal volume. The values are derived from group contributions of amino acid sidechains and protein backbones (Dick et al., 2006).
pV0: Standard molal volume per protein.
V0g: Specific volume (reciprocal density).
Density: Density (MW / V0).
S0: Standard molal entropy. The values are derived from group contributions of amino acid sidechains and protein backbones (Dick et al., 2006).
pS0: Standard molal entropy per protein.
S0g: Specific entropy.
SV: Entropy density.
Zcg: Carbon oxidation state per gram.
nO2g: Stoichiometric oxidation state per gram.
nH2Og: Stoichiometric hydration state per gram.
HC: H/C ratio (not counting terminal -H and -OH groups).
NC: N/C ratio.
OC: O/C ratio (not counting terminal -H and -OH groups).
SC: S/C ratio.
nC: Number of carbon atoms per residue.
pnC: Number of carbon atoms per protein.
plength: Protein length (number of amino acid residues).
Cost: Metabolic cost (Akashi and Gojobori, 2002).
RespiratoryCost: Respiratory cost (Wagner, 2005).
FermentativeCost: Fermentative cost (Wagner, 2005).
B20Cost: Biosynthetic cost in bacteria (Zhang et al., 2018).
Y20Cost: Biosynthetic cost in yeast (Zhang et al., 2018).
H11Cost: Biosynthetic cost in humans (Zhang et al., 2018).
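To make three of the definitions above concrete, here is a hand calculation in base R for the tripeptide Gly-Ala-Gly. It is a sketch, not the package's code. Zc uses the elemental formula of the residues, with the standard expression for a neutral molecule, (-nH + 3nN + 2nO + 2nS) / nC. GRAVY uses the Kyte and Doolittle values for alanine and glycine. The pI part computes charges on the fly with a Henderson-Hasselbalch expression instead of reading pre-calculated values, and the pK values in it are illustrative assumptions; consult the cited papers or the package source for the values actually used. The helper name pI_sketch is hypothetical.

```r
aa <- data.frame(Ala = 1, Gly = 2)
counts <- unlist(aa[1, c("Ala", "Gly")])

# --- Zc from the elemental composition of the residues ---
# Residue formulas (amino acid minus H2O): Ala = C3H5NO, Gly = C2H3NO
residue_elements <- rbind(
  Ala = c(C = 3, H = 5, N = 1, O = 1, S = 0),
  Gly = c(C = 2, H = 3, N = 1, O = 1, S = 0)
)
# Each row is multiplied by the count of that residue, then summed by element
n <- colSums(residue_elements * counts)
zc_by_hand <- (-n[["H"]] + 3 * n[["N"]] + 2 * n[["O"]] + 2 * n[["S"]]) / n[["C"]]

# --- GRAVY: mean hydropathy index per residue ---
# Kyte-Doolittle hydropathy values for the two residues present
hydropathy <- c(Ala = 1.8, Gly = -0.4)
gravy_by_hand <- sum(hydropathy[names(counts)] * counts) / sum(counts)

# --- pI: grid search for the pH where the net charge is closest to zero ---
# ASSUMPTION: illustrative pK values, not read from the package
pK_pos <- c(Nterm = 7.5, Lys = 10.0, Arg = 12.0, His = 5.98)
pK_neg <- c(Cterm = 3.55, Asp = 4.05, Glu = 4.45, Cys = 9.0, Tyr = 10.0)

pI_sketch <- function(n_pos, n_neg) {
  pH <- seq(0, 14, by = 0.01)
  net <- numeric(length(pH))
  # Groups that carry a positive charge when protonated
  for (g in names(n_pos)) net <- net + n_pos[[g]] / (1 + 10^(pH - pK_pos[[g]]))
  # Groups that carry a negative charge when deprotonated
  for (g in names(n_neg)) net <- net - n_neg[[g]] / (1 + 10^(pK_neg[[g]] - pH))
  pH[which.min(abs(net))]
}

# Gly-Ala-Gly has no ionizable sidechains, only one pair of terminal groups
pI_by_hand <- pI_sketch(n_pos = c(Nterm = 1), n_neg = c(Cterm = 1))

zc_by_hand
gravy_by_hand
pI_by_hand
```

Adding H2O for the terminal groups changes H and O in the Zc numerator by -2 and +2, which cancel; this is why the residue formula and the full peptide formula give the same Zc.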
... is provided to permit get or do.call constructions with the same arguments for all metrics.
For instance, a terminal_H2O argument can be supplied to either Zc or nH2O, but it only has an effect on the latter.
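A construction of this kind might look like the following sketch, which assumes the canprot package is installed. The vector of metric names and the variable names are chosen here for illustration.

```r
library(canprot)

aa <- data.frame(Ala = 1, Gly = 2)
metrics <- c("Zc", "nH2O", "MW")

# Call every metric with the same argument list; functions that do not
# use terminal_H2O accept it through ... and ignore it
values <- sapply(metrics, function(metric) {
  do.call(metric, list(AAcomp = aa, terminal_H2O = 1))
})
values

# The equivalent call with get() for a single metric
get("Zc")(aa, terminal_H2O = 1)
```

Passing one shared argument list avoids special-casing the metrics that happen to take terminal_H2O.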
cplab is a list of formatted labels for each of the chemical metrics listed here.
A check in the code ensures that the names of the functions for calculating metrics and the names for labels listed in cplab are identical.
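The correspondence between label names and function names can also be inspected from the user's side. This is a small sketch assuming the canprot package is attached; it is not the check performed inside the package.

```r
library(canprot)

# Names of the available labels
names(cplab)

# Test whether each label name is also the name of a visible function
has_function <- vapply(names(cplab), exists, logical(1), mode = "function")
all(has_function)
```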
Akashi H, Gojobori T. 2002. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proceedings of the National Academy of Sciences 99(6): 3695–3700. doi:10.1073/pnas.062526999
Bjellqvist B, Hughes GJ, Pasquali C, Paquet N, Ravier F, Sanchez J-C, Frutiger S, Hochstrasser D. 1993. The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis 14: 1023–1031. doi:10.1002/elps.11501401163
Bjellqvist B, Basse B, Olsen E, Celis JE. 1994. Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions. Electrophoresis 15: 529–539. doi:10.1002/elps.1150150171
Dick JM, LaRowe DE, Helgeson HC. 2006. Temperature, pressure, and electrochemical constraints on protein speciation: Group additivity calculation of the standard molal thermodynamic properties of ionized unfolded proteins. Biogeosciences 3(3): 311–336. doi:10.5194/bg-3-311-2006
Dick JM. 2014. Average oxidation state of carbon in proteins. J. R. Soc. Interface 11: 20131095. doi:10.1098/rsif.2013.1095
Dick JM, Yu M, Tan J. 2020. Uncovering chemical signatures of salinity gradients through compositional analysis of protein sequences. Biogeosciences 17: 6145–6162. doi:10.5194/bg-17-6145-2020
Kyte J, Doolittle RF. 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157: 105–132. doi:10.1016/0022-2836(82)90515-0
Wagner A. 2005. Energy constraints on the evolution of gene expression. Molecular Biology and Evolution 22(6): 1365–1374. doi:10.1093/molbev/msi126
Zhang H, Wang Y, Li J, Chen H, He X, Zhang H, Liang H, Lu J. 2018. Biosynthetic energy cost for amino acids decreases in cancer evolution. Nature Communications 9(1): 4124. doi:10.1038/s41467-018-06461-1
For calculation of ZC from an elemental formula (instead of amino acid composition), see the ZC function in CHNOSZ.
calc_metrics is a wrapper to calculate one or more metrics specified in an argument.
# Amino acid composition of a tripeptide (Gly-Ala-Gly)
aa <- data.frame(Ala = 1, Gly = 2)
# Calculate Zc, nH2O, and length
Zc(aa)
nH2O(aa)
plength(aa)
# Make a plot with formatted labels
plot(Zc(aa), nH2O(aa), xlab = cplab$Zc, ylab = cplab$nH2O)