CHNOSZ (version 0.8)

utilities: Utility and Miscellaneous Functions

Description

Provide various utilities for the user and for other functions in CHNOSZ. Convert between strings and character objects, calculate one of Gibbs energy, enthalpy or entropy from the other two, test for ability to become numeric, write and extract parts of chemical formulas and calculate nominal carbon oxidation states of formulas, handle arguments referring to temperature, pressure, states, and equations of state, calculate protein length, count amino acids in protein sequences, calculate dP/dT and temperature of phase transitions, calculate heat capacities of unfolded proteins using an equation from the literature, initialize a new plot window using preset parameters, open a postscript file for plotting, add an axis to a plot, generate labels for plot axes and for identification of subplots (e.g., (a)) and physical and chemical conditions, add stability lines for water to a diagram, add or alter properties of species in the thermodynamic database, calculate non-ideal contributions to apparent standard molal properties, identify a conserved basis species, perform arithmetic on lists, execute multicore calculations, and run all the examples provided in CHNOSZ.

Usage

c2s(x, sep = " ")
  s2c(x, sep = NULL, keep.sep = TRUE, n = NULL, move.sep = FALSE)
  GHS(species = NULL, DG = NA, DH = NA, S = NA, T = thermo$opt$Tr)
  can.be.numeric(x)
  expand.formula(elements, makeup)
  ZC(x)
  eos.args(eos, property = NULL, T = NULL, P = NULL)
  TP.args(T = NULL, P = NULL)
  state.args(state = NULL)
  protein.length(protein)
  aminoacids(seq, nchar=1)
  MP90.cp(T, protein)
  dPdTtr(x)
  Ttr(x, P = 1, dPdT = NULL)
  thermo.plot.new(xlim, ylim, xlab, ylab, cex = par('cex'),
    mar = NULL, lwd = par('lwd'), ticks = c(1,2,3,4), 
    mgp = c(1.2, 0.3, 0), cex.axis = par('cex'), col = par('col'),
    yline = NULL, axs = "i")
  thermo.postscript(file, family = 'Helvetica', width = 8, 
    height = 6, horizontal = FALSE)
  thermo.axis(lab = 'x-axis', side = 1, line = 1.5, cex = par('cex'),
    lwd = par('lwd'), T = NULL, col = par('col'))
  axis.label(x, opt = NULL, do.state = TRUE, oldstyle = FALSE,
    do.upper = FALSE, mol = 'mol')
  describe(x = NULL, T = NULL, P = NULL, use.name = FALSE, 
    as.reaction = NULL, digits = 1)
  basis.comp(basis)
  label.plot(x, xfrac = 0.95, yfrac = 0.9, cex = 1, paren = TRUE, 
    adj = 1)
  water.lines(xaxis = 'pH', yaxis = 'Eh', T = 298.15, P = 'Psat', 
    which = c('oxidation','reduction'), logaH2O = 0, lty = 2, 
    col = par('fg'), xpoints = NULL)
  element(compound, property = c("mass","entropy"))
  mod.obigt(species, ..., missingvalues = NA)
  which.pmax(elts, na.rm = FALSE, pmin = FALSE)
  nonideal(species, proptable, IS, T)
  change(name, ...)
  add.protein(file="protein.csv")
  add.obigt(file="obigt.csv")
  which.balance(species)
  lsub(x, y)
  lsum(x, y)
  pprod(x, y)
  psum(x)
  mylapply(X, FUN, ...)
  examples()

Arguments

x
character object to convert (s2c, c2s, axis.label), or object to be tested (can.be.numeric), or numeric index of a mineral phase (dPdTtr, Ttr), or character object representing
sep
character, the separator to insert or separator(s) to match (c2s, s2c).
keep.sep
logical, retain the separator in the output (TRUE) or discard it (FALSE) (s2c).
n
numeric, maximum number of items in the character object returned by s2c.
move.sep
logical, move the kept separator to the end of the preceding item.
species
character, formula of a compound from which to calculate entropies of the elements GHS, or names of species to modify or add to the thermodynamic database (mod.obigt), or names or indices of species for which to calculate nonidea
T
numeric, temperature (K) (TP.args, water.lines, describe, MP90.cp, nonideal, GHS).
P
numeric, pressure (bar) (can also be character, Psat in TP.args).
eos
character, name of equation of state (one of hkf, mk, water).
property
character, name(s) of thermodynamic properties (eos.args, element).
state
character, name(s) of states (e.g., cr, aq).
DG
numeric, standard molal Gibbs energy of formation (GHS).
DH
numeric, standard molal enthalpy of formation.
S
numeric, standard molal entropy.
elements
character, name(s) of elements (expand.formula).
makeup
dataframe, elemental composition of a compound returned by makeup.
protein
character, name of protein species; numeric, species index of protein (protein.length).
seq
character, amino acid sequence of a protein (aminoacids).
nchar
numeric, $1$ to return one-letter, $3$ to return three-letter abbreviations for amino acids.
dPdT
numeric, values of (dP/dT) of phase transitions (Ttr).
xlim
numeric, limits of the x-axis (thermo.plot.new).
ylim
numeric, limits of the y-axis.
xlab
character, x-axis label.
ylab
character, y-axis label.
cex
numeric, character expansion factor for labels, also in label.plot.
mar
numeric, number of lines of margins on each side of plot.
lwd
numeric, line width.
ticks
numeric, axes on which to place ticks.
mgp
numeric, sizes of margins of plot.
cex.axis
numeric, character expansion factor for names of axes.
yline
numeric, margin line on which to plot y-axis name.
axs
character, setting for axis limit calculation.
file
character, name of file (thermo.postscript, add.protein, add.obigt).
family
character, font family.
width
numeric, width of plot.
height
numeric, height of plot.
horizontal
logical, create plot in landscape mode?
opt
character or numeric, options for axis labels (axis.label).
oldstyle
logical, use previous style of axis labels?
do.state
logical, append state abbreviation to label?
do.upper
logical, use uppercase letters in axis label?
mol
character, string to use as the denominator of axis label.
use.name
logical, write names instead of formulas? (describe).
as.reaction
logical, interpret input as a reaction?
digits
numeric, how many digits to round logarithms of activities.
basis
numeric or character, species number or formula (basis.comp).
xaxis
character, description of x-axis (water.lines).
yaxis
character, description of y-axis.
which
character, which of oxidation/reduction lines to plot.
logaH2O
numeric, logarithm of the activity of water.
lty
numeric, line type.
col
character, line color (water.lines, thermo.plot.new, thermo.axis).
xpoints
numeric, plotting points on x axis.
xfrac
numeric, fractional location on x-axis for placement of label (label.plot).
yfrac
numeric, fractional location on y-axis for placement of label.
paren
logical, add parentheses around label text?
adj
numeric, parameter for text alignment.
lab
character, axis label (thermo.axis).
side
numeric, which side of plot to place axis.
line
numeric, line (distance from axis) to place axis label.
compound
character, name of element(s) or compound(s) (element).
...
character or numeric, properties of species to modify in the thermodynamic database (mod.obigt), or arguments to change that are passed to mod.obigt or mod.buffer, or additional arguments for lappa
missingvalues
numeric, values to assign to undefined properties.
elts
list, numeric vectors for which to find maximum values (in parallel) (which.pmax).
na.rm
logical, remove missing values?
pmin
logical, find minimum values instead of maximum ones.
proptable
list of dataframes of species properties (nonideal).
IS
numeric, ionic strength(s) used in nonideal calculations, mol kg$^{-1}$.
name
character or numeric, name (or numeric index) of species or name of buffer to be modified (change).
X
vector, argument for lapply or mclapply.
FUN
function, argument for lapply or mclapply.
y
list (lsub, lsum) or numeric (pprod).

Value

  • s2c, c2s and axis.label return character values. Numeric returns are made by GHS, protein.length, dPdTtr, Ttr, ZC, MP90.cp and mod.obigt. A list is returned by eos.args and TP.args, and character is returned by state.args. can.be.numeric returns logical. aminoacids returns character or dataframe. lsub, lsum and pprod return lists. Functions with no (or unspecified) returns are thermo.plot.new, thermo.postscript, label.plot and water.lines.

Details

c2s joins the elements of a character object into a character object of length $1$ (a string), and s2c splits a string into elements of a character object of length $n+1$, where $n$ stands for the number of separators in the string. sep gives the separator to insert between successive items (c2s) or the separator(s) to find in a string (s2c). The default value of sep is a space (" ") in c2s. The default value for sep is NULL in s2c, indicating a separator at every position of x (the result in this case has length equal to nchar(x)). Argument keep.sep, if TRUE (the default), instructs s2c to keep the separating values in the output, and move.sep, if TRUE, instructs s2c to append the kept separators to the preceding items instead of prepending them to the following ones. The maximum length of the object returned by s2c is determined by the argument named n; the default value of NULL indicates an unrestricted length.
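
The joining and splitting described above can be approximated with base R alone. The sketch below is not the CHNOSZ implementation: join_chars and split_string are hypothetical names, and only the sep = NULL case and the discard-the-separator case (keep.sep = FALSE) are covered.

```r
# Hypothetical base-R analogues of c2s and s2c (not the package code).
join_chars <- function(x, sep = " ") paste(x, collapse = sep)

split_string <- function(x, sep = NULL) {
  # sep = NULL: split at every position, giving nchar(x) elements
  if (is.null(sep)) return(strsplit(x, "")[[1]])
  # otherwise split on a literal separator, which is discarded
  strsplit(x, sep, fixed = TRUE)[[1]]
}

joined <- join_chars(c("a", "b", "c"), sep = ".")
pieces <- split_string("hello world", sep = " ")
letters_only <- split_string("hello")
```

Keeping or moving the separators, as keep.sep and move.sep do, would need extra bookkeeping that this sketch leaves out.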

The *.args functions are used to normalize user-input arguments; the names of states and properties are made lowercase, and abbreviations of states are expanded. eos.args returns a list with elements named props, for all the properties available for the specified equations-of-state, prop for the lower-case version of property, and Prop, for the upper-case (of first letter) version of property. eos.args produces an error if any element of property is not in the list of available properties. (See water and subcrt for the available properties for different species.) TP.args forces T and P to equal length. This function also looks for the keyword Psat in the value of P and substitutes calculated values of the saturation vapor pressure (see water). state.args makes its argument lowercase, then transforms a, c, g, and l to aq, cr, gas, and liq, respectively.
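
As an illustration of this kind of argument normalization, here is a minimal base-R sketch. The names normalize_state and equalize_TP are hypothetical, and recycling the shorter vector with rep() is an assumption about how equal lengths might be reached, not a statement of what TP.args does internally. The Psat substitution is not covered.

```r
# Hypothetical stand-in for the state normalization described above.
normalize_state <- function(state) {
  state <- tolower(state)
  abbrev <- c(a = "aq", c = "cr", g = "gas", l = "liq")
  short <- state %in% names(abbrev)
  state[short] <- abbrev[state[short]]
  unname(state)
}

# Hypothetical stand-in for forcing T and P to equal length;
# recycling the shorter vector is an assumption.
equalize_TP <- function(T, P) {
  n <- max(length(T), length(P))
  list(T = rep(T, length.out = n), P = rep(P, length.out = n))
}

states <- normalize_state(c("a", "l", "liq", "GAS"))
tp <- equalize_TP(c(273.15, 373.15), c(100, 100, 200, 200))
```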

GHS computes one of the standard molal Gibbs energy or enthalpy of formation from the elements (DG, DH) or entropy (S) at 298.15 K and 1 bar from values of the other two. If the species argument is present, it is used to calculate the entropies of the elements (Se) using element, otherwise Se is set to zero. The equation in effect can be written as $DG = DH - T * DS$, where $DS = S - Se$ and $T$ denotes the reference temperature of 298.15 K. If two of DG, DH, and S are provided, the value of the third is returned. If three are provided, the value of DG in the arguments is ignored and the calculated value of DG is returned. If none of DG, DH or S are provided, the value of Se is returned. If only one of the values is provided, an error results. Units of cal mol$^{-1}$ (DG, DH) and cal K$^{-1}$ mol$^{-1}$ (S) are assumed. If T is provided, it overrides the reference temperature which is used by default in the calculation.
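
The rules in this paragraph can be written out as a short base-R sketch. ghs_sketch is a hypothetical function, not the package code, and the entropy of the elements Se is passed in directly instead of being looked up with element. The numbers in the usage lines are arbitrary illustration values.

```r
# Hypothetical sketch of DG = DH - T * (S - Se); units cal/mol and cal/K/mol.
ghs_sketch <- function(DG = NA, DH = NA, S = NA, Se = 0, T = 298.15) {
  # DH and S known: return DG (a supplied DG is ignored)
  if (!is.na(DH) && !is.na(S)) return(DH - T * (S - Se))
  # DG and S known: return DH
  if (!is.na(DG) && !is.na(S)) return(DG + T * (S - Se))
  # DG and DH known: return S
  if (!is.na(DG) && !is.na(DH)) return((DH - DG) / T + Se)
  # nothing supplied: return the entropy of the elements
  if (is.na(DG) && is.na(DH) && is.na(S)) return(Se)
  stop("supply two of DG, DH and S")
}

dg <- ghs_sketch(DH = -1000, S = 10, Se = 5)   # arbitrary values
dh <- ghs_sketch(DG = dg, S = 10, Se = 5)      # recovers DH
s  <- ghs_sketch(DG = dg, DH = -1000, Se = 5)  # recovers S
```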

can.be.numeric returns a value of TRUE or FALSE for each element of x.
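
One way to make such an element-wise test in base R is to attempt the coercion and check for NA. This is a sketch under that assumption; can_be_numeric_sketch is a hypothetical name and the package may decide differently for edge cases such as NA input.

```r
# Hypothetical sketch: TRUE where coercion to numeric succeeds.
can_be_numeric_sketch <- function(x) {
  !is.na(suppressWarnings(as.numeric(as.character(x))))
}

flags <- can_be_numeric_sketch(c("1", "a", "2.5"))
```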

expand.formula converts a 1-column dataframe representing the elemental composition of a compound (see makeup) to a numeric vector, the elements of which correspond to the elements given in the argument. If any of these is not present in the makeup dataframe, its coefficient is set to zero. A non-zero coefficient of an element in the makeup dataframe does not appear in the output if that element is not one of elements.
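
The selection and zero-filling described above can be sketched in base R. For simplicity the composition is represented here by a named numeric vector instead of the 1-column dataframe returned by makeup; expand_counts is a hypothetical name.

```r
# Hypothetical sketch: pick the requested elements, fill absent ones with zero.
expand_counts <- function(elements, counts) {
  out <- counts[elements]     # NA where an element is absent
  out[is.na(out)] <- 0
  names(out) <- elements
  out
}

h2o <- c(H = 2, O = 1)        # stand-in for the composition of H2O
chs <- expand_counts(c("C", "H", "S"), h2o)
```

Oxygen is dropped from the result because it is not among the requested elements, which mirrors the last sentence of the paragraph.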

ZC returns the nominal carbon oxidation state for the chemical formula represented by x. (For discussion of nominal carbon oxidation state, see Hendrickson et al., 1970; Buvet, 1983.) If carbon is not present in the formula the result is NaN.
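
A minimal sketch of such a calculation follows, assuming the conventional oxidation states H = +1, O = -2, N = -3 and S = -2 and a net charge Z. The source does not spell out which elements or conventions ZC uses, so zc_sketch is a hypothetical illustration and not the package code.

```r
# Hypothetical sketch; counts is a named vector of element counts (and charge Z).
zc_sketch <- function(counts) {
  n <- function(el) if (el %in% names(counts)) counts[[el]] else 0
  # the oxidation states of all atoms must sum to the net charge Z
  (n("Z") - n("H") + 3 * n("N") + 2 * n("O") + 2 * n("S")) / n("C")
}

zc_methane <- zc_sketch(c(C = 1, H = 4))
zc_co2     <- zc_sketch(c(C = 1, O = 2))
zc_water   <- zc_sketch(c(H = 2, O = 1))   # no carbon: 0/0
```

With no carbon the division is 0/0 for a formula such as H2O, which gives NaN as the paragraph states.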

The argument of protein.length, if it is character, refers to the name of protein(s) (e.g., LYSC_CHICK) for which to calculate the length (number of amino acid residues). If the argument is numeric, it refers to the index of a protein species (value in thermo$species$ispecies). For a numeric argument to work, the protein information must have been previously loaded into the species list (using info). aminoacids takes a character argument containing a protein sequence and counts the number of occurrences of each type of amino acid. The output is a dataframe with 20 columns, each corresponding to an amino acid, ordered in the same way as thermo$proteins. If the first argument is NULL, the function instead returns the one-letter abbreviations (for nchar equal to 1), the three-letter ones (if nchar is equal to 3) or the names of the amino acids (if nchar is NA) of the twenty amino acids in the order used in thermo$proteins.
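
Counting residues in a sequence can be sketched in base R as follows. count_residues is a hypothetical name; the column order used here (alphabetical by one-letter code), the case-insensitive matching, and the silent skipping of other characters are all assumptions, not documented behavior of aminoacids.

```r
# Hypothetical sketch: one row, 20 columns of residue counts.
count_residues <- function(seq) {
  aa <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
          "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")
  residues <- strsplit(toupper(seq), "")[[1]]
  counts <- vapply(aa, function(a) sum(residues == a), numeric(1))
  as.data.frame(t(counts))
}

ggsgg <- count_residues("GGSGG")
```

The length of the sequence, in the sense used by protein.length, is then the row sum of the counts.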

MP90.cp takes T (one or more temperatures in $^{\circ}$C) and protein (name of protein) and returns the heat capacity of the unfolded protein using values of heat capacities of the residues taken from Makhatadze and Privalov, 1990. Those authors provided values of heat capacity at six points between 5 and 125 $^{\circ}$C; this function interpolates (using splinefun) values at other temperatures.
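
The interpolation step can be illustrated with splinefun from base R. The six temperatures below are those used in the example call further down this page; the heat capacity values are synthetic placeholders generated from a straight line, not data from Makhatadze and Privalov.

```r
# Six tabulated temperatures (degrees C) and made-up heat capacities.
temps <- c(5, 25, 50, 75, 100, 125)
cp_known <- 100 + 0.5 * temps        # synthetic values, NOT from the paper

# splinefun returns a function that interpolates between the knots
cp_fun <- splinefun(temps, cp_known)
cp_at_25 <- cp_fun(25)               # a tabulated point is reproduced
cp_at_60 <- cp_fun(60)               # an interpolated point
```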

dPdTtr returns values of (dP/dTtr), where Ttr represents the transition temperature, of the phase transition at the high-T stability limit of the xth species in thermo$obigt (no checking is done to verify that the species represents in fact one phase of a mineral with phase transitions). dPdTtr takes account of the Clapeyron equation in the form of (dP/dTtr)=DS/DV, where DS and DV represent the changes in entropy and volume of phase transition, and are calculated using subcrt at Ttr from the standard molal entropies and volumes of the two phases involved. Using values of dPdT calculated using dPdTtr or supplied in the arguments, Ttr returns as a function of P values of the upper transition temperature of the mineral phase represented by the xth species.
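
A sketch of the two steps, under stated assumptions: entropy in cal K$^{-1}$ mol$^{-1}$ and volume in cm$^3$ mol$^{-1}$ (hence the factor 41.84 cm$^3$ bar cal$^{-1}$ to get bar K$^{-1}$), and a transition temperature that varies linearly with pressure from a 1 bar reference. clapeyron_slope and ttr_at_P are hypothetical names, and the linear form is an assumption about how Ttr uses dPdT, not a description of the package code.

```r
# Clapeyron slope dP/dT = DS/DV, converted from cal/cm3/K to bar/K.
clapeyron_slope <- function(DS, DV) DS * 41.84 / DV

# Assumed linear extrapolation of the transition temperature with pressure.
ttr_at_P <- function(Ttr0, dPdT, P, P0 = 1) Ttr0 + (P - P0) / dPdT

slope <- clapeyron_slope(DS = 1, DV = 41.84)       # arbitrary values
temps <- ttr_at_P(Ttr0 = 500, dPdT = slope, P = c(1, 101))
```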

thermo.plot.new sets parameters for a new plot, creates a new plot using plot.new, and adds ticks to the plot. thermo.postscript calls postscript with some custom parameters. Plot parameters (see par) including cex, mar, lwd, mgp and axs can be given, as well as a numeric vector in ticks identifying which sides of the plot receive tick marks. yline, if present, denotes the margin line (par('mgp')[1]) where the y-axis name is plotted.

axis.label returns an expression to be used for plotting an axis label, which may refer to a chemical activity or fugacity, temperature, or pressure. The first argument may be the name of one of the basis species (e.g., O2) or one of T, P, Eh, pH, pe or logK. An expression is returned that may include italic and subscripted symbols, unless oldstyle is TRUE, when labels with a simpler format (e.g. O2 (log f)) are returned. The default value of NULL of opt means to get the state from the value in thermo$opt$state (if x is the name of a basis species), or if x is T or P to get the units of temperature or pressure from nuts (which also refers to thermo$opt). do.upper, if TRUE, tells the function to print the label using uppercase letters. mol (default: mol) refers to the denominator of the units (default: molality). It is possible to write the labels differently, e.g. as specific units, by setting mol to g.

water.lines plots lines representing the oxidation and reduction stability limits of water on yaxis-xaxis diagrams, where yaxis can be Eh or O2, and xaxis can be pH or T. which controls which lines (oxidation, reduction, or both (the default)) are drawn, logaH2O (default 0) denotes the logarithm of the activity of water, lty (default 2) the line type, col (default par('fg'), the foreground color), and xpoints an optional list of points on the x axis to which to restrict the plotting (default of NULL refers to the axis limits).

label.plot adds identifying text to the plot; the value given for x is made into a label like (a). The location of the label is controlled by xfrac and yfrac (the fractional locations along the respective axes) as well as adj (the text alignment parameter, see text).

thermo.axis is used to add axes and axis labels to plots, with some default style settings (rotation of numeric labels) and conversions between oxidation-reduction scales (called by thermo.plot.new).

describe generates a textual representation of the temperature, pressure, and logarithms of activities of the basis species, given in the arguments by x (i.e. the dataframe in thermo$basis) and T and P (given in Kelvin and bar and converted by the function to those specified by nuts). The digits argument tells to what decimal place the logarithms of activities should be rounded. If any of the supplied arguments is NULL its specification is not printed in the output; T and P, if present, are prepended to the basis summary. If x instead is a dataframe representing a chemical reaction (as output by subcrt and identified by having a column named coeff), the function returns a textual summary of that reaction (i.e., showing reactants on the left, an equal sign, and products on the right; reactants and products are preceded by their reaction coefficient unless it is $1$). However, if only reactants or products can be found, or as.reaction is set to FALSE, the names or formulas of the species are printed with their coefficients and interceding plus or minus signs, as appropriate. Whether the names or formulas are printed is controlled by use.name (FALSE by default), a logical vector the length of which should correspond to the number of rows in x (but is expanded to the right length if needed).

element returns a dataframe of the mass and entropy of one or more elements or formulas given in compound. The property can be mass and/or entropy. mod.obigt changes one or more of the properties of one or more species or adds species to the thermodynamic database. These changes are lost if you reload the database by calling data(thermo) or if you quit the R session without saving it. To modify the properties of species, give the names in the species argument and supply other arguments: if one of these arguments is state, species in those states will be updated. Additional arguments refer to the name of the property(s) to be updated and correspond to the column names of thermo$obigt (the names of the properties are matched to any part of compound column names, such as z.T). The values provided should be in the units specified in the documentation for the thermo data object. To add species, supply the new names in species and provide an argument named formula with the corresponding chemical formulas. Additional arguments refer to any of the properties you wish to specify. Properties that are not specified are assigned the value of missingvalues, which is NA by default (however, if state is missing it is set to the value of thermo$opt$state). The values returned (invisibly) by mod.obigt are the row numbers of the affected species.

which.pmax takes a list of equal-length numeric vectors (or objects that can be coerced to numeric) in elts and returns the index of the vector holding the maximum value at each position. If na.rm is TRUE, values of NA are removed; if pmin is TRUE the function finds locations of the minimum values instead.
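
The idea of a position-wise "which vector holds the maximum" can be sketched in base R by binding the vectors as columns and scanning each row. which_pmax_sketch is a hypothetical name; ties go to the first vector, and the na.rm handling of the real function is not reproduced.

```r
# Hypothetical sketch: index of the list element with the largest
# (or, with pmin = TRUE, smallest) value at each position.
which_pmax_sketch <- function(elts, pmin = FALSE) {
  m <- do.call(cbind, lapply(elts, as.numeric))
  if (pmin) m <- -m            # minimum of m is maximum of -m
  apply(m, 1, which.max)
}

elts <- list(c(1, 5, 3), c(4, 2, 6))
imax <- which_pmax_sketch(elts)
imin <- which_pmax_sketch(elts, pmin = TRUE)
```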

nonideal takes a list of dataframes (in proptable) containing the standard molal properties of the identified species. For those species whose charge (determined by the number of Z in their makeup) is not equal to zero, the values of IS are combined with Alberty's (2003) equation 3.6-1 (Debye-Huckel equation) and its derivatives, to calculate apparent molal properties at the specified ionic strength(s) and temperature(s). The lengths of IS and T supplied in the arguments should be equal to the number of rows of each dataframe in proptable, or one to use single values throughout. The apparent molal properties that can be calculated include G, H, S and Cp; any columns in the dataframes of proptable with other names are left untouched. If anything was calculated, a column named loggam (logarithm of gamma, the activity coefficient) is appended to the dataframe of species properties.

change is a wrapper function to mod.obigt and mod.buffer. The name provided in the argument refers to the name or numeric index of the species to update or add using mod.obigt, unless the name begins with an underscore character, in which case the remaining part of the name (after the underscore) is passed to mod.buffer. The arguments in ... are sent without change to the subordinate function.
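
The routing rule can be sketched without calling either subordinate function. route_change is a hypothetical helper that only reports where a given name would be sent; the buffer name PPM in the usage line is just an illustration value.

```r
# Hypothetical sketch of the dispatch on a leading underscore.
route_change <- function(name) {
  if (is.character(name) && substr(name, 1, 1) == "_") {
    # strip the underscore and hand the rest to the buffer function
    list(target = "mod.buffer", name = substr(name, 2, nchar(name)))
  } else {
    list(target = "mod.obigt", name = name)
  }
}

to_buffer  <- route_change("_PPM")
to_species <- route_change("alanine")
```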

add.protein and add.obigt read data from the specified file and add it to either thermo$protein or thermo$obigt, as appropriate. Both of these functions are attempted, with the default file names, when CHNOSZ is first loaded.

which.balance returns, in order, which column(s) of species all have non-zero values. It is used by diagram and transfer to determine a conservant (i.e. basis species that are conserved in transformation reactions) if none is supplied by the user.
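
Finding the columns in which every entry is non-zero is a one-liner in base R. The sketch below uses a small made-up matrix of basis-species coefficients; all_nonzero_columns is a hypothetical name and the real function operates on the species dataframe.

```r
# Hypothetical sketch: indices of columns with no zero entries.
all_nonzero_columns <- function(m) which(colSums(m == 0) == 0)

# Made-up coefficients for two species in three basis species.
coeffs <- cbind(CO2 = c(1, 2), H2O = c(0, 3), O2 = c(-1, -2))
balance <- all_nonzero_columns(coeffs)
```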

lsub subtracts the elements of list y from the respective ones in list x. lsum sums the respective elements of lists x and y. pprod multiplies each element of list x by the respective numeric value in y. psum sums all elements of the list x.
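
These four operations map directly onto Map and Reduce in base R. The functions below are hypothetical equivalents written for illustration, not the package code.

```r
# Hypothetical base-R equivalents of the list arithmetic described above.
lsub_sketch  <- function(x, y) Map(`-`, x, y)   # element-wise x - y
lsum_sketch  <- function(x, y) Map(`+`, x, y)   # element-wise x + y
pprod_sketch <- function(x, y) Map(`*`, x, y)   # x[[i]] * y[i]
psum_sketch  <- function(x) Reduce(`+`, x)      # sum over list elements

x <- list(c(1, 2), c(3, 4))
y <- list(c(1, 1), c(2, 2))
d <- lsub_sketch(x, y)
p <- pprod_sketch(x, c(2, 10))
total <- psum_sketch(x)
```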

mylapply passes the given arguments to lapply, or to mclapply if the multicore package is loaded and the length of X is less than 20. mylapply is used in affinity (in calculations for proteins activated by the iprotein argument), abundance (in parallel operations on list elements), and aminoacids and protein.length (in counting amino acids in sequences and determining lengths of proteins).

examples takes no arguments and runs all the examples in the package using example (with ask set to FALSE).

References

Alberty, R. A., 2003. Thermodynamics of Biochemical Reactions, John Wiley & Sons, Hoboken, New Jersey, 397 p.

Buvet, R., 1983. General criteria for the fulfillment of redox reactions, in Bioelectrochemistry I: Biological Redox Reactions, Milazzo, G. and Blank, M., eds., Plenum Press, New York, p. 15-50.

Hendrickson, J. B., Cram, D. J., and Hammond, G. S., 1970. Organic Chemistry, 3rd ed., McGraw-Hill, New York, 1279 p.

Makhatadze, G. I. and Privalov, P. L., 1990. Heat capacity of proteins. 1. Partial molar heat capacity of individual amino acid residues in aqueous solution: Hydration effect, J. Mol. Biol., 213, 375-384.

See Also

For some of the functions on which these utilities depend or were modeled, see paste, substr, tolower, par and text. Functions in CHNOSZ that make use of these utilities include element, makeup, plot.new and postscript.

Examples

data(thermo)
  
  ## string to character
  s2c('hello world')
  s2c('hello world',sep=' ',keep.sep=FALSE)
  s2c('3.141592',sep=c('.','9'))
  s2c('3.141592',sep=c('.','9'),move.sep=TRUE)
  # character to string
  c2s(aminoacids())
  c2s(aminoacids(),sep='.')

  ## argument processing
  eos.args('hkf',c('g','H','S','cP','V','kT','e'))
  ## produces an error:
  eos.args('hkf',c('G','H','S','Cp','V','kT','E','Q'))
  thermo$opt$water <- 'IAPWS'  # needed for p and n in next line
  eos.args('water',c('p','u','cv','psat','rho','n','q','x','y','epsilon'))
  TP.args(c(273.15,373.15))
  TP.args(c(273.15,373.15),'Psat')
  TP.args(c(273.15,373.15),c(100,100,200,200))
  state.args(c('AQ','GAS'))
  state.args(c('a','l','liq'))

  ## converting among Gibbs, enthalpy, entropy
  GHS('H') # entropy of H (element)
  # calculate enthalpy of formation of arsenopyrite from the
  # Gibbs energy of formation and entropy				
  GHS('FeAsS',DG=-33843,S=68.5) 
  # return the value of DG calculated from DH and S
  # cf. -56687.71 from subcrt('water')
  GHS('H2O',DH=-68316.76,S=16.7123)  
 
  ## count specific elements in a formula
  t <- makeup('H2O')
  expand.formula(c('H','O'),t)
  expand.formula(c('C','H','S'),t)

  ## count amino acids in a sequence
  aminoacids('GGSGG')
  aminoacids('WhatAmIMadeOf?')

  ## calculate protein length
  protein.length('LYSC_CHICK')
  # another way to do it
  basis('CHNOS')
  species('LYSC_CHICK')
  protein.length(species()$ispecies)

  ## heat capacity as a function of temperature
  ## (Makhatadze & Privalov, 1990) units: J K-1 mol-1
  MP90.cp(c(5,25,50,75,100,125),'LYSC_CHICK')

  ## properties of phase transitions
  t <- info('enstatite')
  # (dP/dT) of transitions
  dPdTtr(t)  # first transition
  dPdTtr(t+1) # second transition
  # temperature of transitions (Ttr) as a function of P
  Ttr(t,P=c(1,10,100,1000))    # first transition
  Ttr(t+1,P=c(1,10,100,1000))  # second transition
  
  ## nominal carbon oxidation states
  ZC('CHNOSZ')
  t <- info(info('LYSC_CHICK'))
  ZC(t$formula)

  ## the stoichiometry of basis species
  basis('CHNOS')
  # in a made-up species
  basis.comp('CHNOS')
  # this one makes a warning because Z isn't
  # in our basis
  basis.comp('CHNOSZ')

  ## describing the basis species
  basis('CHNOSe')
  describe(thermo$basis)
  describe(thermo$basis,T=NULL,P=NULL)

  ## mass and entropy of compounds of elements
  element('CH4')
  element(c('CH4','H2O'),'mass')
  element('Z')   # charge
  # same mass, opposite energy as charge
  element('Z-1') # electron

  ## modify/add species
  info(t <- info('alanine','cr'))
  mod.obigt('alanine',state='cr',G=0,H=0,S=0)
  # now the values of G, H, and S are inconsistent
  # with the elemental composition of alanine
  info(t)
  # add a species
  mod.obigt('myname',formula='CHNOSZ',G=0,H=0)
  info(t <- info('myname'))
  # values of G, H, S as a function of T
  # (without any equations of state parameters)
  subcrt(t)