Learn R Programming

CHNOSZ (version 0.9-5)

findit: Multidimensionally Optimize Chemical Activities

Description

Use a gridded search to find a combination of one or more of chemical activities of basis species, temperature and/or pressure that maximize or minimize a target function of the metastable equilibrium chemical activities of the species of interest.

Usage

findit (lims = list(), target = "cv", n = NULL, iprotein = NULL, 
    do.plot = TRUE, T = 25, P = "Psat", res = NULL, labcex = 0.6, 
    loga.ref = NULL, as.residue = FALSE, loga.tot = 0)
  ## S3 method for class 'findit':
plot(x, which=NULL, mar=c(3.5,5,2,2), xlab="iteration", ...)

Arguments

lims
list, specification of search limits.
target
character, target statistic to optimize.
n
numeric, number of iterations.
res
numeric, grid resolution (number of points on one edge).
iprotein
numeric, indices of proteins.
do.plot
logical, make a plot?
T
numeric, temperature.
P
numeric, pressure; or character, "Psat".
labcex
numeric, character expansion for plot labels.
loga.ref
numeric, reference logarithms of activity of species.
as.residue
logical, use the activities of residues instead of proteins?
loga.tot
numeric, logarithm of total activity of residues (or other immobile component).
x
list, object of class findit.
which
numeric, which of the parameters to plot.
mar
numeric, plot margin specification.
xlab
character, x-axis label.
...
additional arguments passed to plot.

Value

  • findit returns a list having class findit with elements value (values of the parameters, and value of the target statistic, at each iteration), lolim (lower limits of the parameters) and hilim (upper limits of the parameters).

Details

findit implements a gridded optimization to find the minimum or maximum value of target as a function of one or more of the chemical activities, temperature and/or pressure whose ranges are listed in lims. Any target available in revisit can be optimized. Generally, the system (basis species and species of interest) must be set up before calling this function. If iprotein is supplied, indicating a set of proteins to use in the calculation, the definition of the species is not required.

lims is a list, each element of which contains two values indicating the range of the parameter indicated by the names of lims. The names should be formula(s) of one or more of the basis speices, T and/or P. If the latter two are missing, the calculations are performed at isothermal and/or isobaric conditions indicated by T and P.

It can be used for one, two, or more parameters. If $nd$ is the number of parameters (dimensions), default values of n and res come from the following table. These settings are selected mostly for quickness of testing the function in high-dimensional space. Detailed studies of a system might have to use more iterations and/or higher resolutions.

rrrrr{ nd n res res^nd rat 1 4 128 128 2/(1+sqrt(5)) 2 6 64 4096 2/(1+sqrt(5)) 3 6 16 4096 0.75 4 8 8 4096 0.8 5 12 6 7776 0.8 6 12 4 4096 0.85 7 12 4 16384 0.85 }

The function performs n iterations. At first, the limits of the parameters given in lims define the extent of a $nd$-dimensional box around the space of interest. The value of target is calculated at each of the $res^{nd}$ grid points and and optimum value located (see revisit and where.extreme). In the next iteration the new search box is centered on the location of the optimum value, and the edges are shrunk so their length is rat * the length in the previous step. If the limits of any of the parameters extend beyond those in lims, they are pushed in to fit (preserving the difference between them).

plot.findit plots the values of the parameters and the target statistic as a function of the number of iterations.

Examples

Run this code
data(thermo)
  # an inorganic example: sulfur species
    basis("CHNOS+")
    basis("pH",5)
    species(c("H2S","S2-2","S3-2","S2O3-2","S2O4-2","S3O6-2",
      "S5O6-2","S2O6-2","HSO3-","SO2","HSO4-"))
    # to minimize the standard deviations of the loga of the species
    target <- "sd"
    # this one gives logfO2=-27.8
    f1 <- findit(list(O2=c(-50,-15)),target,T=325,P=350,n=3)
    # this one gives logfO2=-30.0 pH=9.38
    f2 <- findit(list(O2=c(-50,-15),pH=c(0,14)),target,T=325,P=350,res=16,n=4)

    # an organic example: 
    # find chemical activities where metastable activities of
    # selected proteins in P. ubique have high correlation
    # with a lognormal distribution (i.e., maximize r of q-q plot)
    f <- system.file("extdata/HTCC1062.faa",package="CHNOSZ")
    # search for three groups of proteins
    myg <- c("ribosomal","nucle","membrane")
    g <- lapply(myg,function(x) grep.file(f,x))
    # note that some proteins match more than one search term
    uug <- unique(unlist(g))
    # read their amino acid compositions from the file
    p <- read.fasta(f,uug)
    # add these proteins to thermo$protein
    ip <- add.protein(p)
    # load a predefined set of uncharged basis species
    # (speeds things up as we won't model protein ionization)
    basis("CHNOS")
    # make colors for the diagram
    rgbargs <- lapply(1:3,function(x) as.numeric(uug %in% g[[x]]))
    col <- do.call(rgb,c(rgbargs,list(alpha=0.5)))
    # get point symbols (use 1,2,4 and their sums)
    pch <- colSums(t(list2array(rgbargs)) * c(1,2,4))
    # plot 1: calculated logarithms of chemical activity
    # as a function of logfO2 ... a bundle of curves near logfO2 = -77
    a <- affinity(O2=c(-90,-50),iprotein=ip)
    d <- diagram(a,logact=0,col=col)
    # plot 2: q-q correlation coefficient
    # it shows lognormal distribution favored near logfO2 = -73.6
    r <- revisit(d,"qqr")
    # plot 3: q-q at a single value of logfO2
    basis("O2",-73.6)
    a <- affinity(iprotein=ip)
    d <- diagram(a,logact=0,do.plot=FALSE)
    qqr3 <- revisit(d,"qqr",pch=pch)$H
    legend("topleft",pch=c(1,2,4),legend=myg)
    # plot 4: findit... maximize qqr as a function of O2-H2O-NH3-CO2
    # it shows an optimum at low logaH2O, logaNH3
    f1 <- findit(list(O2=c(-90,-70),H2O=c(-30,0),CO2=c(-20,10),NH3=c(-15,0)),
      "qqr",iprotein=ip,n=8)
    # plot 5: q-q plot at the final loga O2, H2O, CO2, NH3
    # higher correlation coefficient than plot 3
    a <- affinity(iprotein=ip)
    d <- diagram(a,logact=0,do.plot=FALSE)
    qqr5 <- revisit(d,"qqr",pch=pch)$H
    legend("topleft",pch=c(1,2,4),legend=myg)
    # plot 6: trajectory of O2, H2O, CO2, NH3, and the
    # q-q correlation coefficient in the search
    plot(f1,mar=c(2,5,1,1),mgp=c(4,1,0))

Run the code above in your browser using DataLab