optimFPM: Optimization of Floating Percentile Model Parameters

Description

Calculate the FN_crit and alpha inputs to FPM that optimize benchmark performance.

Usage

optimFPM(
  data,
  paramList,
  plot = TRUE,
  FN_crit = seq(0.1, 0.9, by = 0.05),
  alpha = seq(0.05, 0.5, by = 0.05),
  simplify = TRUE,
  which = c(1, 2),
  colors = heat.colors(10),
  colsteps = 100,
  ...
)

Value

data.frame of optimized FN_crit and/or alpha values

Arguments

data: data.frame containing, at a minimum, chemical concentrations as columns and a logical Hit column classifying toxicity
paramList: character vector of column names of chemical concentration variables in data
plot: logical; whether to generate a plot to visualize the optimization results
FN_crit: numeric vector over which to optimize false negative thresholds (default = seq(0.1, 0.9, by = 0.05))
alpha: numeric vector of type-I error rate values over which to optimize (default = seq(0.05, 0.5, by = 0.05))
simplify: logical; whether to return only the optimized values (TRUE) or the full results of the optimization (FALSE)
which: numeric or character indicating which type of plot to generate (see Details; default = c(1, 2))
colors: values recognizable as colors, such as color names, hexadecimal codes, or numbers (default = heat.colors(10))
colsteps: numeric value, number of unique colors to include in gradient (default = 100)
...: additional arguments passed to FPM, chemSig, and chemSigSelect

Details

optimFPM was designed to help optimize the predictive capacity of the benchmarks generated by FPM. The default input parameters to FPM (i.e., FN_crit = 0.2 and alpha = 0.05) are arbitrary, and optimization can help to objectively establish more accurate benchmarks. Graphical output from optimFPM can also help users to understand the relationship(s) between benchmark accuracy/error, FN_crit, and alpha. We also recommend that users apply cvFPM to their data to further inform the selection of FPM input values.

Default inputs for FN_crit and alpha were selected to represent a reasonable range of values to test. Testing over both ranges will result in a two-way optimization, which can be computationally intensive. Alternatively, optimFPM can be run for one parameter at a time by specifying a single value for FN_crit or alpha. Note that supplying single values for both FN_crit and alpha leaves nothing to optimize over and will generate unhelpful results.
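
To give a rough sense of why the two-way optimization can be computationally intensive, the sketch below counts the FN_crit/alpha combinations implied by the default inputs and by the shorter sequences used in the Examples. It assumes one model evaluation per combination, which is an assumption about how the grid is searched rather than a statement of the function's internals.

```r
# Default search ranges, copied from the Usage section
FN_crit <- seq(0.1, 0.9, by = 0.05)
alpha <- seq(0.05, 0.5, by = 0.05)

# Every FN_crit/alpha pair in a two-way optimization
# (assumption: one model evaluation per pair)
full_grid <- expand.grid(FN_crit = FN_crit, alpha = alpha)
n_full <- nrow(full_grid)

# A reduced grid, using the shorter sequences from the Examples section
small_grid <- expand.grid(
  FN_crit = seq(0.1, 0.3, 0.05),
  alpha = seq(0.05, 0.2, 0.05)
)
n_small <- nrow(small_grid)

# A one-way optimization holds one argument at a single value
n_one_way <- nrow(expand.grid(FN_crit = FN_crit, alpha = 0.05))

cat("default grid:", n_full, "\n")
cat("reduced grid:", n_small, "\n")
cat("one-way (alpha fixed):", n_one_way, "\n")
```

Narrowing either range, or fixing one argument, reduces the number of combinations accordingly.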

Two metrics are used for optimization, one based on the maximum overall reliability (i.e., highest probability of correctly predicting Hit values) and one based on minimizing the difference between the false negative and false positive rates, which represents a trade-off between under- and overconservatism.
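
The two metrics can be illustrated with a small confusion-matrix calculation. This is a minimal sketch on made-up data, not the package's own code: the observed and predicted vectors are hypothetical, and the rate definitions used here (false negatives as a share of observed hits, false positives as a share of observed non-hits) are conventional assumptions that may differ from how the package computes them.

```r
# Hypothetical observed toxicity classifications and benchmark predictions
observed  <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
predicted <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)

# Confusion-matrix counts
TP <- sum(observed & predicted)    # hits correctly predicted
TN <- sum(!observed & !predicted)  # non-hits correctly predicted
FP <- sum(!observed & predicted)   # non-hits predicted as hits
FN <- sum(observed & !predicted)   # hits predicted as non-hits

# Metric 1: overall reliability, the share of correct predictions (maximize)
overall_reliability <- (TP + TN) / length(observed)

# Metric 2: gap between false negative and false positive rates (minimize)
# Assumed definitions; the package may define these rates differently
FN_rate <- FN / (TP + FN)
FP_rate <- FP / (TN + FP)
balance <- abs(FN_rate - FP_rate)

print(c(OR = overall_reliability, FN_rate = FN_rate, FP_rate = FP_rate,
        balance = balance))
```

Under these assumptions, a pair of FN_crit and alpha values is preferred by the first metric when overall_reliability is highest and by the second when balance is closest to zero; the two metrics need not agree.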

Graphical output depends on whether a single value is supplied for FN_crit or alpha. Providing a single value for one of the two arguments will generate a line graph, whereas providing longer vectors (i.e., length > 1) for both arguments will generate dot matrix plots, using colors to build the color palette and colsteps to define the granularity of the color gradient within the palette. Colors are plotted in order from more optimal to less optimal; for example, the default of heat.colors(10) will show more optimal values as red and less optimal values as more yellow. By default, two plots are generated; the which argument controls whether one or both are included. The default for which is c(1, 2), but flexible character inputs can also be used, for example which = "OR" or which = "balanced". Black and gray squares indicate the optimal argument values (black for the indicated optimization metric and gray for the other metric).
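
One way to picture how colors and colsteps interact is to interpolate the supplied colors into a longer gradient. The sketch below uses colorRampPalette from grDevices for this; whether optimFPM builds its gradient the same way is an assumption, and the variable names are illustrative only.

```r
# Default inputs from the Usage section
colors <- heat.colors(10)  # palette endpoints, running from red toward yellow
colsteps <- 100            # number of gradient steps requested

# Interpolate the supplied colors into a gradient of colsteps entries
# (assumption: the plotting code may construct its gradient differently)
palette_fun <- colorRampPalette(colors)
gradient <- palette_fun(colsteps)

# The first entries sit at the "more optimal" (red) end under the default
# palette, and the last entries at the "less optimal" (yellow) end
head(gradient, 3)
tail(gradient, 3)
length(gradient)
```

A larger colsteps gives a smoother gradient, while a smaller value bins the results into fewer, more distinct colors.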

Examples

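# Chemical concentration columns in the h.tristate example data
paramList <- c("Cd", "Cu", "Fe", "Mn", "Ni", "Pb", "Zn")
# Shortened search ranges to keep the examples quick
FN_seq <- seq(0.1, 0.3, 0.05)
alpha_seq <- seq(0.05, 0.2, 0.05)
# One-way optimization of FN_crit with alpha held at 0.05
optimFPM(data = h.tristate, paramList = paramList, alpha = 0.05, FN_crit = FN_seq)
# One-way optimization of alpha with FN_crit held at 0.2
optimFPM(data = h.tristate, paramList = paramList, FN_crit = 0.2, alpha = alpha_seq)
# Two-way optimization, showing only the second plot
optimFPM(data = h.tristate, paramList = paramList, alpha = alpha_seq, FN_crit = FN_seq, which = 2)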