This function optimizes a model's classification threshold based on a pair of model evaluation measures that balance each other, such as sensitivity-specificity, precision-recall (i.e., positive predictive power vs. sensitivity), omission-commission, or underprediction-overprediction (Fielding & Bell 1997; Liu et al. 2011; Barbosa et al. 2013). It plots both measures of the given pair against all thresholds at a given interval, and calculates the minimum difference, maximum sum and maximum mean of the two measures, along with the thresholds at which these occur.
optiPair(model = NULL, obs = NULL, pred = NULL,
measures = c("Sensitivity", "Specificity"), interval = 0.01,
plot = TRUE, plot.sum = FALSE, plot.diff = FALSE, ylim = NULL,
na.rm = TRUE, exclude.zeros = TRUE, ...)
model: a binary-response model object of class "glm", "gam", "gbm", "randomForest" or "bart". If provided, 'obs' and 'pred' will be extracted with mod2obspred. Alternatively, you can input the 'obs' and 'pred' arguments instead of 'model'.
obs: alternatively to 'model' and together with 'pred', a vector of observed presences (1) and absences (0) of a binary response variable. This argument is ignored if 'model' is provided.
pred: alternatively to 'model' and together with 'obs', a vector with the corresponding predicted values of presence probability, habitat suitability, environmental favourability or the like. Must be of the same length and in the same order as 'obs'. This argument is ignored if 'model' is provided.
measures: a character vector of length 2 indicating the pair of measures whose curves to plot and whose combined threshold to optimize. Available measures can be obtained with 'modEvAmethods("threshMeasures")', but note that this function expects two measures that counter-balance one another, such as c("Sensitivity", "Specificity") [the default], c("Omission", "Commission"), or c("Precision", "Recall").
interval: the interval of thresholds at which to calculate the measures. The default is 0.01.
plot: logical, whether to plot the pair of measures. Defaults to TRUE.
plot.sum: logical, whether to plot the sum (+) of both measures in the pair. Defaults to FALSE.
plot.diff: logical, whether to plot the difference (-) between both measures in the pair. Defaults to FALSE.
ylim: a numeric vector of length 2 indicating the lower and upper limits for the y axis. The default is NULL, for an automatic definition of 'ylim' based on the values of the measures and, if 'plot.sum' and/or 'plot.diff' are TRUE, of their sum and/or difference.
na.rm: logical, whether NA values should be removed from the calculation of minimum/maximum/mean values to get the optimized measures. Defaults to TRUE.
exclude.zeros: logical, whether non-finite and zero values should be removed from the calculation of minimum/maximum/mean values to get the optimized measures. Defaults to TRUE.
...: additional arguments to be passed to the plot function.
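The effect of removing NA, zero and non-finite values before taking a minimum or maximum can be illustrated with a small base-R sketch. The vector 'diffs' and the filter 'keep' below are hypothetical, made up for illustration. Exactly which values 'optiPair' filters internally is not stated here, so this only shows the general idea on an arbitrary vector.

```r
# Hypothetical per-threshold differences between two measures (made-up values)
diffs <- c(0, 0.40, 0.15, NA, 0.05, Inf)

# Dropping only NA values: 0 and Inf still take part in the calculation
min_na_only <- min(diffs, na.rm = TRUE)
max_na_only <- max(diffs, na.rm = TRUE)

# Also dropping zeros and non-finite values (the idea described for 'exclude.zeros');
# is.finite() is FALSE for NA, NaN, Inf and -Inf
keep <- is.finite(diffs) & diffs != 0
min_filtered <- min(diffs[keep])
max_filtered <- max(diffs[keep])

print(c(min_na_only, max_na_only, min_filtered, max_filtered))
```

With zeros kept, a threshold at which both measures are trivially equal can win the "minimum difference" even if it is of no practical use, which is one reason to filter such values out.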
The output is a list with the following components:
a data frame with the values of the chosen pair of measures, as well as their difference, sum and mean, at each threshold.
numeric value, the minimum difference between both measures.
numeric value, the threshold that minimizes the difference between both measures.
numeric value, the maximum sum of both measures.
numeric value, the threshold that maximizes the sum of both measures.
numeric value, the maximum mean of both measures.
numeric value, the threshold that maximizes the mean of both measures.
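The components listed above can be approximated by hand. The following is a minimal base-R sketch, not the package's implementation: the toy data, the object names ('vals', 'min_diff', etc.), the use of ">=" to classify predictions as presences, and the use of the absolute difference are all assumptions made for illustration.

```r
# Toy data (made up): observed presences/absences and predicted probabilities
obs  <- c(0, 0, 0, 0, 1, 0, 1, 1, 1, 1)
pred <- c(0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95)

# Thresholds at which to evaluate the measures (same role as 'interval = 0.01')
thresholds <- seq(0, 1, by = 0.01)

# Sensitivity: share of presences predicted as presence (pred >= threshold, assumed)
# Specificity: share of absences predicted as absence
sens <- sapply(thresholds, function(t) mean(pred[obs == 1] >= t))
spec <- sapply(thresholds, function(t) mean(pred[obs == 0] <  t))

# Table of both measures plus their difference, sum and mean at each threshold
vals <- data.frame(Threshold = thresholds, Sensitivity = sens, Specificity = spec)
vals$Difference <- abs(vals$Sensitivity - vals$Specificity)
vals$Sum  <- vals$Sensitivity + vals$Specificity
vals$Mean <- vals$Sum / 2

# Optimal values and the (first) thresholds at which they occur
min_diff    <- min(vals$Difference)
thresh_diff <- vals$Threshold[which.min(vals$Difference)]
max_sum     <- max(vals$Sum)
thresh_sum  <- vals$Threshold[which.max(vals$Sum)]
max_mean    <- max(vals$Mean)
thresh_mean <- vals$Threshold[which.max(vals$Mean)]

print(c(min_diff = min_diff, max_sum = max_sum, max_mean = max_mean))
```

Because the mean of two measures is half their sum, the threshold that maximizes the mean is the same as the one that maximizes the sum in this sketch. The minimum-difference threshold can differ from both.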
Barbosa A.M., Real R., Munoz A.-R. & Brown J.A. (2013) New measures for assessing model equilibrium and prediction mismatch in species distribution models. Diversity and Distributions 19: 1333-1338
Fielding A.H. & Bell J.F. (1997) A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24: 38-49
Liu C., White M. & Newell G. (2011) Measuring and comparing the accuracy of species distribution models with presence-absence data. Ecography 34: 232-243
# NOT RUN {
# load sample models:
data(rotif.mods)
# choose a particular model to play with:
mod <- rotif.mods$models[[1]]
optiPair(model = mod)
optiPair(model = mod, measures = c("Precision", "Recall"))
optiPair(model = mod, measures = c("UPR", "OPR"))
optiPair(model = mod, measures = c("CCR", "F1score"))
# you can also use 'optiPair' with vectors of observed
# and predicted values, instead of a model object:
optiPair(obs = mod$y, pred = mod$fitted.values)
# }