bm_FindOptimStat: Calculate the best score according to a given evaluation method

Description

This internal biomod2 function allows the user to find the threshold to convert continuous values into binary ones leading to the best score for a given evaluation metric.

Usage

bm_FindOptimStat(
  metric.eval = "TSS",
  obs,
  fit,
  nb.thresh = 100,
  threshold = NULL,
  boyce.bg.env = NULL,
  mpa.perc = 0.9,
  k = NULL
)
get_optim_value(metric.eval)
bm_CalculateStatBin(misc, metric.eval = "TSS")
bm_CalculateStatAbun(metric.eval, obs, fit, k)

Value

A 1 row x 5 columns data.frame containing :

metric.eval : the chosen evaluation metric
cutoff : the associated cut-off used to transform the continuous values into binary
sensitivity : the sensibility obtained on fitted values with this threshold
specificity : the specificity obtained on fitted values with this threshold
best.stat : the best score obtained for the chosen evaluation metric

Arguments

metric.eval: a character corresponding to the evaluation metric to be used, must be either AUCroc, AUCprg, TSS, KAPPA, ACCURACY, BIAS, POD, FAR, POFD, SR, CSI, ETS, OR, ORSS, BOYCE, MPA (binary data), RMSE, MAE, MSE, Rsquared, Rsquared_aj, Max_error (abundance / count / relative data), Accuracy, Recall, Precision, F1 (multiclass/ordinal data)
obs: a vector of observed values (binary, 0 or 1)
fit: a vector of fitted values (continuous)
nb.thresh: an integer corresponding to the number of thresholds to be tested over the range of fitted values
threshold: (optional, default NULL)
A numeric corresponding to the threshold used to convert the given data
boyce.bg.env: (optional, default NULL)
A matrix, data.frame, SpatVector or SpatRaster object containing values of environmental variables (in columns or layers) extracted from the background (if presences are to be compared to background instead of absences or pseudo-absences selected for modeling)
Note that old format from raster and sp are still supported such as RasterStack and SpatialPointsDataFrame objects.
mpa.perc: a numeric between 0 and 1 corresponding to the percentage of correctly classified presences for Minimal Predicted Area (see ecospat.mpa() in ecospat)
k: an integer corresponding to the number of environmental variables used in the model
misc: a matrix corresponding to a contingency table

Author

Damien Georges

Details

Please refer to CAWRC website ("Methods for dichotomous forecasts") to get detailed description (simple/complex metrics).
Optimal value of each method can be obtained with the get_optim_value function.

simple

POD : Probability of detection (hit rate)
FAR : False alarm ratio
POFD : Probability of false detection (fall-out)
SR : Success ratio
ACCURACY : Accuracy (fraction correct)
BIAS : Bias score (frequency bias)

complex

AUCroc : Area Under Curve of Relative operating characteristic
AUCprg : Area Under Curve of Precision-Recall-Gain curve
TSS : True skill statistic (Hanssen and Kuipers discriminant, Peirce's skill score)
KAPPA : Cohen's Kappa (Heidke skill score)
OR : Odds Ratio
ORSS : Odds ratio skill score (Yule's Q)
CSI : Critical success index (threat score)
ETS : Equitable threat score (Gilbert skill score)

presence-only

BOYCE : Boyce index
MPA : Minimal predicted area (cutoff optimizing MPA to predict 90% of presences)

abundance / count / relative data

RMSE : Root Mean Square Error
MSE : Mean Square Error
MAE : Mean Absolute Error
Rsquared : R squared
Rsquared_aj : R squared adjusted
Max_error : Maximum error

multiclass/ordinal data

Accuracy : Accuracy
Recall : Macro average Recall
Precision : Macro average Precision
F1 : Macro F1 score

Note that if a value is given to threshold, no optimization will be done, and only the score for this threshold will be returned.

The Boyce index returns NA values for SRE models because it can not be calculated with binary predictions.
This is also the reason why some NA values might appear for GLM models if they did not converge.

References

Engler, R., Guisan, A., and Rechsteiner L. 2004. An improved approach for predicting the distribution of rare and endangered species from occurrence and pseudo-absence data. Journal of Applied Ecology, 41(2), 263-274.
Hirzel, A. H., Le Lay, G., Helfer, V., Randin, C., and Guisan, A. 2006. Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling, 199(2), 142-152.

Examples

Run this code

## Generate a binary vector
vec.a <- sample(c(0, 1), 100, replace = TRUE)

## Generate a 0-1000 vector (random drawing)
vec.b <- runif(100, min = 0, max = 1000)

## Generate a 0-1000 vector (biased drawing)
BiasedDrawing <- function(x, m1 = 300, sd1 = 200, m2 = 700, sd2 = 200) {
  return(ifelse(x < 0.5, rnorm(1, m1, sd1), rnorm(1, m2, sd2)))
}
vec.c <- sapply(vec.a, BiasedDrawing)
vec.c[which(vec.c < 0)] <- 0
vec.c[which(vec.c > 1000)] <- 1000

## Find optimal threshold for a specific evaluation metric
bm_FindOptimStat(metric.eval = 'TSS', fit = vec.b, obs = vec.a)
bm_FindOptimStat(metric.eval = 'TSS', fit = vec.c, obs = vec.a, nb.thresh = 100)
bm_FindOptimStat(metric.eval = 'TSS', fit = vec.c, obs = vec.a, threshold = 280)

Run the code above in your browser using DataLab