biomod2 (version 4.1-2)

BIOMOD_CrossValidation: Custom models cross-validation procedure

Description

This function creates a DataSplitTable that can be passed to the BIOMOD_Modeling function (through its data.split.table argument) to evaluate models with repeated k-fold or stratified cross-validation (CV) instead of repeated split samples.

Usage

BIOMOD_CrossValidation(
  bm.format,
  k = 5,
  nb.rep = 5,
  do.stratification = FALSE,
  method = "both",
  balance = "presences",
  do.full.models = TRUE
)

Value

A DataSplitTable matrix with k * nb.rep (+ 1 if do.full.models = TRUE) columns that can be passed to the BIOMOD_Modeling function.
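To make the shape of such a table concrete, the following base-R sketch builds a comparable logical matrix for repeated k-fold CV, with TRUE marking calibration rows and FALSE marking validation rows. It is only an illustration: make_split_table() is a hypothetical helper, the column names are chosen for readability, and nothing here is taken from the package's internal code.

```r
# Illustrative only: make_split_table() is a hypothetical helper,
# not a biomod2 function.
make_split_table <- function(n, k = 5, nb.rep = 5,
                             do.full.models = TRUE, seed = 42) {
  set.seed(seed)
  tab <- matrix(TRUE, nrow = n, ncol = k * nb.rep)
  for (r in seq_len(nb.rep)) {
    # Randomly assign each observation to one of k folds of near-equal size
    fold <- sample(rep(seq_len(k), length.out = n))
    for (f in seq_len(k)) {
      # TRUE = used for calibration, FALSE = held out for validation
      tab[, (r - 1) * k + f] <- fold != f
    }
  }
  colnames(tab) <- paste0("RUN", seq_len(ncol(tab)))
  if (do.full.models) {
    # Extra column in which every observation is used for calibration
    tab <- cbind(tab, Full = TRUE)
  }
  tab
}

split_tab <- make_split_table(n = 20, k = 5, nb.rep = 2)
dim(split_tab)   # rows = observations, columns = k * nb.rep + 1
head(split_tab)
```

Within each repetition, every observation is held out exactly once, which is what distinguishes k-fold CV from repeated random split samples.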

Arguments

bm.format

a BIOMOD.formated.data-class or BIOMOD.formated.data.PA-class object returned by the BIOMOD_FormatingData function

k

an integer corresponding to the number of bins/partitions for k-fold CV

nb.rep

an integer corresponding to the number of repetitions of k-fold CV (set to 1 if do.stratification = TRUE)

do.stratification

a logical defining whether stratified CV should be run

method

a character corresponding to the CV stratification method (if do.stratification = TRUE), must be x, y, both, block or the name of a predictor for environmental stratified CV

balance

a character defining whether partitions should be balanced for presences or absences (resp. pseudo-absences or background)

do.full.models

(optional, default TRUE)
A logical value defining whether models should also be calibrated and validated over the whole dataset

Author

Frank Breiner

Details

Stratified cross-validation may be used to test for model overfitting and to assess transferability in geographic and environmental space:

  • x and y stratification are described in Wenger and Olden 2012 (see References). y stratification uses k partitions along the y-gradient, x stratification does the same along the x-gradient, and both combines the two.

  • block stratification is described in Muscarella et al. 2014 (see References). The data are partitioned into four bins of equal size (bottom-left, bottom-right, top-left and top-right).
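The block idea can be sketched in a few lines of base R. This is one plausible way to obtain four equal-sized spatial bins (split at the median y, then split each half at its own median x); it is an assumption made for illustration and may differ from what the package does internally. block_bins() is a hypothetical helper and the coordinates are simulated.

```r
# Illustrative only: block_bins() is a hypothetical helper, not part of biomod2.
block_bins <- function(x, y) {
  top <- y > median(y)                 # bottom / top halves
  right <- logical(length(x))
  # Split each half at its own median x so the four bins stay equal in size
  right[top]  <- x[top]  > median(x[top])
  right[!top] <- x[!top] > median(x[!top])
  # 1 = bottom-left, 2 = bottom-right, 3 = top-left, 4 = top-right
  1L + right + 2L * top
}

set.seed(1)
x <- runif(40, 0, 30)    # simulated longitudes
y <- runif(40, 45, 70)   # simulated latitudes
bins <- block_bins(x, y)
table(bins)

# One column per bin: TRUE = calibration, FALSE = validation
block_tab <- sapply(1:4, function(b) bins != b)
colnames(block_tab) <- paste0("RUN", 1:4)
head(block_tab)
```

Each column holds out one spatial block, so every model is validated on a region it was not calibrated on.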

If balance = 'presences', presences are divided (balanced) equally over the partitions (e.g. Fig. 1b in Muscarella et al. 2014). Pseudo-absences will, however, be unbalanced over the partitions, especially if the presences are clumped on an edge of the study area.

If balance = 'absences', absences (resp. pseudo-absences or background) are divided (balanced) as equally as possible between the partitions. Provided that absences are spread evenly over the study area, this yields geographically balanced bins (an approach similar to Fig. 1 in Wenger and Olden 2012). Presences will, however, be unbalanced over the partitions, especially if the presences are clumped on an edge of the study area.
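The effect of balance can be illustrated with a small simulation. The sketch below assumes that balancing amounts to placing the partition boundaries at quantiles of the chosen group's coordinates; this is an interpretation of the description above, not the package's code. y_bins() is a hypothetical helper and the data are simulated with presences clumped towards one edge.

```r
# Illustrative only: y_bins() is a hypothetical helper, not part of biomod2.
y_bins <- function(y, resp, k = 2, balance = c("presences", "absences")) {
  balance <- match.arg(balance)
  ref <- if (balance == "presences") y[resp == 1] else y[resp == 0]
  # Cut points are quantiles of the reference group only
  breaks <- quantile(ref, probs = seq(0, 1, length.out = k + 1))
  breaks[1] <- -Inf       # make sure every observation falls in a bin
  breaks[k + 1] <- Inf
  cut(y, breaks = breaks, labels = FALSE)
}

set.seed(1)
n <- 200
y <- runif(n, 45, 70)
# Presences become more likely towards the northern edge of the study area
resp <- rbinom(n, 1, prob = (y - 45) / 25)

# Rows are bins, columns are absence (0) / presence (1) counts
table(bin = y_bins(y, resp, balance = "presences"), resp = resp)
table(bin = y_bins(y, resp, balance = "absences"), resp = resp)
```

In the first table the presence counts are nearly equal across bins while the absence counts are not; in the second table the situation is reversed.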

References

  • Muscarella, R., Galante, P.J., Soley-Guardia, M., Boria, R.A., Kass, J.M., Uriarte, M. & Anderson, R.P. (2014). ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for Maxent ecological niche models. Methods in Ecology and Evolution, 5, 1198-1205.

  • Wenger, S.J. & Olden, J.D. (2012). Assessing transferability of ecological models: an underappreciated aspect of statistical validation. Methods in Ecology and Evolution, 3, 260-267.

See Also

get.block, kfold, BIOMOD_FormatingData, BIOMOD_Modeling

Other Main functions: BIOMOD_EnsembleForecasting(), BIOMOD_EnsembleModeling(), BIOMOD_FormatingData(), BIOMOD_LoadModels(), BIOMOD_ModelingOptions(), BIOMOD_Modeling(), BIOMOD_PresenceOnly(), BIOMOD_Projection(), BIOMOD_RangeSize(), BIOMOD_Tuning()

Examples

# Load species occurrences (6 species available)
myFile <- system.file('external/species/mammals_table.csv', package = 'biomod2')
DataSpecies <- read.csv(myFile, row.names = 1)
head(DataSpecies)

# Select the name of the studied species
myRespName <- 'GuloGulo'

# Get corresponding presence/absence data
myResp <- as.numeric(DataSpecies[, myRespName])

# Get corresponding XY coordinates
myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]

# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
myFiles <- paste0('external/bioclim/current/bio', c(3, 4, 7, 11, 12), '.grd')
myExpl <- raster::stack(system.file(myFiles, package = 'biomod2'))

# Crop the predictors to a smaller extent
myExtent <- raster::extent(0, 30, 45, 70)
myExpl <- raster::stack(raster::crop(myExpl, myExtent))

# ---------------------------------------------------------------
# Format Data with true absences
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
                                     expl.var = myExpl,
                                     resp.xy = myRespXY,
                                     resp.name = myRespName)

# Create default modeling options
myBiomodOptions <- BIOMOD_ModelingOptions()

 
# ---------------------------------------------------------------
# Create the different validation datasets
myBiomodCV <- BIOMOD_CrossValidation(bm.format = myBiomodData)
head(myBiomodCV)

# Several validation strategies can be combined
DataSplitTable.b <- BIOMOD_CrossValidation(bm.format = myBiomodData,
                                           k = 5,
                                           nb.rep = 2,
                                           do.full.models = FALSE)
DataSplitTable.y <- BIOMOD_CrossValidation(bm.format = myBiomodData,
                                           k = 2,
                                           do.stratification = TRUE,
                                           method = "y")
# Rename the stratified runs so they do not clash with RUN1 to RUN10 of the random table
colnames(DataSplitTable.y)[1:2] <- c("RUN11", "RUN12")
myBiomodCV <- cbind(DataSplitTable.b, DataSplitTable.y)
head(myBiomodCV)

# Model single models
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
                                    modeling.id = 'mod.CV',
                                    models = c('RF', 'GLM'),
                                    bm.options = myBiomodOptions,
                                    nb.rep = 2,
                                    data.split.table = myBiomodCV,
                                    metric.eval = c('TSS','ROC'),
                                    var.import = 3,
                                    do.full.models = FALSE,
                                    seed.val = 42)

# Get evaluation scores & variables importance
myEval <- get_evaluations(myBiomodModelOut, as.data.frame = TRUE)
myEval$CV.strategy <- "Random"
myEval$CV.strategy[grepl("13", myEval$Model.name)] <- "Full"
myEval$CV.strategy[grepl("11|12", myEval$Model.name)] <- "Stratified"
head(myEval)

# Note: myEval holds scores for both requested metrics (TSS and ROC)
boxplot(myEval$Testing.data ~ interaction(myEval$Algo, myEval$CV.strategy),
        xlab = "", ylab = "Testing score (TSS and ROC)", col = rep(c("brown", "cadetblue"), 3))


