BIOMOD_EnsembleModeling: Create and evaluate an ensemble set of models and predictions

Description

BIOMOD_EnsembleModeling combines models built with BIOMOD_Modeling and makes ensemble predictions from them. The ensemble predictions can also be evaluated against the original data given to BIOMOD_Modeling. biomod2 offers a range of options to build ensemble models and predictions and to assess the modeling uncertainty. The created ensemble models can then be used to project distributions over space and time, like classical biomod2 models.

Usage

BIOMOD_EnsembleModeling( modeling.output,
                         chosen.models = 'all',
                         em.by = 'all',
                         eval.metric = 'all',
                         eval.metric.quality.threshold = NULL,
                         prob.mean = TRUE,
                         prob.cv = FALSE,
                         prob.ci = FALSE,
                         prob.ci.alpha = 0.05,
                         prob.median = FALSE,
                         committee.averaging = FALSE,
                         prob.mean.weight = FALSE,
                         prob.mean.weight.decay = 'proportional',
                         VarImport = 0)

Arguments

modeling.output

a "BIOMOD.models.out" object returned by BIOMOD_Modeling

chosen.models

a character vector (either 'all' or a sub-selection of model names) that defines the models kept for building the ensemble models (might be useful for removing some non-preferred models)

em.by

Character. Flag defining the way the models will be combined to build the ensemble models. Available values are 'PA_dataset+repet' (default), 'PA_dataset+algo', 'PA_dataset', 'algo' and 'all'

eval.metric

Vector of evaluation metric names. If 'all', the same evaluation metrics as those of modeling.output will be automatically selected

eval.metric.quality.threshold

If not NULL, the minimum scores below which models will be excluded from the building of the ensemble models.

prob.mean

Logical. Estimate the mean probabilities across predictions

prob.cv

Logical. Estimate the coefficient of variation across predictions

prob.ci

Logical. Estimate the confidence interval around prob.mean

prob.ci.alpha

Numeric. Significance level for estimating the confidence interval. Default = 0.05

prob.median

Logical. Estimate the median of probabilities

committee.averaging

Logical. Estimate the committee averaging across predictions

prob.mean.weight

Logical. Estimate the weighted sum of probabilities

prob.mean.weight.decay

Define the relative importance of the weights. A high value will strongly discriminate the 'good' models from the 'bad' ones (see the details section). If the value of this parameter is set to 'proportional' (default), then the attributed weights are prop

VarImport

Number of permutations used to estimate variable importance
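The statistics switched on by prob.mean, prob.median, prob.cv, committee.averaging and prob.mean.weight, together with the filtering done by eval.metric.quality.threshold, can be illustrated with a small base R sketch. It is not the biomod2 implementation: the prediction matrix, the scores, the cutoffs and all variable names are hypothetical, values are on a 0-1 scale for readability, and the weighted mean assumes weights proportional to the evaluation scores.

```r
# Hypothetical suitability predictions (rows = sites, columns = models).
preds <- cbind(
  SRE = c(0.10, 0.80, 0.55, 0.30),
  CTA = c(0.20, 0.90, 0.60, 0.10),
  RF  = c(0.15, 0.95, 0.40, 0.25)
)

# Hypothetical TSS scores and binarisation cutoffs for each model.
scores  <- c(SRE = 0.60, CTA = 0.75, RF = 0.85)
cutoffs <- c(SRE = 0.50, CTA = 0.50, RF = 0.50)

# eval.metric.quality.threshold: models scoring below it are excluded.
quality_threshold <- 0.7
kept <- names(scores)[scores >= quality_threshold]
p <- preds[, kept, drop = FALSE]

# prob.mean, prob.median and prob.cv across the kept models.
em_mean   <- rowMeans(p)
em_median <- apply(p, 1, median)
em_cv     <- apply(p, 1, sd) / em_mean

# committee.averaging: binarise each model with its cutoff, then average the votes.
votes <- sweep(p, 2, cutoffs[kept], FUN = ">=")
em_ca <- rowMeans(votes)

# prob.mean.weight: assumed here to use weights proportional to the scores.
w <- scores[kept] / sum(scores[kept])
em_wmean <- as.vector(p %*% w)

print(data.frame(em_mean, em_median, em_cv, em_ca, em_wmean))
```

With a threshold of 0.7 only the CTA and RF columns are kept, so every statistic is computed from two models per site. Committee averaging returns the share of kept models that vote "presence", which is why it takes values such as 0, 0.5 and 1 here.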

Value

A "BIOMOD.EnsembleModeling.out" object. This object can later be given to BIOMOD_EnsembleForecasting if you want to project these ensemble models.
You can access the evaluation scores with the get_evaluations function and the names of the built models with the get_built_models function (see example).

Evaluation metrics

Evaluation metrics are used:
  to make the binary transformation needed for committee averaging computation
  to weight the models in the probability weighted mean model
  to test (and/or evaluate) your ensemble-models forecasting ability (at this step, each ensemble model will be evaluated according to each evaluation metric)

eval.metric.quality.threshold

Ensemble-models algorithms

1. Mean of probabilities (prob.mean)

2. Coefficient of variation of probabilities (prob.cv)

3. Confidence interval (prob.ci & prob.ci.alpha)

   $$I_c = \left[ \bar{x} - \frac{t_\alpha \, sd}{\sqrt{n}} ;\ \bar{x} + \frac{t_\alpha \, sd}{\sqrt{n}} \right]$$

   The lower one (there is less than a 100*prob.ci.alpha/2 % chance of getting probabilities lower than the given ones)
   The upper one (there is less than a 100*prob.ci.alpha/2 % chance of getting probabilities higher than the given ones)

4. Median of probabilities (prob.median)

5. Models committee averaging (committee.averaging)

6. Weighted mean of probabilities (prob.mean.weight & prob.mean.weight.decay)
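The confidence interval formula above can be checked numerically with base R. This is a sketch under stated assumptions, not the biomod2 code: the probabilities in x are made up, and t_alpha is taken to be the two-sided Student quantile with n - 1 degrees of freedom, which the formula itself does not specify.

```r
# Hypothetical predicted probabilities for one site from n = 5 models.
x <- c(0.62, 0.70, 0.55, 0.66, 0.72)
alpha <- 0.05  # plays the role of prob.ci.alpha

n <- length(x)
x_bar <- mean(x)

# Assumption: two-sided Student quantile with n - 1 degrees of freedom.
t_alpha <- qt(1 - alpha / 2, df = n - 1)
half_width <- t_alpha * sd(x) / sqrt(n)

# Lower and upper bounds of the interval around the mean.
ci <- c(lower = x_bar - half_width, upper = x_bar + half_width)
print(ci)
```

Under these assumptions the two bounds match the interval that stats::t.test(x) reports for the same data, which is a convenient cross-check.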


Details

Models sub-selection (chosen.models)

Useful to exclude some models that have been selected in the previous steps (modeling.output). This vector of model names can be accessed by applying get_built_models to your modeling.output data, which makes the selection of models easier. The default value (i.e. 'all') will keep all available models.

Models assembly rules (em.by)

Please refer to the EnsembleModelingAssembly vignette (../doc/EnsembleModelingAssembly.pdf), which is dedicated to this parameter. Five different ways to combine models can be considered. You can make ensemble models considering:

Dataset used for models building (pseudo-absences dataset and repetitions done): 'PA_dataset+repet'
Dataset used and statistical models: 'PA_dataset+algo'
Pseudo-absences selection dataset: 'PA_dataset'
Statistical models: 'algo'
A total consensus model: 'all'

The value chosen for this parameter will control the number of ensemble models built. If no evaluation data was given at the BIOMOD_FormatingData step, some ensemble model evaluations may be a bit unfair, because the data used for evaluating the ensemble models could differ from the data used for evaluating the BIOMOD_Modeling models (in particular, some data used for calibrating the 'basal models' can be re-used for evaluating the ensemble models). Keep this in mind! (See the EnsembleModelingAssembly vignette, ../doc/EnsembleModelingAssembly.pdf, for extra details.)
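To see how em.by drives the number of groups of models that get combined, here is a minimal base R sketch. It only mimics the grouping logic as described above: the models table, its column names and the group_key helper are hypothetical and are not part of biomod2.

```r
# Hypothetical single models: 2 pseudo-absence datasets x 2 runs x 3 algorithms.
models <- expand.grid(
  PA   = c("PA1", "PA2"),
  run  = c("RUN1", "RUN2"),
  algo = c("SRE", "CTA", "RF"),
  stringsAsFactors = FALSE
)

# Grouping key for each em.by value; models sharing a key are combined together.
group_key <- function(models, em.by) {
  switch(em.by,
    "PA_dataset+repet" = paste(models$PA, models$run, sep = "_"),
    "PA_dataset+algo"  = paste(models$PA, models$algo, sep = "_"),
    "PA_dataset"       = models$PA,
    "algo"             = models$algo,
    "all"              = rep("all", nrow(models)),
    stop("unknown em.by value: ", em.by)
  )
}

# Count the distinct groups produced by each assembly rule.
rules <- c("PA_dataset+repet", "PA_dataset+algo", "PA_dataset", "algo", "all")
n_groups <- sapply(rules, function(r) length(unique(group_key(models, r))))
print(n_groups)
```

For these 12 hypothetical models the rules give 4, 6, 2, 3 and 1 groups respectively. Each group then yields one ensemble per requested statistic and evaluation metric, so finer groupings multiply the number of ensemble models.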

Examples


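# load the packages used below (stack() comes from the raster package)
library(biomod2)
library(raster)

# species occurrences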
DataSpecies <- read.csv(system.file("external/species/mammals_table.csv",
                                    package="biomod2"), row.names = 1)
head(DataSpecies)

# the name of studied species
myRespName <- 'GuloGulo'

# the presence/absences data for our species 
myResp <- as.numeric(DataSpecies[,myRespName])

# the XY coordinates of species data
myRespXY <- DataSpecies[,c("X_WGS84","Y_WGS84")]


# Environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
myExpl <- stack(system.file("external/bioclim/current/bio3.grd",
                            package = "biomod2"),
                system.file("external/bioclim/current/bio4.grd",
                            package = "biomod2"),
                system.file("external/bioclim/current/bio7.grd",
                            package = "biomod2"),
                system.file("external/bioclim/current/bio11.grd",
                            package = "biomod2"),
                system.file("external/bioclim/current/bio12.grd",
                            package = "biomod2"))

# 1. Formatting Data
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
                                     expl.var = myExpl,
                                     resp.xy = myRespXY,
                                     resp.name = myRespName)
       
# 2. Defining Models Options using default options.
myBiomodOption <- BIOMOD_ModelingOptions()

# 3. Running the individual models

myBiomodModelOut <- BIOMOD_Modeling( myBiomodData, 
                                       models = c('SRE','CTA','RF'), 
                                       models.options = myBiomodOption, 
                                       NbRunEval=1, 
                                       DataSplit=80, 
                                       Yweights=NULL, 
                                       VarImport=3, 
                                       models.eval.meth = c('TSS'),
                                       SaveObj = TRUE,
                                       rescal.all.models = FALSE,
                                       do.full.models = FALSE)
                                       
# 4. Building the ensemble models
myBiomodEM <- BIOMOD_EnsembleModeling( modeling.output = myBiomodModelOut,
                           chosen.models = 'all',
                           em.by = 'all',
                           eval.metric = c('TSS'),
                           eval.metric.quality.threshold = c(0.7),
                           prob.mean = TRUE,
                           prob.cv = FALSE,
                           prob.ci = FALSE,
                           prob.ci.alpha = 0.05,
                           prob.median = FALSE,
                           committee.averaging = FALSE,
                           prob.mean.weight = TRUE,
                           prob.mean.weight.decay = 'proportional' )   
                                       
# print summary
myBiomodEM

# get evaluation scores
get_evaluations(myBiomodEM)

# get the names of the built ensemble models
get_built_models(myBiomodEM)
