BIOMOD_ModelingOptions: Configure the modeling options for each selected model

Description

Parametrize and/or tune the options of biomod2's single models.

Usage

BIOMOD_ModelingOptions(
  GLM = NULL,
  GBM = NULL,
  GAM = NULL,
  CTA = NULL,
  ANN = NULL,
  SRE = NULL,
  FDA = NULL,
  MARS = NULL,
  RF = NULL,
  MAXENT = NULL,
  XGBOOST = NULL
)
bm_DefaultModelingOptions()

Value

A BIOMOD.models.options object that can be used to build species distribution model(s) with the BIOMOD_Modeling function.

Arguments

GLM: (optional, default NULL)
A list containing GLM options
GBM: (optional, default NULL)
A list containing GBM options
GAM: (optional, default NULL)
A list containing GAM options
CTA: (optional, default NULL)
A list containing CTA options
ANN: (optional, default NULL)
A list containing ANN options
SRE: (optional, default NULL)
A list containing SRE options
FDA: (optional, default NULL)
A list containing FDA options
MARS: (optional, default NULL)
A list containing MARS options
RF: (optional, default NULL)
A list containing RF options
MAXENT: (optional, default NULL)
A list containing MAXENT options
XGBOOST: (optional, default NULL)
A list containing XGBOOST options

GLM

(glm)

myFormula : a typical formula object (see Examples).
If not NULL, type and interaction.level parameters are switched off.
You can choose to either :
- generate automatically the GLM formula with the following parameters :
  - type = 'quadratic' : formula given to the model, must be simple, quadratic or polynomial
  - interaction.level = 0 : an integer corresponding to the interaction level between the considered variables (be aware that interactions quickly increase the number of effective variables used in the GLM!)
- or construct specific formula
test = 'AIC' : information criterion for the stepwise selection procedure, must be AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion) or none (consider only the full model, no stepwise selection, but this can lead to convergence issues and strange results!)
family = binomial(link = 'logit') : a character defining the error distribution and link function to be used in the model, must be a family name, a family function or the result of a call to a family function (see family) (so far, biomod2 only runs on presence-absence data, so the binomial family is the default!)
control : a list of parameters to control the fitting process (passed to glm.control)
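
The three accepted forms of family can be illustrated in base R. The sketch below is not biomod2 code: resolve_family is a hypothetical helper that mirrors the usual way a family argument is normalised, and the option values are only examples of the list shape described above.

```r
# Hypothetical helper: turn a family name, a family function, or the
# result of a call to a family function into a family object.
resolve_family <- function(family) {
  if (is.character(family)) {
    family <- get(family, mode = "function", envir = parent.frame())
  }
  if (is.function(family)) {
    family <- family()
  }
  family
}

fam_from_name     <- resolve_family("binomial")                # family name
fam_from_function <- resolve_family(binomial)                  # family function
fam_from_call     <- resolve_family(binomial(link = "logit"))  # result of a call

# A GLM options list in the shape described above (example values only)
glm_options <- list(
  type = "quadratic",
  interaction.level = 0,
  myFormula = NULL,
  test = "BIC",
  family = binomial(link = "logit"),
  control = glm.control(epsilon = 1e-08, maxit = 50)
)
```

All three calls yield a binomial family with a logit link, since logit is the default link of binomial.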

GBM

(default gbm)

Please refer to gbm help file for more details.

distribution = 'bernoulli'
n.trees = 2500
interaction.depth = 7
n.minobsinnode = 5
shrinkage = 0.001
bag.fraction = 0.5
train.fraction = 1
cv.folds = 3
keep.data = FALSE
verbose = FALSE
perf.method = 'cv'
n.cores = 1
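
As a minimal sketch, a few of these defaults can be overridden while keeping the others, using modifyList from base R. The default values are copied from the list above; the chosen overrides (n.trees, shrinkage) are arbitrary examples, and whether a partial list is also accepted by BIOMOD_ModelingOptions is not stated here.

```r
# Defaults as listed above
gbm_defaults <- list(
  distribution = "bernoulli",
  n.trees = 2500,
  interaction.depth = 7,
  n.minobsinnode = 5,
  shrinkage = 0.001,
  bag.fraction = 0.5,
  train.fraction = 1,
  cv.folds = 3,
  keep.data = FALSE,
  verbose = FALSE,
  perf.method = "cv",
  n.cores = 1
)

# Override two values and keep every other default unchanged
gbm_options <- modifyList(gbm_defaults, list(n.trees = 1000, shrinkage = 0.01))
```

The resulting list could then be given as the GBM argument.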

GAM

(gam from the gam package, or gam from the mgcv package)

algo = 'GAM_gam' : a character defining the chosen GAM function, must be GAM_gam (see gam in the gam package), GAM_mgcv (see gam in the mgcv package) or BAM_mgcv (see bam in the mgcv package)
myFormula : a typical formula object (see Examples).
If not NULL, type and interaction.level parameters are switched off.
You can choose to either :
- generate automatically the GAM formula with the following parameters :
  - type = 's_smoother' : the smoother used to generate the formula
  - interaction.level = 0 : an integer corresponding to the interaction level between the considered variables (be aware that interactions quickly increase the number of effective variables used in the GAM!)
- or construct specific formula
k = -1 : a parameter of the smooth terms (s) in the formula given to gam, must be -1 or 4 (see s in the gam package or s in the mgcv package)
family = binomial(link = 'logit') : a character defining the error distribution and link function to be used in the model, must be a family name, a family function or the result of a call to a family function (see family) (so far, biomod2 only runs on presence-absence data, so the binomial family is the default!)
control : a list of parameters to control the fitting process (passed to gam.control of the gam package or gam.control of the mgcv package)
some options specific to GAM_mgcv (ignored if algo = 'GAM_gam')
- method = 'GCV.Cp'
- optimizer = c('outer','newton')
- select = FALSE
- knots = NULL
- paramPen = NULL

CTA

(rpart)

Please refer to rpart help file for more details.

method = 'class'
parms = 'default' : if 'default', the default rpart parms values are kept
cost = NULL
control : see rpart.control

ANN

(nnet)

NbCV = 5 : an integer corresponding to the number of cross-validation repetitions to find best size and decay parameters
size = NULL : an integer corresponding to the number of units in the hidden layer. If NULL then the size parameter will be optimized by cross-validation based on model AUC (NbCV cross-validations; the tested size values will be the following : c(2, 4, 6, 8)). It is also possible to give a vector of size values to be tested, and the one giving the best model AUC will be kept.
decay = NULL : a numeric corresponding to weight decay. If NULL then the decay parameter will be optimized by cross-validation based on model AUC (NbCV cross-validations; the tested decay values will be the following : c(0.001, 0.01, 0.05, 0.1)). It is also possible to give a vector of decay values to be tested, and the one giving the best model AUC will be kept.
rang = 0.1 : a numeric corresponding to the initial random weights on [-rang, rang]
maxit = 200 : an integer corresponding to the maximum number of iterations
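
The candidate values tested when size and decay are NULL can be written out explicitly. The sketch below assumes that the candidates are crossed into a full grid, which is an assumption and not something stated above; it only builds the grid and an options list, and does not run any cross-validation.

```r
# Candidate values listed above for size and decay
size_candidates  <- c(2, 4, 6, 8)
decay_candidates <- c(0.001, 0.01, 0.05, 0.1)

# Assumption: every size is combined with every decay
tuning_grid <- expand.grid(size = size_candidates, decay = decay_candidates)
nrow(tuning_grid)  # 16 combinations

# An ANN options list that passes the candidate vectors explicitly
ann_options <- list(
  NbCV = 5,
  size = size_candidates,
  decay = decay_candidates,
  rang = 0.1,
  maxit = 200
)
```

Giving shorter vectors for size or decay reduces the number of models fitted during the search.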

SRE

(bm_SRE)

quant = 0.025 : a numeric corresponding to the quantile of extreme values removed for each environmental variable to select the species envelope
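
The effect of quant can be sketched in base R for a single variable. This is an illustration of the envelope idea under the assumption that the same quantile is trimmed on both sides; it is not the code of bm_SRE, and sre_envelope, inside_envelope and the data are hypothetical.

```r
# Hypothetical helper: envelope bounds for one environmental variable,
# trimming the `quant` most extreme values on each side.
sre_envelope <- function(presence_values, quant = 0.025) {
  stats::quantile(presence_values, probs = c(quant, 1 - quant), names = FALSE)
}

# Hypothetical helper: TRUE where a new value falls inside the envelope
inside_envelope <- function(new_values, envelope) {
  new_values >= envelope[1] & new_values <= envelope[2]
}

# Hypothetical values of one variable at presence points
values_at_presences <- 1:101

envelope <- sre_envelope(values_at_presences, quant = 0.025)
is_inside <- inside_envelope(c(2, 50, 99), envelope)
```

With several variables, a site would be predicted as suitable only if it falls inside the envelope of every variable.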

FDA

(fda)

Please refer to fda help file for more details.

method = 'mars'
add_args = NULL : a list of additional parameters for method, passed to the ... argument of the fda function

MARS

(earth)

Please refer to earth help file for more details.

myFormula : a typical formula object (see Examples).
If not NULL, type and interaction.level parameters are switched off.
You can choose to either :
- generate automatically the MARS formula with the following parameters :
  - type = 'simple' : formula given to the model, must be simple, quadratic or polynomial
  - interaction.level = 0 : an integer corresponding to the interaction level between the considered variables (be aware that interactions quickly increase the number of effective variables used in the MARS!)
- or construct specific formula
nk = NULL : an integer corresponding to the maximum number of model terms.
If NULL default MARS function value is used : max(21, 2 * nb_expl_var + 1)
penalty = 2
thresh = 0.001
nprune = NULL
pmethod = 'backward'
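
The default value of nk given above can be computed directly. default_nk is a hypothetical helper name used only for this illustration.

```r
# Default maximum number of model terms when nk = NULL,
# following the formula max(21, 2 * nb_expl_var + 1) given above
default_nk <- function(nb_expl_var) {
  max(21, 2 * nb_expl_var + 1)
}

nk_5_vars  <- default_nk(5)   # 21, since 2 * 5 + 1 = 11 is below 21
nk_12_vars <- default_nk(12)  # 25, since 2 * 12 + 1 = 25 exceeds 21
```

The lower bound of 21 applies as long as there are 10 explanatory variables or fewer.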

RF

(randomForest)

do.classif = TRUE : if TRUE random.forest classification will be computed, otherwise random.forest regression will be done
ntree = 500
mtry = 'default'
sampsize = NULL
nodesize = 5
maxnodes = NULL

MAXENT

(https://biodiversityinformatics.amnh.org/open_source/maxent/)

path_to_maxent.jar = getwd() : a character corresponding to the location of the maxent.jar file
memory_allocated = 512 : an integer corresponding to the amount of memory (in MB) reserved for java to run MAXENT, must be 64, 128, 256, 512, 1024... or NULL to use default java memory limitation parameter
initial_heap_size = NULL : a character corresponding to the initial heap space (shared memory space) allocated to java. Argument transmitted to -Xms when calling java. Used in BIOMOD_Projection but not in BIOMOD_Modeling. Values can be 1024K, 4096M, 10G ... or NULL to use default java parameter
max_heap_size = NULL : a character corresponding to the maximum heap space (shared memory space) allocated to java. Argument transmitted to -Xmx when calling java. Used in BIOMOD_Projection but not in BIOMOD_Modeling. Must be larger than initial_heap_size. Values can be 1024K, 4096M, 10G ... or NULL to use default java parameter
background_data_dir : a character corresponding to directory path where explanatory variables are stored as ASCII files (raster format). If specified, MAXENT will generate its own background data from explanatory variables rasters (as usually done in MAXENT studies). Otherwise biomod2 pseudo-absences will be used (see BIOMOD_FormatingData)
maximumbackground : an integer corresponding to the maximum number of background data to sample if the background_data_dir parameter has been set
maximumiterations = 200 : an integer corresponding to the maximum number of iterations to do
visible = FALSE : a logical to make the MAXENT user interface available
linear = TRUE : a logical to allow linear features to be used
quadratic = TRUE : a logical to allow quadratic features to be used
product = TRUE : a logical to allow product features to be used
threshold = TRUE : a logical to allow threshold features to be used
hinge = TRUE : a logical to allow hinge features to be used
lq2lqptthreshold = 80 : an integer corresponding to the number of samples at which product and threshold features start being used
l2lqthreshold = 10 : an integer corresponding to the number of samples at which quadratic features start being used
hingethreshold = 15 : an integer corresponding to the number of samples at which hinge features start being used
beta_threshold = -1.0 : a numeric corresponding to the regularization parameter to be applied to all threshold features (negative value enables automatic setting)
beta_categorical = -1.0 : a numeric corresponding to the regularization parameter to be applied to all categorical features (negative value enables automatic setting)
beta_lqp = -1.0 : a numeric corresponding to the regularization parameter to be applied to all linear, quadratic and product features (negative value enables automatic setting)
beta_hinge = -1.0 : a numeric corresponding to the regularization parameter to be applied to all hinge features (negative value enables automatic setting)
betamultiplier = 1 : a numeric to multiply all automatic regularization parameters
(higher number gives a more spread-out distribution)
defaultprevalence = 0.5 : a numeric corresponding to the default prevalence of the species
(probability of presence at ordinary occurrence points)
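
How initial_heap_size and max_heap_size map to the -Xms and -Xmx java flags can be sketched as follows. java_memory_flags is a hypothetical helper written for this illustration; the way biomod2 actually assembles the java command is not shown on this page.

```r
# Hypothetical helper: build the java heap flags from the two options.
# A NULL value means the flag is omitted and the java default applies.
java_memory_flags <- function(initial_heap_size = NULL, max_heap_size = NULL) {
  c(
    if (!is.null(initial_heap_size)) paste0("-Xms", initial_heap_size),
    if (!is.null(max_heap_size)) paste0("-Xmx", max_heap_size)
  )
}

flags_both <- java_memory_flags("1024M", "4096M")  # "-Xms1024M" "-Xmx4096M"
flags_none <- java_memory_flags()                  # NULL, java defaults are used
```

As noted above, max_heap_size must be larger than initial_heap_size.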

XGBOOST

(default xgboost)

Please refer to xgboost help file for more details.

max.depth = 5
eta = 0.1
nrounds = 512
objective = "binary:logistic"
nthread = 1

Author

Damien Georges, Wilfried Thuiller

Details

This function allows advanced users to change some default parameters of biomod2 inner models.
11 single models are available within the package (see Usage), and their options can be set with this function through list objects.

The bm_DefaultModelingOptions function prints all default parameter values for all available models.
This output can be copied and pasted to be used as is (with wanted changes) as function arguments (see Examples).

The model sections of this page (GLM to XGBOOST) give the detailed list of all modifiable parameters for each available model.

Examples


library(biomod2)
library(terra)

# Load species occurrences (6 species available)
data(DataSpecies)
head(DataSpecies)

# Select the name of the studied species
myRespName <- 'GuloGulo'

# Get corresponding presence/absence data
myResp <- as.numeric(DataSpecies[, myRespName])

# Get corresponding XY coordinates
myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]

# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
data(bioclim_current)
myExpl <- terra::rast(bioclim_current)

# Crop the environmental variables to a smaller extent
myExtent <- terra::ext(0,30,45,70)
myExpl <- terra::crop(myExpl, myExtent)

# ---------------------------------------------------------------#
# Print default modeling options
bm_DefaultModelingOptions()

# Create default modeling options
myBiomodOptions <- BIOMOD_ModelingOptions()
myBiomodOptions

# # Part (or totality) of the print can be copied and customized
# # Below is an example to compute quadratic GLM and select best model with 'BIC' criterion
# myBiomodOptions <- BIOMOD_ModelingOptions(
#   GLM = list(type = 'quadratic',
#              interaction.level = 0,
#              myFormula = NULL,
#              test = 'BIC',
#              family = 'binomial',
#              control = glm.control(epsilon = 1e-08,
#                                    maxit = 1000,
#                                    trace = FALSE)))
# myBiomodOptions
# 
# # It is also possible to give a specific GLM formula
# myForm <- 'Sp277 ~ bio3 + log(bio10) + poly(bio16, 2) + bio19 + bio3:bio19'
# myBiomodOptions <- BIOMOD_ModelingOptions(GLM = list(myFormula = formula(myForm)))
# myBiomodOptions
