biomod2 (version 4.2-4)

BIOMOD_ModelingOptions: Configure the modeling options for each selected model

Description

Parametrize and/or tune the options of biomod2's single models.

Usage

BIOMOD_ModelingOptions(
  GLM = NULL,
  GBM = NULL,
  GAM = NULL,
  CTA = NULL,
  ANN = NULL,
  SRE = NULL,
  FDA = NULL,
  MARS = NULL,
  RF = NULL,
  MAXENT = NULL,
  XGBOOST = NULL
)

bm_DefaultModelingOptions()

Value

A BIOMOD.models.options object that can be used to build species distribution model(s) with the BIOMOD_Modeling function.

Arguments

GLM

(optional, default NULL)
A list containing GLM options

GBM

(optional, default NULL)
A list containing GBM options

GAM

(optional, default NULL)
A list containing GAM options

CTA

(optional, default NULL)
A list containing CTA options

ANN

(optional, default NULL)
A list containing ANN options

SRE

(optional, default NULL)
A list containing SRE options

FDA

(optional, default NULL)
A list containing FDA options

MARS

(optional, default NULL)
A list containing MARS options

RF

(optional, default NULL)
A list containing RF options

MAXENT

(optional, default NULL)
A list containing MAXENT options

XGBOOST

(optional, default NULL)
A list containing XGBOOST options
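Each argument above is a plain R list. The sketch below shows the calling pattern. It assumes, as the Examples' own `GLM = list(myFormula = ...)` call suggests, that a partial list only overrides the options it names, while models left to NULL keep their defaults. The option values are arbitrary illustrations, not recommendations.

```r
# Partial option lists: only the listed options are changed (illustrative values);
# models whose argument is left to NULL keep all their default options.
glm_opts <- list(type = 'simple', test = 'AIC')
rf_opts  <- list(ntree = 1000, nodesize = 5)

# Named list mirroring the arguments of BIOMOD_ModelingOptions()
my_args <- list(GLM = glm_opts, RF = rf_opts)
str(my_args)

# With biomod2 loaded, either form builds the options object:
# myBiomodOptions <- BIOMOD_ModelingOptions(GLM = glm_opts, RF = rf_opts)
# myBiomodOptions <- do.call(BIOMOD_ModelingOptions, my_args)
```

Collecting the lists in one named object makes it easy to reuse the same configuration across several modeling runs.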

GLM

(glm)

  • myFormula : a typical formula object (see Examples).
    If not NULL, the type and interaction.level parameters are ignored.
    You can choose to either:

    • automatically generate the GLM formula with the following parameters:

      • type = 'quadratic' : the type of formula generated, must be 'simple', 'quadratic' or 'polynomial'

      • interaction.level = 0 : an integer corresponding to the interaction level between the considered variables (be aware that interactions quickly enlarge the number of effective variables used in the GLM!)

    • or construct a specific formula

  • test = 'AIC' : the information criterion used for the stepwise selection procedure, must be 'AIC' (Akaike Information Criterion), 'BIC' (Bayesian Information Criterion) or 'none' (only the full model is considered, with no stepwise selection, but this can lead to convergence issues and unexpected results!)

  • family = binomial(link = 'logit') : the error distribution and link function to be used in the model, must be a family name, a family function or the result of a call to a family function (see family) (so far, biomod2 only runs on presence-absence data, so the binomial family is the default!)

  • control : a list of parameters to control the fitting process (passed to glm.control)
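The two ways of defining the GLM formula can be sketched as option lists. The values are illustrative; the custom formula reuses the species and bioclim variable names of the Examples, and `glm.control()` comes from the stats package.

```r
# Option 1: automatic formula (quadratic terms, no interactions) with
# BIC-based stepwise selection and an explicit convergence control
glm_auto <- list(
  type = 'quadratic',
  interaction.level = 0,
  myFormula = NULL,                   # NULL: formula built from type / interaction.level
  test = 'BIC',
  family = binomial(link = 'logit'),  # a family object
  control = glm.control(epsilon = 1e-08, maxit = 50, trace = FALSE)
)

# Option 2: user-defined formula; type and interaction.level are then ignored.
# Variable names follow the Examples data (GuloGulo, bioclim layers).
glm_custom <- list(myFormula = GuloGulo ~ bio3 + I(bio3^2) + bio12,
                   test = 'none')

# myBiomodOptions <- BIOMOD_ModelingOptions(GLM = glm_auto)
```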

GBM

(default gbm)

Please refer to the gbm help file for more details.

  • distribution = 'bernoulli'

  • n.trees = 2500

  • interaction.depth = 7

  • n.minobsinnode = 5

  • shrinkage = 0.001

  • bag.fraction = 0.5

  • train.fraction = 1

  • cv.folds = 3

  • keep.data = FALSE

  • verbose = FALSE

  • perf.method = 'cv'

  • n.cores = 1
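A sketch of a customized GBM configuration. The values are illustrative: a larger shrinkage (learning rate) generally needs fewer trees, and the 'cv' performance method relies on gbm computing a cross-validation error, which it only does when cv.folds > 1.

```r
gbm_opts <- list(
  distribution = 'bernoulli',
  n.trees = 1000,          # fewer trees than the default 2500 ...
  shrinkage = 0.01,        # ... compensated by a larger learning rate
  interaction.depth = 5,
  n.minobsinnode = 5,
  bag.fraction = 0.5,
  cv.folds = 5,
  perf.method = 'cv',
  n.cores = 2
)

# A 'cv' performance method needs cross-validation folds to be computed
stopifnot(gbm_opts$perf.method != 'cv' || gbm_opts$cv.folds > 1)

# myBiomodOptions <- BIOMOD_ModelingOptions(GBM = gbm_opts)
```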

GAM

(gam::gam, mgcv::gam or mgcv::bam)

  • algo = 'GAM_gam' : a character defining the chosen GAM function, must be GAM_gam (see gam::gam), GAM_mgcv (see mgcv::gam) or BAM_mgcv (see mgcv::bam)

  • myFormula : a typical formula object (see Examples).
    If not NULL, the type and interaction.level parameters are ignored.
    You can choose to either:

    • automatically generate the GAM formula with the following parameters:

      • type = 's_smoother' : the smoother used to generate the formula

      • interaction.level = 0 : an integer corresponding to the interaction level between the considered variables (be aware that interactions quickly enlarge the number of effective variables used in the GAM!)

    • or construct a specific formula

  • k = -1 : a value controlling the smooth terms s() in the formula given to gam, must be -1 or 4 (see gam::s or mgcv::s)

  • family = binomial(link = 'logit') : the error distribution and link function to be used in the model, must be a family name, a family function or the result of a call to a family function (see family) (so far, biomod2 only runs on presence-absence data, so the binomial family is the default!)

  • control : a list of parameters to control the fitting process (passed to gam::gam.control or mgcv::gam.control)

  • some options specific to GAM_mgcv (ignored if algo = 'GAM_gam'):

    • method = 'GCV.Cp'

    • optimizer = c('outer','newton')

    • select = FALSE

    • knots = NULL

    • paramPen = NULL
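A sketch of GAM options switching to mgcv. The values are illustrative; 'REML' is one of the smoothness selection methods accepted by mgcv::gam, and select = TRUE lets mgcv penalize whole smooth terms toward zero. The sanity check mirrors the allowed values listed above.

```r
gam_opts <- list(
  algo = 'GAM_mgcv',                  # mgcv::gam instead of gam::gam
  type = 's_smoother',
  interaction.level = 0,
  myFormula = NULL,
  k = -1,                             # -1: mgcv's default basis dimension for s()
  family = binomial(link = 'logit'),
  method = 'REML',                    # mgcv-specific options, ignored
  select = TRUE                       # when algo = 'GAM_gam'
)

# Check against the allowed values listed above
stopifnot(gam_opts$algo %in% c('GAM_gam', 'GAM_mgcv', 'BAM_mgcv'),
          gam_opts$k %in% c(-1, 4))

# myBiomodOptions <- BIOMOD_ModelingOptions(GAM = gam_opts)
```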

CTA

(rpart)

Please refer to the rpart help file for more details.

  • method = 'class'

  • parms = 'default' : if 'default', the default rpart parms values are kept

  • cost = NULL

  • control : see rpart.control
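A sketch of CTA options. rpart's control argument accepts either the result of rpart.control() or a plain list of its arguments; a plain list is used here so the snippet runs without loading rpart. The values are illustrative: cp is the complexity parameter (a split must improve the fit by at least that fraction) and maxdepth caps the depth of the tree.

```r
cta_opts <- list(
  method = 'class',     # classification tree on the 0/1 response
  parms = 'default',    # keep rpart's default parms
  cost = NULL,
  control = list(cp = 0.001, minsplit = 5, minbucket = 5, xval = 5, maxdepth = 25)
)

# myBiomodOptions <- BIOMOD_ModelingOptions(CTA = cta_opts)
```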

ANN

(nnet)

  • NbCV = 5 : an integer corresponding to the number of cross-validation repetitions used to find the best size and decay parameters

  • size = NULL : an integer corresponding to the number of units in the hidden layer. If NULL, the size parameter is optimized by cross-validation based on model AUC (NbCV cross-validations; the tested sizes are c(2, 4, 6, 8)). It is also possible to give a vector of size values to be tested, and the one giving the best model AUC will be kept.

  • decay = NULL : a numeric corresponding to the weight decay. If NULL, the decay parameter is optimized by cross-validation based on model AUC (NbCV cross-validations; the tested decay values are c(0.001, 0.01, 0.05, 0.1)). It is also possible to give a vector of decay values to be tested, and the one giving the best model AUC will be kept.

  • rang = 0.1 : a numeric defining the range of the initial random weights, drawn on [-rang, rang]

  • maxit = 200 : an integer corresponding to the maximum number of iterations
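The cost of the automatic size/decay search can be estimated from the candidate values above. This sketch assumes that every (size, decay) pair is evaluated in each of the NbCV repetitions; that is an assumption about the internal search, not something stated above.

```r
# Candidate values tested when size and decay are left to NULL (listed above)
default_grid <- expand.grid(size = c(2, 4, 6, 8), decay = c(0.001, 0.01, 0.05, 0.1))
nrow(default_grid)          # 16 (size, decay) pairs

# Assumed cost: each pair evaluated in each of the NbCV cross-validation repetitions
NbCV <- 5
nrow(default_grid) * NbCV   # 80 network fits under that assumption

# Restricting the candidates shrinks the search
ann_opts <- list(NbCV = 5, size = c(2, 4), decay = c(0.01, 0.1), rang = 0.1, maxit = 500)
ann_grid <- expand.grid(size = ann_opts$size, decay = ann_opts$decay)
nrow(ann_grid) * ann_opts$NbCV   # 20

# myBiomodOptions <- BIOMOD_ModelingOptions(ANN = ann_opts)
```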

SRE

(bm_SRE)

  • quant = 0.025 : a numeric corresponding to the quantile of extreme environmental values removed (at each end of each variable's range) to define the species envelope
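To show what quant does, here is a simplified surface range envelope written from scratch; it is not bm_SRE itself. It assumes the envelope of each variable is bounded by its quant and 1 - quant quantiles over the presence sites, and that a site is predicted present only if it falls inside every envelope. The helper names and simulated data are hypothetical.

```r
# Hypothetical helper: per-variable [quant, 1 - quant] quantiles over presence sites
sre_envelope <- function(env_pres, quant = 0.025) {
  lim <- t(sapply(env_pres, quantile, probs = c(quant, 1 - quant), names = FALSE))
  colnames(lim) <- c("lower", "upper")
  lim   # one row per variable
}

# Hypothetical helper: 1 if a site lies inside all envelopes, 0 otherwise
in_envelope <- function(env_new, lim) {
  inside <- rep(TRUE, nrow(env_new))
  for (v in rownames(lim)) {
    inside <- inside & env_new[[v]] >= lim[v, "lower"] & env_new[[v]] <= lim[v, "upper"]
  }
  as.integer(inside)
}

set.seed(1)
pres <- data.frame(bio3 = runif(200, 20, 40), bio12 = runif(200, 300, 900))
lim  <- sre_envelope(pres, quant = 0.025)
new_sites <- data.frame(bio3 = c(30, 50), bio12 = c(600, 600))
in_envelope(new_sites, lim)  # 1 0: the second site lies outside the bio3 envelope
```

A larger quant trims more of the extreme presence values, which narrows the envelope.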

FDA

(fda)

Please refer to the fda help file for more details.

  • method = 'mars'

  • add_args = NULL : a list of additional parameters for method, passed through the ... argument of the fda function
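A minimal FDA configuration, assuming add_args is forwarded unchanged to the regression method through fda's ... argument. degree is an argument of mda::mars (the maximum interaction degree, 1 by default); the value 2 is only an illustration.

```r
# FDA with MARS as the regression method; degree = 2 allows pairwise interactions
fda_opts <- list(method = 'mars', add_args = list(degree = 2))

# myBiomodOptions <- BIOMOD_ModelingOptions(FDA = fda_opts)
```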

MARS

(earth)

Please refer to the earth help file for more details.

  • myFormula : a typical formula object (see Examples).
    If not NULL, the type and interaction.level parameters are ignored.
    You can choose to either:

    • automatically generate the MARS formula with the following parameters:

      • type = 'simple' : the type of formula generated, must be 'simple', 'quadratic' or 'polynomial'

      • interaction.level = 0 : an integer corresponding to the interaction level between the considered variables (be aware that interactions quickly enlarge the number of effective variables used in the MARS model!)

    • or construct a specific formula

  • nk = NULL : an integer corresponding to the maximum number of model terms.
    If NULL, the default value of the MARS function is used: max(21, 2 * nb_expl_var + 1)

  • penalty = 2

  • thresh = 0.001

  • nprune = NULL

  • pmethod = 'backward'
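The default nk expression above can be evaluated directly to see when it matters. With few explanatory variables the floor of 21 applies; nk only grows beyond 21 from 11 variables on. The MARS list values are illustrative.

```r
# Default maximum number of terms used when nk = NULL (expression given above)
default_nk <- function(nb_expl_var) max(21, 2 * nb_expl_var + 1)
default_nk(5)    # 21: with the 5 bioclim layers of the Examples the floor of 21 applies
default_nk(15)   # 31

# Allowing more terms than the default before backward pruning
mars_opts <- list(type = 'simple', interaction.level = 0, myFormula = NULL,
                  nk = 41, penalty = 2, thresh = 0.001, nprune = NULL,
                  pmethod = 'backward')

# myBiomodOptions <- BIOMOD_ModelingOptions(MARS = mars_opts)
```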

RF

(randomForest)

  • do.classif = TRUE : if TRUE, a random forest classification is computed; otherwise a random forest regression is performed

  • ntree = 500

  • mtry = 'default'

  • sampsize = NULL

  • nodesize = 5

  • maxnodes = NULL
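A sketch of RF options with an explicit mtry. For reference, it also computes randomForest's documented default mtry for p predictors (floor(sqrt(p)) for classification, max(floor(p / 3), 1) for regression). Whether mtry = 'default' defers to exactly these values inside biomod2 is an assumption.

```r
rf_opts <- list(
  do.classif = TRUE,  # classification forest on the 0/1 response
  ntree = 1000,
  mtry = 2,           # instead of 'default'
  sampsize = NULL,
  nodesize = 5,
  maxnodes = NULL
)

# randomForest's own default mtry for p explanatory variables
p <- 5                                 # e.g. the 5 bioclim layers of the Examples
mtry_classif <- floor(sqrt(p))         # 2
mtry_regress <- max(floor(p / 3), 1)   # 1

# myBiomodOptions <- BIOMOD_ModelingOptions(RF = rf_opts)
```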

MAXENT

(https://biodiversityinformatics.amnh.org/open_source/maxent/)

  • path_to_maxent.jar = getwd() : a character corresponding to the location of the maxent.jar file

  • memory_allocated = 512 : an integer corresponding to the amount of memory (in MB) reserved for java to run MAXENT, must be 64, 128, 256, 512, 1024... or NULL to use the default java memory limitation parameter

  • initial_heap_size = NULL : a character corresponding to the initial heap space (shared memory space) allocated to java. Argument transmitted to -Xms when calling java. Used in BIOMOD_Projection but not in BIOMOD_Modeling. Values can be 1024K, 4096M, 10G ... or NULL to use the default java parameter

  • max_heap_size = NULL : a character corresponding to the maximum heap space (shared memory space) allocated to java. Argument transmitted to -Xmx when calling java. Used in BIOMOD_Projection but not in BIOMOD_Modeling. Must be larger than initial_heap_size. Values can be 1024K, 4096M, 10G ... or NULL to use the default java parameter

  • background_data_dir : a character corresponding to the directory path where explanatory variables are stored as ASCII files (raster format). If specified, MAXENT will generate its own background data from the explanatory variable rasters (as usually done in MAXENT studies). Otherwise, biomod2 pseudo-absences will be used (see BIOMOD_FormatingData)

  • maximumbackground : an integer corresponding to the maximum number of background points to sample if the background_data_dir parameter has been set

  • maximumiterations = 200 : an integer corresponding to the maximum number of iterations to do

  • visible = FALSE : a logical to make the MAXENT user interface available

  • linear = TRUE : a logical to allow linear features to be used

  • quadratic = TRUE : a logical to allow quadratic features to be used

  • product = TRUE : a logical to allow product features to be used

  • threshold = TRUE : a logical to allow threshold features to be used

  • hinge = TRUE : a logical to allow hinge features to be used

  • lq2lqptthreshold = 80 : an integer corresponding to the number of samples at which product and threshold features start being used

  • l2lqthreshold = 10 : an integer corresponding to the number of samples at which quadratic features start being used

  • hingethreshold = 15 : an integer corresponding to the number of samples at which hinge features start being used

  • beta_threshold = -1.0 : a numeric corresponding to the regularization parameter to be applied to all threshold features (negative value enables automatic setting)

  • beta_categorical = -1.0 : a numeric corresponding to the regularization parameter to be applied to all categorical features (negative value enables automatic setting)

  • beta_lqp = -1.0 : a numeric corresponding to the regularization parameter to be applied to all linear, quadratic and product features (negative value enables automatic setting)

  • beta_hinge = -1.0 : a numeric corresponding to the regularization parameter to be applied to all hinge features (negative value enables automatic setting)

  • betamultiplier = 1 : a numeric to multiply all automatic regularization parameters
    (higher number gives a more spread-out distribution)

  • defaultprevalence = 0.5 : a numeric corresponding to the default prevalence of the species
    (probability of presence at ordinary occurrence points)
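A sketch of MAXENT options together with sanity checks for the memory constraints stated above. The values are illustrative, and heap_bytes() is a hypothetical helper (not part of biomod2) that converts java heap strings such as '1024K', '4096M' or '10G' to bytes so the two heap sizes can be compared.

```r
maxent_opts <- list(
  path_to_maxent.jar = getwd(),   # default location; point it to where maxent.jar lives
  memory_allocated = 1024,        # in MB
  initial_heap_size = '1024M',    # only used by BIOMOD_Projection
  max_heap_size = '4096M',
  maximumiterations = 500,
  linear = TRUE, quadratic = TRUE, hinge = TRUE,
  product = FALSE, threshold = FALSE,
  betamultiplier = 2              # more regularization, more spread-out distribution
)

# Hypothetical helper: java heap size string -> bytes
heap_bytes <- function(x) {
  unit <- toupper(substring(x, nchar(x)))
  num  <- as.numeric(substring(x, 1, nchar(x) - 1))
  num * switch(unit, K = 1024, M = 1024^2, G = 1024^3, stop("unknown unit: ", unit))
}

# Sanity checks matching the constraints stated above
stopifnot(
  log2(maxent_opts$memory_allocated) %% 1 == 0,   # 64, 128, 256, 512, 1024, ...
  maxent_opts$memory_allocated >= 64,
  heap_bytes(maxent_opts$max_heap_size) > heap_bytes(maxent_opts$initial_heap_size)
)

# myBiomodOptions <- BIOMOD_ModelingOptions(MAXENT = maxent_opts)
```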

XGBOOST

(default xgboost)

Please refer to the xgboost help file for more details.

  • max.depth = 5

  • eta = 0.1

  • nrounds = 512

  • objective = "binary:logistic"

  • nthread = 1
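A sketch of XGBOOST options. The values are illustrative: halving the learning rate eta is paired with twice as many boosting rounds, a common way to trade speed for a smoother fit, and shallower trees limit the interaction depth.

```r
xgb_opts <- list(
  max.depth = 3,                 # shallower trees than the default 5
  eta = 0.05,                    # half the default learning rate ...
  nrounds = 1024,                # ... with twice as many boosting rounds
  objective = "binary:logistic", # presence/absence response
  nthread = 2
)

# myBiomodOptions <- BIOMOD_ModelingOptions(XGBOOST = xgb_opts)
```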

Author

Damien Georges, Wilfried Thuiller

Details

This function allows advanced users to change some default parameters of biomod2's inner models.
The options of the 11 single models listed in Usage can be set with this function, each through a list object.

The bm_DefaultModelingOptions function prints all default parameter values for all available models.
This output can be copied, edited as needed, and pasted as arguments of BIOMOD_ModelingOptions (see Examples).

All modifiable parameters are detailed in the corresponding model sections (GLM, GBM, GAM, CTA, ANN, SRE, FDA, MARS, RF, MAXENT, XGBOOST).

See Also

BIOMOD_Tuning, BIOMOD_Modeling

Other Main functions: BIOMOD_EnsembleForecasting(), BIOMOD_EnsembleModeling(), BIOMOD_FormatingData(), BIOMOD_LoadModels(), BIOMOD_Modeling(), BIOMOD_PresenceOnly(), BIOMOD_Projection(), BIOMOD_RangeSize(), BIOMOD_Tuning()

Examples

library(biomod2)
library(terra)

# Load species occurrences (6 species available)
data(DataSpecies)
head(DataSpecies)

# Select the name of the studied species
myRespName <- 'GuloGulo'

# Get corresponding presence/absence data
myResp <- as.numeric(DataSpecies[, myRespName])

# Get corresponding XY coordinates
myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]

# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
data(bioclim_current)
myExpl <- terra::rast(bioclim_current)

# Crop the environmental rasters to a smaller extent to speed up the example
myExtent <- terra::ext(0,30,45,70)
myExpl <- terra::crop(myExpl, myExtent)

# ---------------------------------------------------------------#
# Print default modeling options
bm_DefaultModelingOptions()

# Create default modeling options
myBiomodOptions <- BIOMOD_ModelingOptions()
myBiomodOptions

# # Part (or all) of the printed output can be copied and customized
# # Below is an example to compute a quadratic GLM and select the best model with the 'BIC' criterion
# myBiomodOptions <- BIOMOD_ModelingOptions(
#   GLM = list(type = 'quadratic',
#              interaction.level = 0,
#              myFormula = NULL,
#              test = 'BIC',
#              family = 'binomial',
#              control = glm.control(epsilon = 1e-08,
#                                    maxit = 1000,
#                                    trace = FALSE)))
# myBiomodOptions
# 
# # It is also possible to give a specific GLM formula
# myForm <- 'GuloGulo ~ bio3 + log(bio7) + poly(bio11, 2) + bio12 + bio3:bio12'
# myBiomodOptions <- BIOMOD_ModelingOptions(GLM = list(myFormula = formula(myForm)))
# myBiomodOptions

