Learn R Programming

BiodiversityR (version 2.0)

ensemble.raster: Suitability mapping based on ensembles of modelling algorithms: consensus mapping

Description

The basic function ensemble.raster creates two consensus raster layers, one based on a (weighted) average of different suitability modelling algorithms, and a second one documenting the number of modelling algorithms that predict presence of the focal organisms. Modelling algorithms include maximum entropy (MAXENT), boosted regression trees, random forests, generalized linear models (including stepwise selection of explanatory variables), generalized additive models (including stepwise selection of explanatory variables), multivariate adaptive regression splines, regression trees, artificial neural networks, flexible discriminant analysis, support vector machines, the BIOCLIM algorithm, the DOMAIN algorithm and the Mahalonobis algorithm. These sets of functions were developed in parallel with the biomod2 package, especially for inclusion of the maximum entropy algorithm, but also to allow for a more direct integration with the BiodiversityR package, more direct handling of model formulae and greater focus on mapping. Researchers and students of species distribution are strongly encouraged to familiarize themselves with all the options of the biomod2 and dismo packages.

Usage

ensemble.raster(x, p, a = NULL, an = 1000, ext = NULL, k = 0, pt = NULL, at = NULL, xn = x, models.keep = FALSE, 
    RASTER.file.name = "Species001", RASTER.format = "raster", RASTER.datatype = "INT2S", RASTER.NAflag=-32767, RASTER.models.overwrite = TRUE, 
    ENSEMBLE.decay = 1, ENSEMBLE.multiply = TRUE, ENSEMBLE.best = 0, 
    MAXENT = 1, GBM = 1, GBMSTEP = 1, RF = 1, GLM = 1, GLMSTEP = 1, GAM = 1, GAMSTEP = 1, MGCV = 1, EARTH = 1, RPART = 1, NNET = 1, FDA = 1, SVM=1, BIOCLIM = 1, DOMAIN = 1, MAHAL = 1, 
    Yweights = "BIOMOD", factors = NULL, evaluation.strip = TRUE, formulae.defaults = TRUE,
    MAXENT.OLD = NULL, 
    GBM.formula = NULL, GBM.n.trees = 3000, GBM.OLD = NULL,
    GBMSTEP.gbm.x = 2:(1 + nlayers(x)), GBMSTEP.tree.complexity = 5, GBMSTEP.learning.rate = 0.01, GBMSTEP.bag.fraction = 0.5, GBMSTEP.OLD = NULL, 
    RF.formula = NULL, RF.ntree = 750, RF.mtry = log(nlayers(x)), RF.OLD = NULL, 
    GLM.formula = NULL, GLM.family = binomial(link = "logit"), GLM.OLD = NULL, 
    GLMSTEP.steps = 1000, STEP.formula = NULL, GLMSTEP.scope = NULL, 
    GLMSTEP.k = 2, GLMSTEP.OLD = NULL, GAM.formula = NULL, GAM.family = binomial(link = "logit"), GAM.OLD = NULL, 
    GAMSTEP.steps = 1000, GAMSTEP.scope = NULL, GAMSTEP.OLD = NULL, 
    MGCV.formula = NULL, MGCV.OLD = NULL, 
    EARTH.formula = NULL, EARTH.glm = list(family = binomial(link = "logit")), EARTH.OLD = NULL, 
    RPART.formula = NULL, RPART.xval = 50, RPART.OLD = NULL, 
    NNET.formula = NULL, NNET.size = 8, NNET.decay = 0.01, NNET.OLD = NULL, 
    FDA.formula = NULL, FDA.OLD = NULL, 
    SVM.formula = NULL, SVM.OLD= NULL, 
    BIOCLIM.OLD = NULL, DOMAIN.OLD = NULL, MAHAL.OLD = NULL)

ensemble.weights(weights = c(0.9, 0.8, 0.7, 0.5), decay = 1.5, multiply = TRUE, scale = TRUE, best = 0)

Arguments

x
RasterStack object (stack) containing all layers that correspond to explanatory variables. This RasterStack object is used for calibrating the suitability models.
p
presence points used for calibrating the suitability models, typically available in 2-column (lon, lat) dataframe; see also prepareData and extract
a
background points used for calibrating the suitability models, typically available in 2-column (lon, lat) dataframe; see also prepareData and extract
an
number of background points for calibration to be selected with randomPoints in case argument a is missing
ext
an Extent object to limit the prediction to a sub-region of xn and the selection of background points to a sub-region of x, typically provided as c(lonmin, lonmax, latmin, latmax); see also
k
If larger than 1, the mumber of groups to split between calibration (k-1) and evaluation (1) data sets (for example, k=5 results in 4/5 of presence and background points to be used for calibrating the models, and 1/5 of presence and backgroun
pt
presence points used for evaluating the suitability models, typically available in 2-column (lon, lat) dataframe; see also prepareData
at
background points used for calibrating the suitability models, typicall available in 2-column (lon, lat) dataframe; see also prepareData and extract<
xn
RasterStack object (stack) containing all layers that correspond to explanatory variables. This RasterStack object is used for creating the suitability maps corresponding to individual suitability model
models.keep
store the details for each suitability modelling algorithm (if TRUE). (This may be particularly useful when projecting to different possible future climates.)
RASTER.file.name
First part of the names of the raster files that will be generated.
RASTER.format
Format of the raster files that will be generated. See writeFormats and writeRaster.
RASTER.datatype
Format of the raster files that will be generated. See dataType and writeRaster.
RASTER.NAflag
Value that is used to store missing data. See writeRaster.
RASTER.models.overwrite
Overwrite the raster files that correspond to each suitability modelling algorithm (if TRUE). (Overwriting actually implies that raster files are created or overwritten that start with "working_").
ENSEMBLE.decay
Ratio by which the input weights (assumed to correspond to evaluation indices of suitability models) are multiplied after they were sorted from worst to best model based on their input weights. For example, with 4 models and a decay of 2, the input weight
ENSEMBLE.multiply
If TRUE, then the input weights (assumed to correspond to evaluation indices of suitability models) are multiplied by the ENSEMBLE.decay to obtain the weighting used by the consensus algorithm (based on a weighted average). If
ENSEMBLE.best
The number of individual suitability models to be used in the consensus suitability map (based on a weighted average). In case this parameter is smaller than 1 or larger than the number of positive input weights of individual models, then all individual s
MAXENT
Input weight for a maximum entropy model (maxent). (Only weights > 0 will be used.)
GBM
Input weight for a boosted regression trees model (gbm). (Only weights > 0 will be used.)
GBMSTEP
Input weight for a stepwise boosted regression trees model (gbm.step). (Only weights > 0 will be used.)
RF
Input weight for a random forest model (randomForest). (Only weights > 0 will be used.)
GLM
Input weight for a generalized linear model (glm). (Only weights > 0 will be used.)
GLMSTEP
Input weight for a stepwise generalized linear model (stepAIC). (Only weights > 0 will be used.)
GAM
Input weight for a generalized additive model (gam). (Only weights > 0 will be used.)
GAMSTEP
Input weight for a stepwise generalized additive model (step.gam). (Only weights > 0 will be used.)
MGCV
Input weight for a generalized additive model (gam). (Only weights > 0 will be used.)
EARTH
Input weight for a multivariate adaptive regression spline model (earth). (Only weights > 0 will be used.)
RPART
Input weight for a recursive partioning and regression tree model (rpart). (Only weights > 0 will be used.)
NNET
Input weight for an artificial neural network model (nnet). (Only weights > 0 will be used.)
FDA
Input weight for a flexible discriminant analysis model (fda). (Only weights > 0 will be used.)
SVM
Input weight for a support vector machine model (ksvm). (Only weights > 0 will be used.)
BIOCLIM
Input weight for the BIOCLIM algorithm (bioclim). (Only weights > 0 will be used.)
DOMAIN
Input weight for the DOMAIN algorithm (domain). (Only weights > 0 will be used.)
MAHAL
Input weight for the Mahalonobis algorithm (mahal). (Only weights > 0 will be used.)
Yweights
chooses how cases of presence and background (absence) are weighted; "BIOMOD" results in equal weighting of all presence and all background cases, "equal" results in equal weighting of all cases. The user can supply a vector of w
factors
vector that indicates which variables are factors; see also prepareData
evaluation.strip
Obtain information that can be plotted by the evaluation.strip.plot function (if TRUE)
formulae.defaults
Suggest formulae for most of the models (if TRUE). See also ensemble.formulae.
MAXENT.OLD
if not NULL, then do not calibrate a new maximum entropy model, but use this model instead
GBM.formula
formula for the boosted regression trees algorithm; see also gbm
GBM.n.trees
total number of trees to fit for the boosted regression trees model; see also gbm
GBM.OLD
if not NULL, then do not calibrate a new boosted regression tree model, but use this model instead
GBMSTEP.gbm.x
indices of column numbers with explanatory variables for stepwise boosted regression trees; see also gbm.step
GBMSTEP.tree.complexity
complexity of individual trees for stepwise boosted regression trees; see also gbm.step
GBMSTEP.learning.rate
weight applied to individual trees for stepwise boosted regression trees; see also gbm.step
GBMSTEP.bag.fraction
proportion of observations used in selecting variables for stepwise boosted regression trees; see also gbm.step
GBMSTEP.OLD
if not NULL, then do not calibrate a new stepwise boosted regression tree model, but use this model instead
RF.formula
formula for the random forest algorithm; see also randomForest
RF.ntree
number of trees to grow for random forest algorithm; see also randomForest
RF.mtry
number of variables randomly sampled as candidates at each split for random forest algorithm; see also randomForest
RF.OLD
if not NULL, then do not calibrate a new random forest model, but use this model instead
GLM.formula
formula for the generalized linear model; see also glm
GLM.family
description of the error distribution and link function for the generalized linear model; see also glm
GLM.OLD
if not NULL, then do not calibrate a new generalized linear model, but use this model instead
GLMSTEP.steps
maximum number of steps to be considered for stepwise generalized linear model; see also stepAIC
STEP.formula
formula for the "starting model" to be considered for stepwise generalized linear model; see also stepAIC
GLMSTEP.scope
range of models examined in the stepwise search; see also stepAIC
GLMSTEP.k
multiple of the number of degrees of freedom used for the penalty (only k = 2 gives the genuine AIC); see also stepAIC
GLMSTEP.OLD
if not NULL, then do not calibrate a new stepwise generalized linear model, but use this model instead
GAM.formula
formula for the generalized additive model; see also gam
GAM.family
description of the error distribution and link function for the generalized additive model; see also gam
GAM.OLD
if not NULL, then do not calibrate a new generalized additive model, but use this model instead
GAMSTEP.steps
maximum number of steps to be considered in the stepwise generalized additive model; see also step.gam
GAMSTEP.scope
range of models examined in the step-wise search n the stepwise generalized additive model; see also step.gam
GAMSTEP.OLD
if not NULL, then do not calibrate a new stepwise generalized additive model, but use this model instead
MGCV.formula
formula for the generalized additive model; see also gam
MGCV.OLD
if not NULL, then do not calibrate a new generalized additive model, but use this model instead
EARTH.formula
formula for the multivariate adaptive regression spline model; see also earth
EARTH.glm
list of arguments to pass on to glm; see also earth
EARTH.OLD
if not NULL, then do not calibrate a new multivariate adaptive regression spline model, but use this model instead
RPART.formula
formula for the recursive partioning and regression tree model; see also rpart
RPART.xval
number of cross-validations for the recursive partioning and regression tree model; see also rpart.control
RPART.OLD
if not NULL, then do not calibrate a new recursive partioning and regression tree model, but use this model instead
NNET.formula
formula for the artificial neural network model; see also nnet
NNET.size
number of units in the hidden layer for the artificial neural network model; see also nnet
NNET.decay
parameter of weight decay for the artificial neural network model; see also nnet
NNET.OLD
if not NULL, then do not calibrate a new artificial neural network model, but use this model instead
FDA.formula
formula for the flexible discriminant analysis model; see also fda
FDA.OLD
if not NULL, then do not calibrate a new flexible discriminant model, but use this model instead
SVM.formula
formula for the support vector machine model; see also ksvm
SVM.OLD
if not NULL, then do not calibrate a new support vector machine model, but use this model instead
BIOCLIM.OLD
if not NULL, then do not calibrate a new BIOCLIM model, but use this model instead
DOMAIN.OLD
if not NULL, then do not calibrate a new DOMAIN model, but use this model instead
MAHAL.OLD
if not NULL, then do not calibrate a new Mahalonobis model, but use this model instead
weights
input weights for the ensemble.weights function
decay
Ratio by which the input weights are multiplied after they were sorted. For example, with 4 input weights and a decay of 2, the largest input weight is multiplied by 2*2*2, the second largest input weight is multiplied by 2*2, the third largest input weig
multiply
If TRUE, then the input weights are multiplied by the decay to obtain the output weighting. If FALSE, then the output weighting directly corresponds to the ENSEMBLE.decay.
best
The number of final weights. In case this parameter is smaller than 1 or larger than the number of positive input weights of individual models, then this parameter is ignored.
scale
scale the output weights so that the sum of output weights equals 1 (if TRUE, default for ensemble.raster)

Value

  • The basic function ensemble.raster mainly results in the creation of raster layers that correspond to fitted probabilities of presence of individual suitability models (in folder "models") and consensus models (in folder "ensembles"), and the number of suitability models that predict presence (in folder "ensembles"). Prediction of presence is based on a threshold defined by maximing the sum of the true presence and true absence rates (see also ModelEvaluation). If desired by the user, the function also saves details of fitted suitability models or data that can be plotted with the evaluation.strip.plot function.

Details

The basic function ensemble.raster fits individual suitability models for all models with positive input weights. In subfolder "models" of the working directory, suitability maps for the individual suitability modelling algorithms are stored. In subfolder "ensembles", a consensus suitability map based on a weighted average of individual suitability models is stored. Also in subfolder "ensembles", a consensus suitability map (with suffix "_COUNT") based on the number of individual suitability models that predict presence of the focal organisms (typically a species, but possibly another taxonomic grouping) is stored. Note that values in suitability maps are integer values calculated by multiplying probabilities by 1000 (see also trunc). The ensemble.weights function is used internally by the ensemble.raster function, using the input weights for the different suitability modelling algorithms. Ties between input weights result in the same output weights.

References

Buisson L, Thuiller W, Casajus N, Lek S and Grenouillet G. 2010. Uncertainty in ensemble forecasting of species distribution. Global Change Biology 16: 1145-1157

See Also

evaluation.strip.plot, ensemble.test, ensemble.test.splits

Examples

Run this code
# based on examples in the dismo package
# simplified example of ensemble modelling with 4 modeling algorithms

# get predictor variables
library(dismo)
predictors <- stack(list.files(path=paste(system.file(package="dismo"), '/ex', sep=''), pattern='grd', full.names=TRUE ))
# predictors
predictors <- dropLayer(predictors, "biome")

# presence points
presence <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='')
pres <- read.table(presence, header=TRUE, sep=',')[,-1]

# choose background points
ext = extent(-90, -32, -33, 23)
background <- randomPoints(predictors, n=1000, ext=ext, extf = 1.00)


# formulae for random forest and generalized linear model
rfformula <- as.formula(pb ~ bio1+bio5+bio6+bio7+bio8+bio12+bio16+bio17)

glmformula <- as.formula(pb~bio1 + I(bio1^2) + I(bio1^3) + bio12 + I(bio12^2) + I(bio12^3) + 
    bio16 + I(bio16^2) + I(bio16^3) + bio17 + I(bio17^2) + I(bio17^3) + 
    bio5 + I(bio5^2) + I(bio5^3) + bio6 + I(bio6^2) + I(bio6^3) + 
    bio7 + I(bio7^2) + I(bio7^3) + bio8 + I(bio8^2) + I(bio8^3) )

# if desired, change working directory where subfolders of "models" and "ensembles" will be created

# fit four ensemble models
# weights are based on previous run of ensemble.test.splits function
# all presence points are utilized
ensemble.nofactors <- ensemble.raster(x=predictors, p=pres, a=background, k=0,
    MAXENT=0, GBM=0, GBMSTEP=0, RF=0.8538, GLM=0.7948, GLMSTEP=0, GAM=0, GAMSTEP=0,
    MGCV=0, EARTH=0, RPART=0, NNET=0, FDA=0, SVM=0, BIOCLIM=0.7031, DOMAIN=0.7265, MAHAL=0,
    Yweights="BIOMOD", factors=NULL,
    models.keep=TRUE, evaluation.strip=TRUE,
    RF.formula=rfformula,
    GLM.formula=glmformula)

evaluation.strip.plot(ensemble.nofactors$evaluation.strip, model="ENSEMBLE", type="o", col="red")

# split data in 5 subsets, use 1/5 for evaluation and 4/5 for calibration
# GAMSTEP option is not available at the moment (problems with assignments)

ensemble.nofactors <- ensemble.raster(x=predictors, p=pres, a=background, k=5,
    MAXENT=0, GBM=1, GBMSTEP=1, RF=1, GLM=1, GLMSTEP=1, GAM=1, GAMSTEP=0, MGCV=1, 
    EARTH=1, RPART=1, NNET=1, FDA=1, SVM=1, BIOCLIM=1, DOMAIN=1, MAHAL=1,
    Yweights="BIOMOD",
    models.keep=TRUE, evaluation.strip=TRUE,
    formulae.defaults=TRUE)

Run the code above in your browser using DataLab