ensemble.raster creates two consensus raster layers, one based on a (weighted) average of different suitability modelling algorithms, and a second one documenting the number of modelling algorithms that predict presence of the focal organisms. Modelling algorithms include maximum entropy (MAXENT), boosted regression trees, random forests, generalized linear models (including stepwise selection of explanatory variables), generalized additive models (including stepwise selection of explanatory variables), multivariate adaptive regression splines, regression trees, artificial neural networks, flexible discriminant analysis, support vector machines, the BIOCLIM algorithm, the DOMAIN algorithm and the Mahalonobis algorithm. These sets of functions were developed in parallel with the biomod2 package, especially for inclusion of the maximum entropy algorithm, but also to allow for a more direct integration with the BiodiversityR package, more direct handling of model formulae and greater focus on mapping. Researchers and students of species distribution are strongly encouraged to familiarize themselves with all the options of the biomod2 and dismo packages.ensemble.raster(x, p, a = NULL, an = 1000, ext = NULL, k = 0, pt = NULL, at = NULL, xn = x, models.keep = FALSE,
RASTER.file.name = "Species001", RASTER.format = "raster", RASTER.datatype = "INT2S", RASTER.NAflag=-32767, RASTER.models.overwrite = TRUE,
ENSEMBLE.decay = 1, ENSEMBLE.multiply = TRUE, ENSEMBLE.best = 0,
MAXENT = 1, GBM = 1, GBMSTEP = 1, RF = 1, GLM = 1, GLMSTEP = 1, GAM = 1, GAMSTEP = 1, MGCV = 1, EARTH = 1, RPART = 1, NNET = 1, FDA = 1, SVM=1, BIOCLIM = 1, DOMAIN = 1, MAHAL = 1,
Yweights = "BIOMOD", factors = NULL, evaluation.strip = TRUE, formulae.defaults = TRUE,
MAXENT.OLD = NULL,
GBM.formula = NULL, GBM.n.trees = 3000, GBM.OLD = NULL,
GBMSTEP.gbm.x = 2:(1 + nlayers(x)), GBMSTEP.tree.complexity = 5, GBMSTEP.learning.rate = 0.01, GBMSTEP.bag.fraction = 0.5, GBMSTEP.OLD = NULL,
RF.formula = NULL, RF.ntree = 750, RF.mtry = log(nlayers(x)), RF.OLD = NULL,
GLM.formula = NULL, GLM.family = binomial(link = "logit"), GLM.OLD = NULL,
GLMSTEP.steps = 1000, STEP.formula = NULL, GLMSTEP.scope = NULL,
GLMSTEP.k = 2, GLMSTEP.OLD = NULL, GAM.formula = NULL, GAM.family = binomial(link = "logit"), GAM.OLD = NULL,
GAMSTEP.steps = 1000, GAMSTEP.scope = NULL, GAMSTEP.OLD = NULL,
MGCV.formula = NULL, MGCV.OLD = NULL,
EARTH.formula = NULL, EARTH.glm = list(family = binomial(link = "logit")), EARTH.OLD = NULL,
RPART.formula = NULL, RPART.xval = 50, RPART.OLD = NULL,
NNET.formula = NULL, NNET.size = 8, NNET.decay = 0.01, NNET.OLD = NULL,
FDA.formula = NULL, FDA.OLD = NULL,
SVM.formula = NULL, SVM.OLD= NULL,
BIOCLIM.OLD = NULL, DOMAIN.OLD = NULL, MAHAL.OLD = NULL)
ensemble.weights(weights = c(0.9, 0.8, 0.7, 0.5), decay = 1.5, multiply = TRUE, scale = TRUE, best = 0)stack) containing all layers that correspond to explanatory variables. This RasterStack object is used for calibrating the suitability models.prepareData and extractprepareData and extractrandomPoints in case argument a is missingxn and the selection of background points to a sub-region of x, typically provided as c(lonmin, lonmax, latmin, latmax); see also k=5 results in 4/5 of presence and background points to be used for calibrating the models, and 1/5 of presence and backgrounprepareDataprepareData and extract<stack) containing all layers that correspond to explanatory variables. This RasterStack object is used for creating the suitability maps corresponding to individual suitability modelTRUE). (This may be particularly useful when projecting to different possible future climates.)writeFormats and writeRaster.dataType and writeRaster.writeRaster.TRUE). (Overwriting actually implies that raster files are created or overwritten that start with "working_").TRUE, then the input weights (assumed to correspond to evaluation indices of suitability models) are multiplied by the ENSEMBLE.decay to obtain the weighting used by the consensus algorithm (based on a weighted average). If maxent). (Only weights > 0 will be used.)gbm). (Only weights > 0 will be used.)gbm.step). (Only weights > 0 will be used.)randomForest). (Only weights > 0 will be used.)glm). (Only weights > 0 will be used.)stepAIC). (Only weights > 0 will be used.)gam). (Only weights > 0 will be used.)step.gam). (Only weights > 0 will be used.)gam). (Only weights > 0 will be used.)earth). (Only weights > 0 will be used.)rpart). (Only weights > 0 will be used.)nnet). (Only weights > 0 will be used.)fda). (Only weights > 0 will be used.)ksvm). (Only weights > 0 will be used.)bioclim). (Only weights > 0 will be used.)domain). (Only weights > 0 will be used.)mahal). (Only weights > 0 will be used.)"BIOMOD" results in equal weighting of all presence and all background cases, "equal" results in equal weighting of all cases. The user can supply a vector of wprepareDataevaluation.strip.plot function (if TRUE)TRUE). See also ensemble.formulae.NULL, then do not calibrate a new maximum entropy model, but use this model insteadgbmgbmNULL, then do not calibrate a new boosted regression tree model, but use this model insteadgbm.stepgbm.stepgbm.stepgbm.stepNULL, then do not calibrate a new stepwise boosted regression tree model, but use this model insteadrandomForestrandomForestrandomForestNULL, then do not calibrate a new random forest model, but use this model insteadglmglmNULL, then do not calibrate a new generalized linear model, but use this model insteadstepAICstepAICstepAICstepAICNULL, then do not calibrate a new stepwise generalized linear model, but use this model insteadgamgamNULL, then do not calibrate a new generalized additive model, but use this model insteadstep.gamstep.gamNULL, then do not calibrate a new stepwise generalized additive model, but use this model insteadgamNULL, then do not calibrate a new generalized additive model, but use this model insteadearthNULL, then do not calibrate a new multivariate adaptive regression spline model, but use this model insteadrpartrpart.controlNULL, then do not calibrate a new recursive partioning and regression tree model, but use this model insteadnnetnnetnnetNULL, then do not calibrate a new artificial neural network model, but use this model insteadfdaNULL, then do not calibrate a new flexible discriminant model, but use this model insteadksvmNULL, then do not calibrate a new support vector machine model, but use this model insteadNULL, then do not calibrate a new BIOCLIM model, but use this model insteadNULL, then do not calibrate a new DOMAIN model, but use this model insteadNULL, then do not calibrate a new Mahalonobis model, but use this model insteadensemble.weights functionTRUE, then the input weights are multiplied by the decay to obtain the output weighting. If FALSE, then the output weighting directly corresponds to the ENSEMBLE.decay.TRUE, default for ensemble.raster)ensemble.raster mainly results in the creation of raster layers that correspond to fitted probabilities of presence of individual suitability models (in folder "models") and consensus models (in folder "ensembles"), and the number of suitability models that predict presence (in folder "ensembles"). Prediction of presence is based on a threshold defined by maximing the sum of the true presence and true absence rates (see also ModelEvaluation).
If desired by the user, the function also saves details of fitted suitability models or data that can be plotted with the evaluation.strip.plot function.ensemble.raster fits individual suitability models for all models with positive input weights. In subfolder "models" of the working directory, suitability maps for the individual suitability modelling algorithms are stored. In subfolder "ensembles", a consensus suitability map based on a weighted average of individual suitability models is stored. Also in subfolder "ensembles", a consensus suitability map (with suffix "_COUNT") based on the number of individual suitability models that predict presence of the focal organisms (typically a species, but possibly another taxonomic grouping) is stored.
Note that values in suitability maps are integer values calculated by multiplying probabilities by 1000 (see also trunc).
The ensemble.weights function is used internally by the ensemble.raster function, using the input weights for the different suitability modelling algorithms. Ties between input weights result in the same output weights.evaluation.strip.plot, ensemble.test, ensemble.test.splits# based on examples in the dismo package
# simplified example of ensemble modelling with 4 modeling algorithms
# get predictor variables
library(dismo)
predictors <- stack(list.files(path=paste(system.file(package="dismo"), '/ex', sep=''), pattern='grd', full.names=TRUE ))
# predictors
predictors <- dropLayer(predictors, "biome")
# presence points
presence <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='')
pres <- read.table(presence, header=TRUE, sep=',')[,-1]
# choose background points
ext = extent(-90, -32, -33, 23)
background <- randomPoints(predictors, n=1000, ext=ext, extf = 1.00)
# formulae for random forest and generalized linear model
rfformula <- as.formula(pb ~ bio1+bio5+bio6+bio7+bio8+bio12+bio16+bio17)
glmformula <- as.formula(pb~bio1 + I(bio1^2) + I(bio1^3) + bio12 + I(bio12^2) + I(bio12^3) +
bio16 + I(bio16^2) + I(bio16^3) + bio17 + I(bio17^2) + I(bio17^3) +
bio5 + I(bio5^2) + I(bio5^3) + bio6 + I(bio6^2) + I(bio6^3) +
bio7 + I(bio7^2) + I(bio7^3) + bio8 + I(bio8^2) + I(bio8^3) )
# if desired, change working directory where subfolders of "models" and "ensembles" will be created
# fit four ensemble models
# weights are based on previous run of ensemble.test.splits function
# all presence points are utilized
ensemble.nofactors <- ensemble.raster(x=predictors, p=pres, a=background, k=0,
MAXENT=0, GBM=0, GBMSTEP=0, RF=0.8538, GLM=0.7948, GLMSTEP=0, GAM=0, GAMSTEP=0,
MGCV=0, EARTH=0, RPART=0, NNET=0, FDA=0, SVM=0, BIOCLIM=0.7031, DOMAIN=0.7265, MAHAL=0,
Yweights="BIOMOD", factors=NULL,
models.keep=TRUE, evaluation.strip=TRUE,
RF.formula=rfformula,
GLM.formula=glmformula)
evaluation.strip.plot(ensemble.nofactors$evaluation.strip, model="ENSEMBLE", type="o", col="red")
# split data in 5 subsets, use 1/5 for evaluation and 4/5 for calibration
# GAMSTEP option is not available at the moment (problems with assignments)
ensemble.nofactors <- ensemble.raster(x=predictors, p=pres, a=background, k=5,
MAXENT=0, GBM=1, GBMSTEP=1, RF=1, GLM=1, GLMSTEP=1, GAM=1, GAMSTEP=0, MGCV=1,
EARTH=1, RPART=1, NNET=1, FDA=1, SVM=1, BIOCLIM=1, DOMAIN=1, MAHAL=1,
Yweights="BIOMOD",
models.keep=TRUE, evaluation.strip=TRUE,
formulae.defaults=TRUE)Run the code above in your browser using DataLab