ensemble.dummy.variables: Suitability mapping based on ensembles of modelling algorithms: handling of categorical data

Description

The basic function ensemble.dummy.variables creates new raster layers representing dummy variables (coded 0 or 1) for all or the most frequent levels of a categorical variable. Some suitability modelling algorithms need such dummy variables to handle categorical data properly.

Usage

ensemble.dummy.variables(xcat=NULL,  freq.min=50, most.frequent=5, overwrite=TRUE, ...)
ensemble.accepted.categories(xcat = NULL, categories = NULL,  filename=NULL, overwrite=TRUE, ...)
ensemble.simplified.categories(xcat = NULL, p = NULL,  filename=NULL, overwrite=TRUE, ...)

Arguments

xcat

RasterLayer object (raster) containing values for a categorical explanatory variable.

freq.min

Minimum frequency for a dummy raster layer to be created for the corresponding factor level. See also freq.

most.frequent

Number of dummy raster layers to be created (if larger than 0), corresponding to the same number of most frequent factor levels. See also freq.

overwrite

Overwrite an existing file with the same name (if TRUE). See also writeRaster.

...

Additional arguments passed to writeRaster.

Value

The basic function ensemble.dummy.variables mainly results in the creation of raster layers that correspond to dummy variables.

Details

The basic function ensemble.dummy.variables creates dummy variables from a RasterLayer object (see raster) that represents a categorical variable. With freq.min and most.frequent it is possible to limit the number of dummy variables that will be created. For example, most.frequent = 5 results in five dummy variables being created.
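
The selection and 0/1 coding described above can be sketched in base R. This is only an illustration on a plain matrix, not the package's implementation: the toy data, the object names and the exact way freq.min and most.frequent are combined (levels with at least freq.min cells, then the most.frequent largest) are assumptions.

```r
# Toy categorical layer stored as a plain matrix (hypothetical data, not a RasterLayer)
xcat <- matrix(c(1, 1, 2, 2,
                 1, 7, 7, 2,
                 1, 1, 8, NA), nrow = 3, byrow = TRUE)

freq.min <- 2        # assumed rule: keep levels with at least this many cells
most.frequent <- 2   # keep at most this many levels

# Cell counts per level (NA cells are ignored), most frequent first
level.freq <- sort(table(xcat), decreasing = TRUE)
level.freq <- level.freq[level.freq >= freq.min]
keep <- as.numeric(names(level.freq))[seq_len(min(most.frequent, length(level.freq)))]

# One 0/1 matrix per retained level; NA cells stay NA
dummies <- lapply(keep, function(lev) (xcat == lev) * 1)
names(dummies) <- paste0("biome_", keep)
```

Each element of dummies plays the role of one dummy layer, named after the level it encodes in the same pattern as the "biome_1", "biome_2" layers used in the Examples.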

Function ensemble.accepted.categories modifies the RasterLayer object (see raster) by replacing cell values for categories (levels) that are not accepted with missing values.
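
As a rough base R illustration of this replacement rule, assuming the cell values are available as a plain numeric vector (the data and object names below are hypothetical, and the sketch does not call the package function):

```r
# Hypothetical cell values of a categorical layer
cell.values <- c(1, 2, 7, 8, 13, 2, 1)

# Levels that are accepted, e.g. those encountered during calibration
categories <- c(1, 2, 7)

# Cells with a level outside the accepted set become missing values
accepted <- cell.values
accepted[!(accepted %in% categories)] <- NA
```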

Function ensemble.simplified.categories modifies the RasterLayer object (see raster) by replacing cell values for all categories (levels) in which none of the presence points occur with a single shared level. This new level is coded as the maximum code among these 'outside categories'.
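
The recoding rule can be sketched in base R as follows. The cell values and the levels observed at presence points are hypothetical, and the sketch only mirrors the description above rather than the package's code:

```r
# Hypothetical cell values of a categorical layer
cell.values <- c(1, 2, 7, 8, 13, 2, 1, 8)

# Levels found at the presence points (hypothetical extraction result)
presence.levels <- c(1, 2, 2)

# 'Outside categories': levels in the layer where no presence point occurs
outside <- setdiff(sort(unique(cell.values)), presence.levels)

# All outside categories are merged into one level, coded by their maximum
new.level <- max(outside)
simplified <- cell.values
simplified[simplified %in% outside] <- new.level
```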

Examples

## Not run: 
# 
# # get predictor variables
# library(dismo)
# predictor.files <- list.files(path=paste(system.file(package="dismo"), '/ex', sep=''),
#     pattern='grd', full.names=TRUE)
# predictors <- stack(predictor.files)
# biome.layer <- predictors[["biome"]]
# biome.layer
# 
# # create dummy layers for the 5 most frequent factor levels
# 
# ensemble.dummy.variables(xcat=biome.layer, most.frequent=5,
#     overwrite=TRUE)
# 
# # check whether dummy variables were created
# predictor.files <- list.files(path=paste(system.file(package="dismo"), '/ex', sep=''),
#     pattern='grd', full.names=TRUE)
# predictors <- stack(predictor.files)
# predictors
# names(predictors)
# 
# # once dummy variables were created, avoid using the original categorical data layer
# predictors <- subset(predictors, subset=c("bio5", "bio6", "bio16", "bio17", 
#     "biome_1", "biome_2", "biome_7", "biome_8", "biome_13"))
# predictors
# predictors@title <- "base"
# 
# # presence points
# presence_file <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='')
# pres <- read.table(presence_file, header=TRUE, sep=',')[,-1]
# 
# # the kfold function randomly assigns data to groups; 
# # groups are used as testing (1/5) and training (4/5) data
# groupp <- kfold(pres, 5)
# pres_train <- pres[groupp !=  1, ]
# pres_test <- pres[groupp ==  1, ]
# 
# # choose background points
# ext <- extent(-90, -32, -33, 23)
# background <- randomPoints(predictors, n=1000, ext=ext, extf=1.00)
# colnames(background) <- c('lon', 'lat')
# groupa <- kfold(background, 5)
# backg_train <- background[groupa != 1, ]
# backg_test <- background[groupa == 1, ]
# 
# # fit ensemble models with all algorithms that are switched on below (all except MAHAL)
# # note that dummy variables are not used for BIOCLIM and DOMAIN
# # (neither are categorical variables)
# ensemble.nofactors <- ensemble.test(x=predictors, p=pres_train, a=backg_train, 
#     pt=pres_test, at=backg_test,
#     species.name="Bradypus",
#     VIF=TRUE,
#     MAXENT=1, GBM=1, GBMSTEP=1, RF=1, GLM=1, GLMSTEP=1, GAM=1, 
#     GAMSTEP=1, MGCV=1, MGCVFIX=1,EARTH=1, RPART=1, NNET=1, FDA=1, 
#     SVM=1, SVME=1, BIOCLIM=1, DOMAIN=1, MAHAL=0,
#     Yweights="BIOMOD", 
#     dummy.vars=c("biome_1", "biome_2", "biome_7", "biome_8", "biome_13"),
#     PLOTS=FALSE, evaluations.keep=TRUE)
# 
# ## End(Not run)