BiodiversityR (version 2.7-2)

ensemble.dummy.variables: Suitability mapping based on ensembles of modelling algorithms: handling of categorical data

Description

The basic function ensemble.dummy.variables creates new raster layers representing dummy variables (coded 0 or 1) for all or the most frequent levels of a categorical variable. Some of the suitability modelling algorithms need dummy variables in order to handle categorical data properly.

Usage

ensemble.dummy.variables(xcat=NULL, freq.min=50, most.frequent=5, overwrite=TRUE, ...)
ensemble.accepted.categories(xcat = NULL, categories = NULL, filename=NULL, overwrite=TRUE, ...)
ensemble.simplified.categories(xcat = NULL, p = NULL, filename=NULL, overwrite=TRUE, ...)

Arguments

xcat
RasterLayer object (raster) containing values for a categorical explanatory variable.
freq.min
Minimum frequency for a dummy raster layer to be created for the corresponding factor level. See also freq.
most.frequent
Number of dummy raster layers to be created (if larger than 0), corresponding to the same number of most frequent factor levels. See also freq.
overwrite
If TRUE, overwrite an existing file with the same name. See also writeRaster.
...
additional arguments passed to writeRaster.
categories
numeric vector providing the accepted levels of a categorical raster layer; expected to correspond to the levels encountered during calibration
filename
name for the output file. See also writeRaster.
p
presence points that will be used for calibrating the suitability models, typically available as a 2-column (x, y) or (lon, lat) data frame; see also prepareData and extract

Value

The basic function ensemble.dummy.variables mainly results in the creation of raster layers that correspond to dummy variables.

Details

The basic function ensemble.dummy.variables creates dummy variables from a RasterLayer object (see raster) that represents a categorical variable. With freq.min and most.frequent it is possible to limit the number of dummy variables that will be created. For example, most.frequent = 5 creates five dummy variables.
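The idea of 0/1 dummy coding restricted to frequent levels can be sketched in base R on a plain vector standing in for raster cell values. This is only an illustration under stated assumptions, not the package's implementation: the values in cell.values, the "biome_" name prefix and the thresholds are hypothetical, and how freq.min and most.frequent interact inside ensemble.dummy.variables is not shown here.

```r
# Hypothetical cell values of a categorical layer (illustration only)
cell.values <- c(1, 1, 1, 2, 2, 7, 7, 7, 7, 8, 13, 13, NA)

# Frequency of each level, most frequent first (table() ignores NA)
freqs <- sort(table(cell.values), decreasing = TRUE)

# Levels that reach a minimum frequency (analogous in spirit to freq.min = 3)
levels.by.freq <- names(freqs)[freqs >= 3]

# The two most frequent levels (analogous in spirit to most.frequent = 2)
kept.levels <- names(freqs)[seq_len(2)]

# One 0/1 column per kept level; cells that are NA stay NA
dummies <- sapply(kept.levels,
    function(lev) as.integer(cell.values == as.numeric(lev)))
colnames(dummies) <- paste0("biome_", kept.levels)
dummies
```

Each column plays the role of one dummy raster layer: cells holding that level are coded 1, all other cells with data are coded 0.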

Function ensemble.accepted.categories modifies the RasterLayer object (see raster) by replacing cell values for categories (levels) that are not accepted with missing values.
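The masking step described above can be sketched in base R on a plain vector. This is a minimal illustration, not the package code: cell.values and accepted are hypothetical stand-ins for the raster cell values and the categories argument.

```r
# Hypothetical cell values of a categorical layer (illustration only)
cell.values <- c(1, 2, 7, 8, 13, 7, NA)

# Hypothetical levels that were encountered during calibration
accepted <- c(1, 2, 7)

# Keep accepted levels, replace every other value with NA
masked <- ifelse(cell.values %in% accepted, cell.values, NA)
masked
```

Levels 8 and 13 are not in accepted, so the corresponding cells become missing values; cells that were already NA remain NA.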

Function ensemble.simplified.categories modifies the RasterLayer object (see raster) by merging all categories (levels) in which none of the presence points occur into a single level. This new level is coded by the maximum coding level among these 'outside categories'.
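A base R sketch of this recoding on a plain vector may help. It is only an illustration under assumptions: cell.values stands in for the raster cell values and presence.levels for the levels that would be extracted at the presence points p; neither name comes from the package.

```r
# Hypothetical cell values of a categorical layer (illustration only)
cell.values <- c(1, 2, 7, 8, 13, 7, 2)

# Hypothetical levels found at the presence points
presence.levels <- c(1, 7, 7)

# 'Outside categories': levels in which no presence point occurs
outside <- setdiff(sort(unique(cell.values)), presence.levels)

# The outside categories are merged into one level, coded by their maximum
merged.code <- max(outside)
simplified <- ifelse(cell.values %in% outside, merged.code, cell.values)
simplified
```

Here levels 2, 8 and 13 contain no presence points, so all three are recoded as 13, while levels 1 and 7 are left unchanged.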

See Also

ensemble.test, ensemble.raster

Examples

## Not run: 
# 
# # get predictor variables
# library(dismo)
# predictor.files <- list.files(path=paste(system.file(package="dismo"), '/ex', sep=''),
#     pattern='grd', full.names=TRUE)
# predictors <- stack(predictor.files)
# biome.layer <- predictors[["biome"]]
# biome.layer
# 
# # create dummy layers for the 5 most frequent factor levels
# 
# ensemble.dummy.variables(xcat=biome.layer, most.frequent=5,
#     overwrite=TRUE)
# 
# # check whether dummy variables were created
# predictor.files <- list.files(path=paste(system.file(package="dismo"), '/ex', sep=''),
#     pattern='grd', full.names=TRUE)
# predictors <- stack(predictor.files)
# predictors
# names(predictors)
# 
# # once dummy variables were created, avoid using the original categorical data layer
# predictors <- subset(predictors, subset=c("bio5", "bio6", "bio16", "bio17", 
#     "biome_1", "biome_2", "biome_7", "biome_8", "biome_13"))
# predictors
# predictors@title <- "base"
# 
# # presence points
# presence_file <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='')
# pres <- read.table(presence_file, header=TRUE, sep=',')[,-1]
# 
# # the kfold function randomly assigns data to groups; 
# # groups are used as testing (1/5) and calibration (4/5) data
# groupp <- kfold(pres, 5)
# pres_train <- pres[groupp !=  1, ]
# pres_test <- pres[groupp ==  1, ]
# 
# # choose background points
# ext <- extent(-90, -32, -33, 23)
# background <- randomPoints(predictors, n=1000, ext=ext, extf=1.00)
# colnames(background) <- c('lon', 'lat')
# groupa <- kfold(background, 5)
# backg_train <- background[groupa != 1, ]
# backg_test <- background[groupa == 1, ]
# 
# # fit ensemble models with the algorithms that are switched on below
# # note that dummy variables are not used for BIOCLIM and DOMAIN
# # (neither are categorical variables)
# ensemble.nofactors <- ensemble.test(x=predictors, p=pres_train, a=backg_train, 
#     pt=pres_test, at=backg_test,
#     species.name="Bradypus",
#     VIF=TRUE,
#     MAXENT=1, GBM=1, GBMSTEP=1, RF=1, GLM=1, GLMSTEP=1, GAM=1, 
#     GAMSTEP=1, MGCV=1, MGCVFIX=1,EARTH=1, RPART=1, NNET=1, FDA=1, 
#     SVM=1, SVME=1, BIOCLIM=1, DOMAIN=1, MAHAL=0,
#     Yweights="BIOMOD", 
#     dummy.vars=c("biome_1", "biome_2", "biome_7", "biome_8", "biome_13"),
#     PLOTS=FALSE, evaluations.keep=TRUE)
# 
# ## End(Not run)
