BIOMOD_FormatingData: Initialise the datasets for usage in biomod2

Description

This function rearranges the user's input data to make sure they can be used within biomod2. It can select pseudo-absences or background data when true absence data are not available, or add pseudo-absence data to an existing set of absences (see Details).

Usage

BIOMOD_FormatingData(resp.var,
                     expl.var,
                     resp.xy = NULL,
                     resp.name = NULL,
                     eval.resp.var = NULL,
                     eval.expl.var = NULL,
                     eval.resp.xy = NULL,
                     PA.nb.rep = 0,
                     PA.nb.absences = 1000,
                     PA.strategy = 'random',
                     PA.dist.min = 0,
                     PA.dist.max = NULL,
                     PA.sre.quant = 0.025,
                     PA.table = NULL,
                     na.rm = TRUE)

Arguments

resp.var

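a vector or SpatialPointsDataFrame (or SpatialPoints if you work with presence-only data) containing species data (a single species) in binary format (ones for presences, zeros for true absences and NA for indeterminate) that will be used to build the models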

expl.var

a matrix, data.frame, SpatialPointsDataFrame or RasterStack containing your explanatory variables

resp.xy

optional 2-column matrix containing the X and Y coordinates of resp.var (only considered if resp.var is a vector) that will be used to build your models.

eval.resp.var

a vector or SpatialPointsDataFrame containing your species data (a single species) in binary format (ones for presences, zeros for true absences and NA for indeterminate) that will be used to evaluate the models

eval.expl.var

a matrix, data.frame, SpatialPointsDataFrame or RasterStack containing your explanatory variables

eval.resp.xy

optional 2-column matrix containing the X and Y coordinates of eval.resp.var (only considered if eval.resp.var is a vector) that will be used to evaluate the models with independent data (or past data, for instance).

resp.name

response variable name (character). The species name.

PA.nb.rep

number of pseudo-absence selections required (if needed). 0 by default.

PA.nb.absences

number of pseudo-absences selected for each repetition of the selection (when PA.nb.rep > 0), true absences included

PA.strategy

strategy for selecting the pseudo-absences (must be 'random', 'sre', 'disk' or 'user.defined')

PA.dist.min

minimal distance to presences for 'disk' pseudo-absence selection (in meters if the explanatory variables are an unprojected raster (+proj=longlat), and in map units (typically also meters) when it is projected or when explanatory variables are stored

PA.dist.max

maximal distance to presences for 'disk' pseudo-absence selection (in meters if the explanatory variables are an unprojected raster (+proj=longlat), and in map units (typically also meters) when it is projected or when explanatory variables are stored

PA.sre.quant

quantile used for sre Pseudo Absences selection

PA.table

a matrix (or a data.frame) having as many rows as resp.var has values. Each column corresponds to a pseudo-absence selection. It contains TRUE or FALSE indicating which values of resp.v

na.rm

logical; if TRUE, all points having one or several missing values for environmental data will be removed from the analysis

Value

A 'data.formated.Biomod.object' for BIOMOD_Modeling. It is strongly advised to check whether the formatted data correspond to what was expected. A summary is printed by simply typing the name of the object. A generic plot function is also available to display the different datasets in geographic space.

Pseudo Absences selection

The PA.strategy argument can take one of the following values:

random: all cells of the initial background are pseudo-absence candidates. The choice is made randomly, given the number of pseudo-absences to select (PA.nb.absences).
disk: you may define a minimal (PA.dist.min) and a maximal (PA.dist.max) distance to presence points for selecting your pseudo-absence candidates. This may be useful if you do not want to select pseudo-absences too close to your presences (same niche, and to avoid pseudo-replication) or too far from your presences (localised sampling strategy).
sre: pseudo-absence candidates have to be selected in conditions that differ from a defined proportion (PA.sre.quant) of the presence data. This forces pseudo-absences to be selected outside of the broadly defined environmental conditions for the species. A surface range envelope model (sre, similar to BIOCLIM) is first fitted (using the specified quantile) on the species of interest, and the pseudo-absence data are then extracted outside of this envelope. This particular case may lead to over-optimistic model evaluations.
user.defined: in this case, the pseudo-absence selection should have been done in a previous step. These pseudo-absences have to be referenced in a well-formatted data.frame (e.g. the PA.table argument).
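For the user.defined strategy, a table in the shape described for PA.table (one row per resp.var value, one column per selection, TRUE/FALSE entries) can be built by hand. The following is a minimal base R sketch, not biomod2's own selection code. It assumes that TRUE marks the rows used in a given selection, that observed rows are kept in every selection, and that candidates are the rows coded NA. The names n_rep, n_pa and the toy resp vector are illustrative only.

```r
set.seed(42)  # reproducible draws

# Toy response vector: 1 = presence, NA = no information (candidate rows)
resp <- c(1, 1, NA, NA, NA, NA, 1, NA, NA, NA)

n_rep <- 3  # number of pseudo-absence selections (one column each)
n_pa  <- 4  # pseudo-absences drawn per selection

candidates <- which(is.na(resp))

# One logical column per selection: observed rows are always TRUE,
# plus n_pa randomly drawn candidate rows (assumed convention, see above)
PA.table <- sapply(seq_len(n_rep), function(i) {
  keep <- !is.na(resp)
  keep[sample(candidates, n_pa)] <- TRUE
  keep
})
colnames(PA.table) <- paste0("PA", seq_len(n_rep))

PA.table
```

Each column then selects the 3 presences and 4 of the 7 candidate rows. Check the result against the PA.table description before passing it on together with PA.strategy = 'user.defined'.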

Details

This function homogenises the initial data to make sure the modelling exercise can be completed with all the required data. It supports different kinds of inputs.

IMPORTANT: When the explanatory data are given as RasterLayer or RasterStack objects, biomod2 will extract the variables at the XY coordinates of the presence (and absence, if any) vector. Be sure to give the XY coordinates (resp.xy) in the same projection system as the raster objects. The same applies to the evaluation data when some sort of independent (or past) data are available (eval.resp.xy).

When the explanatory variables are given as a SpatialPointsDataFrame, the same requirements apply as for raster objects. The XY coordinates must be given so that biomod2 can extract the explanatory variables at the presence (absence) points.

When the explanatory variables are stored in a data.frame, make sure they are in the same order as the response variable. biomod2 will simply merge the datasets without considering the XY coordinates.

When both presence and absence data are available, and there are enough absences, set PA.nb.rep to 0. No pseudo-absences will be extracted.

When no true absences are given, or when they are not numerous enough, it is advised to make several pseudo-absence selections. That way, the influence of the pseudo-absence selection can be estimated later on. If you do not want to run several repetitions, make sure to select a relatively high number of pseudo-absences. Make sure the number of pseudo-absences requested is not higher than the maximum number of potential pseudo-absences (e.g. do not select 10,000 pseudo-absences when the RasterStack or data.frame does not contain more than 2000 pixels or rows).
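The last check can be made before formatting the data. The sketch below uses base R only, and check_pa_request is a hypothetical helper, not a biomod2 function. It assumes the explanatory variables are stored in a data.frame, so that the candidates are the rows whose response is NA; with raster inputs the relevant count would be the number of available cells instead.

```r
# Toy response vector: 1 = presence, 0 = true absence, NA = no information
resp <- c(1, 1, 0, NA, NA, NA, 1, NA)

# Hypothetical helper (not part of biomod2): stop early when more
# pseudo-absences are requested than there are candidate rows.
# Assumption: candidates are the rows where the response is NA.
check_pa_request <- function(resp.var, PA.nb.absences) {
  n_candidates <- sum(is.na(resp.var))
  if (PA.nb.absences > n_candidates) {
    stop(sprintf("Requested %d pseudo-absences but only %d candidates are available",
                 PA.nb.absences, n_candidates))
  }
  invisible(n_candidates)
}

# Passes: 3 requested, 4 candidate rows available
n_ok <- check_pa_request(resp, PA.nb.absences = 3)

# Fails: 10 requested; the error message is captured instead of aborting
too_many <- tryCatch(check_pa_request(resp, PA.nb.absences = 10),
                     error = function(e) conditionMessage(e))
```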

Response variable encoding

BIOMOD_FormatingData concerns a single species at a time, so resp.var must be a uni-dimensional object. The response variable must be a vector or a one-column data.frame/matrix/SpatialPointsDataFrame object (SpatialPoints are also allowed if you work with presence-only data). As most biomod2 models need presence AND absence data, the response variable must contain some absences (if there are none, make sure to select pseudo-absences). In the input resp.var argument, the data should be coded in the following way:

Presences : 1
True Absences : 0 (if any)
No Information : NA (if any, might later be used for pseudo-absences)

If resp.var is a non-spatial object (vector, matrix/data.frame) and some models requiring spatial data are being used (e.g. MAXENT) and/or the pseudo-absence selection is spatially dependent (i.e. 'disk'), make sure to give the XY coordinates of the sites/rows (resp.xy).
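As an illustration of this encoding, the base R sketch below recodes raw survey labels into the 1 / 0 / NA scheme. The surveys table, its column names and its labels are hypothetical, and encode_response is an illustrative helper, not part of biomod2.

```r
# Hypothetical raw survey records (column names and labels are illustrative)
surveys <- data.frame(
  site   = c("s1", "s2", "s3", "s4", "s5"),
  status = c("present", "absent", "not_surveyed", "present", "not_surveyed"),
  stringsAsFactors = FALSE
)

# Recode to the expected scheme:
# 1 = presence, 0 = true absence, NA = no information
encode_response <- function(status) {
  out <- rep(NA_real_, length(status))
  out[status == "present"] <- 1
  out[status == "absent"]  <- 0
  out
}

myResp <- encode_response(surveys$status)

# Quick check of how many records fall in each class
table(myResp, useNA = "ifany")
```

Any label that is neither a presence nor a true absence stays NA, so those rows remain available as pseudo-absence candidates.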

Examples

library(biomod2)
library(raster)  # provides stack(), used below

# species occurrences
DataSpecies <- read.csv(system.file("external/species/mammals_table.csv",
                                    package="biomod2"), row.names = 1)
head(DataSpecies)

# the name of studied species
myRespName <- 'GuloGulo'

# the presence/absences data for our species 
myResp <- as.numeric(DataSpecies[,myRespName])

# the XY coordinates of species data
myRespXY <- DataSpecies[,c("X_WGS84","Y_WGS84")]


# Environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
myExpl <- stack(system.file("external/bioclim/current/bio3.grd",
                            package = "biomod2"),
                system.file("external/bioclim/current/bio4.grd",
                            package = "biomod2"),
                system.file("external/bioclim/current/bio7.grd",
                            package = "biomod2"),
                system.file("external/bioclim/current/bio11.grd",
                            package = "biomod2"),
                system.file("external/bioclim/current/bio12.grd",
                            package = "biomod2"))

# 1. Formatting Data
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
                                     expl.var = myExpl,
                                     resp.xy = myRespXY,
                                     resp.name = myRespName)
                                     
myBiomodData
plot(myBiomodData)
