sdmSetting: creating sdmSetting object

Description

Creates an sdmSetting object that holds the settings used to fit and evaluate the models. The object can be used to reproduce a study.

Usage

sdmSetting(formula,data,methods,interaction.depth=1,n=1,replication=NULL,cv.folds=NULL,
     test.percent=NULL,bg=NULL,bg.n=NULL,var.importance=NULL,response.curve=TRUE,
     var.selection=FALSE,ncore=1L,modelSettings=NULL,seed=NULL,parallelSettings=NULL,...)

Value

an object of class sdmSettings

Arguments

formula: specifies the structure of the model
data: sdm data object or data.frame including species and feature data
methods: character, name of the algorithms
interaction.depth: level of interactions between predictors
n: number of replicates (runs)
replication: replication method (e.g., 'subsampling', 'bootstrapping', 'cv')
cv.folds: number of folds if cv (cross-validation) is in the selected replication methods
test.percent: test percentage if subsampling is in the selected replication methods
bg: method to generate background
bg.n: number of background records
var.importance: logical, whether variable importance should be calculated
response.curve: logical, whether response curves should be calculated
var.selection: logical, whether variable selection should be considered
ncore: number of cores to use for parallel processing
modelSettings: optional list; settings for individual modelling methods can be specified by the user
seed: default is NULL; either a logical specifying whether a seed for the random number generator should be set, or a number giving the exact seed
parallelSettings: default is NULL; a list of settings for parallel processing. The items include ncore, method, type, hosts, doParallel, and fork; see Details for more information.
...: additional arguments

Author

Babak Naimi naimi.b@gmail.com

https://www.r-gis.net/

https://www.biogeoinformatics.org

Details

Using sdmSetting, the feature types, interaction.depth, and all other settings of the model can be defined. The function generates an sdmSetting object, which is particularly helpful for reproducibility: a user can share the object so that it can be reused in other studies.
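One possible way to share a settings object is to serialise it to a file. The sketch below uses base R's saveRDS and readRDS; this is an assumed workflow, not something the package prescribes. The file name sdm_settings.rds and the object name s_shared are hypothetical, and the example data are the ones used in the Examples section.

```r
library(sdm)

# Example presence-absence data shipped with the package
df <- read.csv(system.file("external/pa_df.csv", package = "sdm"))
d <- sdmData(sp ~ b15 + NDVI, train = df)

# Define the settings once, with a fixed seed for reproducibility
s <- sdmSetting(sp ~ ., data = d, methods = "glm",
                replication = "sub", test.percent = 30, n = 5, seed = 1)

# Write the settings object to a file that can be passed on to others
# (hypothetical file name, stored in the session's temporary directory)
path <- file.path(tempdir(), "sdm_settings.rds")
saveRDS(s, path)

# A collaborator can then load the same settings object
s_shared <- readRDS(path)
```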

To reproduce the same results every time the code is run with the same data and settings, a seed should be specified. The seed argument accepts three kinds of value: NULL means that no seed is set (if the modelling procedure involves random sampling, the results will differ between runs); TRUE means that a seed is set (the seed number is selected randomly and reused every time the same setting is used); a number means that the seed is set to the number specified by the user.
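A minimal sketch of the three seed options, assuming the sdm package and its bundled pa_df.csv example file are installed. The object names s_none, s_auto, and s_fixed and the seed value 42 are illustrative choices, not part of the package.

```r
library(sdm)

# Example presence-absence data shipped with the package
df <- read.csv(system.file("external/pa_df.csv", package = "sdm"))
d <- sdmData(sp ~ b15 + NDVI, train = df)

# seed = NULL (the default): no seed is set, so subsampling may
# produce different partitions on different runs
s_none <- sdmSetting(sp ~ ., data = d, methods = "glm",
                     replication = "sub", test.percent = 30, n = 5)

# seed = TRUE: a seed number is chosen randomly and stored in the setting
s_auto <- sdmSetting(sp ~ ., data = d, methods = "glm",
                     replication = "sub", test.percent = 30, n = 5,
                     seed = TRUE)

# seed = <number>: the seed is set to the value given by the user
s_fixed <- sdmSetting(sp ~ ., data = d, methods = "glm",
                      replication = "sub", test.percent = 30, n = 5,
                      seed = 42)
```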

For parallel processing, a list of items can be passed to parallelSettings, including:

ncore: the number of cores to use (it can also be specified outside of this list, but that option will be removed in a future version)

method: the platform/set of functions used to run the parallelisation. Two options are currently implemented: 'parallel' and 'foreach'. The default is 'parallel'.

doParallel: optional; a definition that registers a backend for parallel processing (currently used when method='foreach'). It should be provided as an R expression.

cluster: optional; if a cluster has already been created and started, it can be supplied through this item to be used as the parallel processing platform (currently used when method='parallel')

hosts: a list of addresses of the accessible hosts (remote clusters) to be registered and used in parallel processing (may not work properly, as it is still under development)

fork: logical; available on non-Windows operating systems, and specifies whether forking should be used for the parallelisation. The default is TRUE.
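The items above can be collected in a list and passed through parallelSettings. The following is a minimal sketch, assuming the sdm package and its bundled example data are installed; the object name ps and the values (two cores, fork = FALSE so that the same call can be used on Windows) are illustrative only.

```r
library(sdm)

# Example presence-absence data shipped with the package
df <- read.csv(system.file("external/pa_df.csv", package = "sdm"))
d <- sdmData(sp ~ b15 + NDVI, train = df)

# Parallel settings gathered in one list:
#   ncore  - number of cores (2 is an arbitrary illustrative value)
#   method - 'parallel' is the documented default
#   fork   - FALSE avoids forking, which is not available on Windows
ps <- list(ncore = 2, method = "parallel", fork = FALSE)

s_par <- sdmSetting(sp ~ ., data = d, methods = "glm",
                    replication = "sub", test.percent = 30, n = 4,
                    parallelSettings = ps)
```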

References

Naimi, B., Araujo, M.B. (2016) sdm: a reproducible and extensible R platform for species distribution modelling, Ecography, DOI: 10.1111/ecog.01881

Examples

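if (FALSE) {
library(sdm)

# read the example presence-absence data shipped with the package
file <- system.file("external/pa_df.csv", package="sdm")
df <- read.csv(file)
head(df)

# create the sdm data object
d <- sdmData(sp~b15+NDVI,train=df)

# generate the sdmSetting object (data=d supplies the data object created above):
s <- sdmSetting(sp~., data=d, methods=c('glm','gam','brt','svm','rf'),
          replication='sub',test.percent=30,n=10,modelSettings=list(brt=list(n.trees=500)))

s
}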