Creates an sdmSetting object that holds the settings used to fit and evaluate the models. It can be used to reproduce a study.
sdmSetting(formula,data,methods,interaction.depth=1,n=1,replication=NULL,cv.folds=NULL,
test.percent=NULL,bg=NULL,bg.n=NULL,var.importance=NULL,response.curve=TRUE,
var.selection=FALSE,ncore=1L,modelSettings=NULL,seed=NULL,parallelSettings=NULL,...)
an object of class sdmSettings
formula: specifies the structure of the model
data: an sdm data object or a data.frame including species and feature data
methods: character, names of the algorithms
interaction.depth: level of interactions between predictors
n: number of replicates (runs)
replication: replication method (e.g., 'subsampling', 'bootstrapping', 'cv')
cv.folds: number of folds if cv (cross-validation) is among the selected replication methods
test.percent: test percentage if subsampling is among the selected replication methods
bg: method to generate background
bg.n: number of background records
var.importance: logical, whether variable importance should be calculated
response.curve: method to calculate variable importance
var.selection: logical, whether variable selection should be considered
ncore: number of cores to parallelize processing
modelSettings: optional list; settings for the modelling methods can be specified by users
seed: default is NULL; either a logical specifying whether a seed for the random number generator should be set, or a number specifying the exact seed
parallelSettings: default is NULL; a list of settings items for parallel processing. The items include ncore, method, type, hosts, doParallel, and fork; see details for more information.
...: additional arguments
Babak Naimi naimi.b@gmail.com
Using sdmSetting, the feature types, interaction.depth, and all other settings of the model can be defined. This function generates an sdmSetting object, which is particularly helpful for reproducibility. The object can be shared with other users and reused in other studies.
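As a minimal sketch of sharing a settings object, one option (an assumption here, not something this help page prescribes) is to serialise it with base R's saveRDS() and restore it with readRDS(). The file name and the plain-list stand-in are hypothetical; the stand-in only keeps the sketch runnable when sdm is not installed.

```r
# Hypothetical file name; any writable path would do.
path <- file.path(tempdir(), "sdm_settings.rds")

# Plain-list stand-in (not a real sdmSetting object) used when sdm is absent.
settings <- list(methods = "glm", replication = "sub", test.percent = 30, n = 5)

if (requireNamespace("sdm", quietly = TRUE)) {
  library(sdm)
  df <- read.csv(system.file("external/pa_df.csv", package = "sdm"))
  d <- sdmData(sp ~ b15 + NDVI, train = df)
  # Build the real settings object from the package's example data.
  settings <- sdmSetting(sp ~ ., data = d, methods = "glm",
                         replication = "sub", test.percent = 30, n = 5)
}

saveRDS(settings, path)    # write the object to disk so it can be shared
restored <- readRDS(path)  # a collaborator restores it from the file
class(restored)
```

The restored object has the same class as the one that was saved, so it can be inspected or passed on in the same way as the original.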
If a user aims to reproduce the same results every time the code is run with the same data and settings, a seed number should be specified. Through the seed argument, a user can specify: NULL, meaning that no seed is set (if random sampling is part of the modelling procedure, the results will differ between runs); TRUE, meaning that a seed is set (the seed number is selected randomly and then reused every time the same setting is used); or a number, meaning that the seed is set to the number specified by the user.
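The effect of a fixed seed can be illustrated in base R. The helper draw_test_rows() below is hypothetical and is not part of sdm; it only mimics a random test partition to show why a seed makes the split repeatable. The guarded call at the end is a sketch of passing a numeric seed to sdmSetting, assuming the package and its example data are available.

```r
# Hypothetical helper (not part of sdm): pick the rows held out for testing.
draw_test_rows <- function(n_rows, test_percent, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # fix the random number generator state
  sort(sample(n_rows, size = round(n_rows * test_percent / 100)))
}

a <- draw_test_rows(100, 30, seed = 1)
b <- draw_test_rows(100, 30, seed = 1)
identical(a, b)  # TRUE: the same seed gives the same partition

# With seed = NULL the partition generally changes from run to run.
u <- draw_test_rows(100, 30)

if (requireNamespace("sdm", quietly = TRUE)) {
  library(sdm)
  df <- read.csv(system.file("external/pa_df.csv", package = "sdm"))
  d <- sdmData(sp ~ b15 + NDVI, train = df)
  # A numeric seed is stored in the settings and used when the models are fitted.
  s <- sdmSetting(sp ~ ., data = d, methods = "glm",
                  replication = "sub", test.percent = 30, n = 5, seed = 1)
}
```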
For parallel processing, a list of items can be passed to parallelSettings, including:
ncore: defines the number of cores (it can also be specified outside of this list, but that option will be removed in the future)
method: defines the platform/set of functions used to run the parallelisation. Currently, two options, 'parallel' and 'foreach', are implemented. The default is 'parallel'.
doParallel: optional; a definition used to register a backend for parallel processing (currently used when method='foreach'). It should be provided as an R expression.
cluster: optional; if a cluster has already been created and started, it can be supplied through this item to be used as the parallel processing platform (currently used when method='parallel').
hosts: a list of addresses of the accessible hosts (remote clusters) to be registered and used in parallel processing (may not work appropriately, as it is still under development!)
fork: logical; available on non-Windows operating systems, and specifies whether a fork solution should be used for the parallelisation. The default is TRUE.
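A parallelSettings list built from the items above might look like the following sketch. The values are illustrative assumptions (two cores, the 'parallel' method, and forking only where the operating system supports it); the guarded call assumes sdm and its example data are installed.

```r
# Illustrative values; adjust ncore to the machine at hand.
ps <- list(
  ncore  = 2L,                              # number of cores
  method = "parallel",                      # or "foreach"
  fork   = .Platform$OS.type != "windows"   # forking is not available on Windows
)

if (requireNamespace("sdm", quietly = TRUE)) {
  library(sdm)
  df <- read.csv(system.file("external/pa_df.csv", package = "sdm"))
  d <- sdmData(sp ~ b15 + NDVI, train = df)
  # Pass the list through the parallelSettings argument.
  s <- sdmSetting(sp ~ ., data = d, methods = "glm",
                  replication = "sub", test.percent = 30, n = 2,
                  parallelSettings = ps)
}
```

Keeping ncore inside the list rather than as a separate argument follows the note above that the stand-alone form is to be removed.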
Naimi, B., Araujo, M.B. (2016) sdm: a reproducible and extensible R platform for species distribution modelling, Ecography, DOI: 10.1111/ecog.01881
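if (FALSE) {
  library(sdm)
  # load the example presence-absence data shipped with sdm
  file <- system.file("external/pa_df.csv", package="sdm")
  df <- read.csv(file)
  head(df)
  # build the sdm data object
  d <- sdmData(sp~b15+NDVI, train=df)
  # generate sdmSettings object (data=d supplies the variables that sp~. refers to):
  s <- sdmSetting(sp~., data=d, methods=c('glm','gam','brt','svm','rf'),
                  replication='sub', test.percent=30, n=10,
                  modelSettings=list(brt=list(n.trees=500)))
  s
}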