SDMtune (version 1.1.0)

optimizeModel: Optimize Model

Description

Uses a genetic algorithm to optimize the model's hyperparameter configuration according to the chosen metric.

Usage

optimizeModel(model, hypers, metric, test = NULL, pop = 20, gen = 5,
  env = NULL, parallel = FALSE, keep_best = 0.4, keep_random = 0.2,
  mutation_chance = 0.4, seed = NULL)

Arguments

model

SDMmodel or SDMmodelCV object.

hypers

named list containing the values of the hyperparameters that should be tuned, see details.

metric

character. The metric used to evaluate the models, possible values are: "auc", "tss" and "aicc".

test

SWD object. Test dataset used to evaluate the model. Not used with metric = "aicc" or with SDMmodelCV objects, default is NULL.

pop

numeric. Size of the population, default is 20.

gen

numeric. Number of generations, default is 5.

env

stack containing the environmental variables, used only with "aicc", default is NULL.

parallel

logical, if TRUE it uses parallel computation, default is FALSE. Used only with metric = "aicc", see details.

keep_best

numeric. Proportion of the best models in the population to be retained during each iteration, expressed as a decimal number. Default is 0.4.

keep_random

numeric. Probability of retaining the excluded models during each iteration, expressed as a decimal number. Default is 0.2.

mutation_chance

numeric. Probability of mutation of the child models, expressed as a decimal number. Default is 0.4.

seed

numeric. The value used to set the seed to have consistent results, default is NULL.

Value

SDMtune object.

Details

To see which hyperparameters can be tuned, use the output of the function get_tunable_args. Hyperparameters not included in the hypers argument keep the value they have in the passed model.
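To get a feel for the size of the search space that a hypers list defines, the following base R sketch counts and enumerates the possible configurations. It reuses the values from the Examples section; the names n_configs and grid are illustrative only, and the code does not call SDMtune.

```r
# Hyperparameter values taken from the Examples section
h <- list(reg = 1:3, fc = c("lqp", "lqph", "lh"), iter = seq(300, 700, 100))

# Number of candidate values for each hyperparameter
lengths(h)

# Total number of distinct configurations (product of the lengths)
n_configs <- prod(lengths(h))

# Enumerate every configuration explicitly, one row per combination
grid <- expand.grid(h, stringsAsFactors = FALSE)
head(grid)
```

With these values there are 3 * 3 * 5 = 45 configurations, so the search space is larger than a single population at the default pop = 20.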

  • Parallel computation is used only during the execution of the predict function, and increases the speed only for large datasets. For small datasets it may result in a longer execution, due to the time needed to create the cluster.

  • Part of the code is inspired by this post.
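The roles of keep_best, keep_random and mutation_chance can be illustrated with a toy selection step in base R. This is a minimal sketch based only on the argument descriptions above, not SDMtune's internal code: the scores are random placeholders rather than real model metrics, and treating keep_random as a per-model survival probability is an assumption.

```r
# Toy sketch of one selection step; NOT SDMtune's actual implementation
set.seed(25)

pop <- 20               # population size (default in Usage)
keep_best <- 0.4        # share of top models retained
keep_random <- 0.2      # survival probability for excluded models (assumed)
mutation_chance <- 0.4  # probability that a child is mutated

# Hypothetical metric values (e.g. AUC) for the candidate models
scores <- runif(pop, min = 0.6, max = 0.9)

# Rank the candidates from best to worst
ranked <- order(scores, decreasing = TRUE)

# Retain the top keep_best share of the population
n_best <- round(pop * keep_best)
best <- ranked[seq_len(n_best)]

# Each excluded candidate survives with probability keep_random
excluded <- ranked[-seq_len(n_best)]
lucky <- excluded[runif(length(excluded)) < keep_random]

# Survivors act as parents for the next generation
parents <- c(best, lucky)

# Children needed to refill the population to its original size
n_children <- pop - length(parents)

# Flag which children would be mutated
mutated <- runif(n_children) < mutation_chance
```

In this sketch the 8 best of 20 candidates always survive, a random subset of the remaining 12 joins them, and the rest of the population is rebuilt from children of these parents.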

See Also

gridSearch and randomSearch.

Examples

# NOT RUN {
# Acquire environmental variables
files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd", full.names = TRUE)
predictors <- raster::stack(files)

# Prepare presence and background locations
p_coords <- virtualSp$presence
bg_coords <- virtualSp$background

# Create SWD object
data <- prepareSWD(species = "Virtual species", p = p_coords, a = bg_coords,
                   env = predictors, categorical = "biome")

# Split presence locations in training (80%) and testing (20%) datasets
datasets <- trainValTest(data, test = 0.2, only_presence = TRUE)
train <- datasets[[1]]
test <- datasets[[2]]

# Train a model
model <- train(method = "Maxent", data = train, fc = "l")

# Define the hyperparameters to test
h <- list(reg = 1:3, fc = c("lqp", "lqph", "lh"), iter = seq(300, 700, 100))

# Run the function using as metric the AUC
output <- optimizeModel(model, hypers = h, metric = "auc", test = test,
                        seed = 25)
output@results
output@models
output@models[[1]]  # Best model
# }
