lapply_kfold_species: Apply a function over the folds of a set of species

Description

lapply_kfold_species applies fun to the specified folds of every species, or of the provided subset of species, and returns the results as a list of lists.

Usage

lapply_kfold_species(fun, ..., species = NULL, fold_type = "disc", k = 1:5)

Arguments

fun

function. The function to apply to the occurrence records of each species. It is called with the species name, a list with the occurrence and background training and test records, and the fold number.

...

optional arguments passed on to fun.

species

data.frame or character vector. Either a data.frame as returned by list_species or a character vector of species names. If NULL (default), fun is applied to all species.

fold_type

character. Type of partitioning to use: "disc" (default), "grid_4", "grid_9", "random" or "targetgroup". See Details.

k

integer vector. Numbers of the folds to get data for. To use all 5 folds, pass 1:5, which is the default.

Value

A list with one entry per species (all species, or only those provided), named by species. Each entry is itself a list, named by the fold numbers in k, whose values are the results of fun.
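The nested return structure and the calling convention of fun can be illustrated with a small self-contained sketch. It is not the package's implementation: mock_fold_data() is a hypothetical stand-in that fills the four data elements named in Details with random coordinates, and lapply_kfold_sketch() only mimics how fun is called (species name, data list, fold number, plus any extra arguments given through ...). The species names are placeholders.

```r
# Hypothetical stand-in for the real fold data: a list with the four
# elements described in Details, filled with random coordinates.
mock_fold_data <- function(speciesname, fold, n = 20) {
  make_points <- function(m) {
    data.frame(longitude = runif(m, -180, 180), latitude = runif(m, -90, 90))
  }
  list(occurrence_training = make_points(n),
       occurrence_test = make_points(n %/% 4),
       background_training = make_points(n * 5),
       background_test = make_points(n))
}

# Sketch of the calling convention: for every species and every fold in k,
# call fun(speciesname, data, fold, ...) and collect the results in a list
# named by species whose entries are lists named by fold number.
lapply_kfold_sketch <- function(fun, ..., species, k = 1:5) {
  result <- lapply(species, function(speciesname) {
    per_fold <- lapply(k, function(fold) {
      data <- mock_fold_data(speciesname, fold)
      fun(speciesname, data, fold, ...)
    })
    names(per_fold) <- as.character(k)
    per_fold
  })
  names(result) <- species
  result
}

# Example fun: count training and test occurrences of a fold; `label` is an
# extra argument forwarded through `...`.
count_records <- function(speciesname, data, fold, label = "n") {
  setNames(c(nrow(data$occurrence_training), nrow(data$occurrence_test)),
           paste0(label, c("_train", "_test")))
}

set.seed(42)
res <- lapply_kfold_sketch(count_records, label = "records",
                           species = c("species_a", "species_b"), k = 1:2)
str(res)
```

Arguments that follow ... in the signature (species, k) must be named in the call; everything else, such as label here, is passed on to fun.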

Details

The parameters passed to fun are speciesname, data and fold: data is a list with 4 elements (occurrence_training, occurrence_test, background_training and background_test) and fold contains the fold number. The available fold_type values are:

"disc": 5-fold disc partitioning of the occurrences, with random background points selected by pairwise distance sampling and filtered with a buffer. This is equivalent to calling kfold_occurrence_background with

occurrence_fold_type = "disc", k = 5, pwd_sample = TRUE, background_buffer = 200*1000

"grid_4" and "grid_9": 4-fold and 9-fold grid partitioning of the occurrences, with background points sampled and filtered as for "disc". This is equivalent to calling kfold_occurrence_background with

occurrence_fold_type = "grid", k = 4, pwd_sample = TRUE, background_buffer = 200*1000

for "grid_4", and with the same arguments but k = 9 for "grid_9".

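The description above does not say how the grid cells are formed. As one plausible reading, offered purely as an assumption, the sketch below cuts longitude and latitude at their quantiles into sqrt(k) bands each and treats every resulting cell as a fold, which gives 4 folds for k = 4 and 9 folds for k = 9. grid_folds() is a hypothetical helper, not part of the package, and kfold_occurrence_background may partition differently.

```r
# Hypothetical grid partitioning (an assumption, not the package's code):
# k must be a perfect square. Longitude and latitude are each cut at their
# quantiles into sqrt(k) bands, and every cell of the grid becomes one fold.
grid_folds <- function(longitude, latitude, k = 4) {
  n_bands <- sqrt(k)
  stopifnot(n_bands == round(n_bands))
  band <- function(x) {
    breaks <- quantile(x, probs = seq(0, 1, length.out = n_bands + 1))
    # include.lowest keeps the minimum value inside the first band
    as.integer(cut(x, breaks = unique(breaks), include.lowest = TRUE))
  }
  (band(longitude) - 1L) * n_bands + band(latitude)
}

set.seed(1)
lon <- runif(900, -10, 10)
lat <- runif(900, 40, 60)
table(grid_folds(lon, lat, k = 4))
table(grid_folds(lon, lat, k = 9))
```

Using quantiles rather than equal-width cells keeps the number of occurrences per band roughly balanced; an equal-width grid would be an equally valid assumption.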
"random": 5-fold random partitioning of occurrences and random background points, equivalent to calling kfold_occurrence_background with

occurrence_fold_type = "random", k = 5, pwd_sample = FALSE, background_buffer = 0

"targetgroup": the same partitioning as the "random" folds, but instead of random background points, a random subset of all occurrence points in the dataset is used as background. This target-group background has the same sampling bias as the entire dataset.

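The "random" and "targetgroup" fold types can be sketched in a few lines: occurrences are assigned to 5 folds at random, and for "targetgroup" the background is a random draw from the pooled occurrences of all species. The data frame all_occurrences and the helpers random_folds() and targetgroup_background() are hypothetical illustrations, not the package's implementation; splitting the background into folds the same way as the occurrences is also an assumption.

```r
# Hypothetical helper: assign each of n records to one of k folds at random.
# Fold sizes stay as equal as possible (they differ by at most one).
random_folds <- function(n, k = 5) {
  sample(rep_len(seq_len(k), n))
}

# Hypothetical helper: draw n background points at random, without
# replacement, from the pooled occurrences of all species, so the
# background shares the sampling bias of the whole dataset.
targetgroup_background <- function(occurrences, n) {
  n <- min(n, nrow(occurrences))
  occurrences[sample(nrow(occurrences), n), c("longitude", "latitude")]
}

# Hypothetical pooled occurrence records (stand-in for the real dataset).
set.seed(7)
all_occurrences <- data.frame(
  species = sample(c("species_a", "species_b", "species_c"), 500, replace = TRUE),
  longitude = runif(500, -20, 20),
  latitude = runif(500, 30, 70)
)

# Random 5-fold partitioning of the occurrences of one species; fold 1 is
# used as the test set here.
occ <- all_occurrences[all_occurrences$species == "species_a", ]
folds <- random_folds(nrow(occ), k = 5)
occurrence_test <- occ[folds == 1, ]
occurrence_training <- occ[folds != 1, ]

# Target-group background, split into folds the same way (assumption).
background <- targetgroup_background(all_occurrences, 200)
bg_folds <- random_folds(nrow(background), k = 5)
background_test <- background[bg_folds == 1, ]
background_training <- background[bg_folds != 1, ]
```

Because the background points are drawn from the pooled records, regions that were sampled intensively for any species are over-represented in the background as well, which is the purpose of a target-group background.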
Examples


## Not run: ------------------------------------
# plot_occurrences <- function(speciesname, data, fold) {
#    title <- paste0(speciesname, " (fold = ", fold, ")")
#    plot(data$occurrence_train[,c("longitude", "latitude")], pch=".",
#         col="blue", main = title)
#    points(data$occurrence_test[,c("longitude", "latitude")], pch=".",
#         col="red")
# }
# 
# # plot training (blue) and test (red) occurrences
# # of the first 2 folds for the first 5 species
# species <- list_species()
# lapply_kfold_species(plot_occurrences, species=species[1:5,],
#                      fold_type = "disc", k = 1:2)
## ---------------------------------------------
