
mlr3spatiotempcv (version 2.0.3)

mlr_resamplings_spcv_buffer: (blockCV) Spatial buffering resampling

Description

This function generates spatially separated train and test folds by considering buffers of the specified distance around each observation point. This approach is a form of leave-one-out cross-validation. Each fold is generated by excluding nearby observations around each testing point within the specified distance (ideally the range of spatial autocorrelation, see spatialAutoRange). In this method, the testing set never directly abuts a training presence or absence (i.e. the 0s and 1s of the response class). For more information see the details section.
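The leave-one-out buffering idea described above can be illustrated with a minimal base R sketch. This is not the blockCV implementation: the helper name buffer_folds, the toy coordinates, and the choice to treat points exactly at the buffer distance as inside the buffer are all assumptions made for illustration.

```r
# Hypothetical helper (not part of blockCV or mlr3spatiotempcv):
# leave-one-out folds with an exclusion buffer around each test point.
# `coords` is a two-column matrix of projected x/y coordinates and
# `range` is the buffer radius in the same units.
buffer_folds <- function(coords, range) {
  d <- as.matrix(dist(coords))  # pairwise Euclidean distances
  lapply(seq_len(nrow(coords)), function(i) {
    list(
      test  = i,                              # the single held-out point
      train = unname(which(d[i, ] > range))   # only points beyond the buffer
    )
  })
}

# Toy data: five points on a line, 10 units apart
coords <- cbind(x = c(0, 10, 20, 30, 40), y = 0)
folds <- buffer_folds(coords, range = 15)

folds[[3]]$test   # the target point
folds[[3]]$train  # points 2 and 4 fall inside the buffer and are dropped
```

With a buffer of 15 units, the third point is tested against a model trained only on points 1 and 5, so no training point lies next to the test point.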

Arguments

mlr3spatiotempcv notes

The 'Description' and 'Details' fields are inherited from the respective upstream function. For a list of available arguments, please see blockCV::buffering.

Super class

mlr3::Resampling -> ResamplingSpCVBuffer

Active bindings

iters

integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.

Methods


Method new()

Create a spatial buffering resampling instance.

For a list of available arguments, please see blockCV::buffering().

Usage

ResamplingSpCVBuffer$new(id = "spcv_buffer")

Arguments

id

character(1)
Identifier for the resampling strategy.


Method instantiate()

Materializes fixed training and test splits for a given task.

Usage

ResamplingSpCVBuffer$instantiate(task)

Arguments

task

Task
A task to instantiate.


Method clone()

The objects of this class are cloneable with this method.

Usage

ResamplingSpCVBuffer$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Details

When working with presence-background (presence and pseudo-absence) data (specified by the spDataType argument), only presence records are used for specifying the folds. Consider a target presence point. The buffer is defined around this target point using the specified range (theRange). By default, the testing fold comprises the target presence point and all background points within the buffer; if addBG = FALSE, the background points are ignored. Any non-target presence points inside the buffer are excluded. All points (presence and background) outside the buffer are used for the training set. The method cycles through all the presence data, so the number of folds is equal to the number of presence points in the dataset.
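The presence-background rules in the paragraph above can be sketched in base R as follows. The helper pb_buffer_folds, its add_bg argument (standing in for addBG), and the toy data are hypothetical and only mirror the prose; the actual blockCV code may differ, for example in how it treats points lying exactly on the buffer edge.

```r
# Hypothetical helper for presence-background data
# (species: 1 = presence, 0 = background). Not the blockCV implementation.
pb_buffer_folds <- function(coords, species, range, add_bg = TRUE) {
  d <- as.matrix(dist(coords))      # pairwise Euclidean distances
  presence <- which(species == 1)   # only presences define folds
  lapply(presence, function(i) {
    inside <- d[i, ] <= range       # assumption: the edge counts as inside
    test <- i
    if (add_bg) {
      # background points inside the buffer join the test fold
      test <- c(test, which(inside & species == 0))
    }
    # non-target presences inside the buffer end up in neither set
    list(test = unname(test), train = unname(which(!inside)))
  })
}

coords  <- cbind(x = c(0, 5, 8, 30, 40), y = 0)
species <- c(1, 0, 1, 1, 0)

folds <- pb_buffer_folds(coords, species, range = 10)
length(folds)  # one fold per presence point
folds[[1]]     # target 1 plus background 2; presence 3 is excluded
```

Calling the helper with add_bg = FALSE leaves only the target presence point in each test fold.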

For presence-absence data, folds are created based on all records, both presences and absences. As above, a target observation (presence or absence) forms a test point, all presence and absence points other than the target point within the buffer are ignored, and the training set comprises all presences and absences outside the buffer. Apart from the folds, the number of training-presence, training-absence, testing-presence and testing-absence records is stored and returned in the records table. If species = NULL (no column with 0s and 1s is defined), the procedure is the same as for presence-absence data. All other types of data (continuous, count or multi-class responses) should be handled this way.
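To make the per-fold record counts for presence-absence data concrete, here is a small base R sketch. The function pa_records and its column names are hypothetical and are not claimed to match the layout of the records table returned by blockCV.

```r
# Hypothetical sketch of per-fold record counts for presence-absence data
# (species: 1 = presence, 0 = absence). Column names are illustrative only.
pa_records <- function(coords, species, range) {
  d <- as.matrix(dist(coords))  # pairwise Euclidean distances
  rows <- lapply(seq_along(species), function(i) {
    train <- which(d[i, ] > range)  # everything outside the buffer
    data.frame(
      fold           = i,
      train_presence = sum(species[train] == 1),
      train_absence  = sum(species[train] == 0),
      test_presence  = as.integer(species[i] == 1),
      test_absence   = as.integer(species[i] == 0)
    )
  })
  do.call(rbind, rows)
}

coords  <- cbind(x = c(0, 10, 20, 30, 40), y = 0)
species <- c(1, 0, 1, 0, 1)

rec <- pa_records(coords, species, range = 15)
rec  # one row per observation, since every record is a test point once
```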

References

Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G (2018). “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv. doi:10.1101/357798.

See Also

ResamplingSpCVDisc

Examples

# \donttest{
if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)  # registers tsk("ecuador") and rsmp("spcv_buffer")
  task = tsk("ecuador")

  # Instantiate Resampling
  rcv = rsmp("spcv_buffer", theRange = 10000)
  rcv$instantiate(task)

  # Individual sets:
  rcv$train_set(1)
  rcv$test_set(1)
  # Train and test sets of the same fold should not overlap
  intersect(rcv$train_set(1), rcv$test_set(1))

  # Internal storage:
  # rcv$instance
}
# }
