This function creates spatially separated folds based on a pre-specified distance. It assigns blocks to the training and
testing folds randomly, systematically or in a checkerboard pattern. The distance (theRange) should be in metres, regardless of the unit of the reference system of the input data (for more information see the details section). By default,
the function creates blocks according to the extent and shape of the study area, assuming that the user has considered the
landscape for the given species and case study. Alternatively, blocks can solely be created based on species spatial data.
Blocks can also be offset so the origin is not at the outer
corner of the rasters. Instead of providing a distance, the blocks can also be created by specifying a number of rows and/or
columns, dividing the study area into vertical or horizontal bins, as presented in Wenger & Olden (2012) and Bahn & McGill (2012).
Finally, the blocks can be specified by a user-defined spatial polygon layer.
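The three assignment schemes can be illustrated on a small grid of blocks. This is a minimal base-R sketch of the general idea, not blockCV's implementation; the grid dimensions, the number of folds `k` and all variable names are assumptions chosen for illustration.

```r
# Hypothetical 4 x 5 grid of blocks, numbered row by row
n_rows <- 4
n_cols <- 5
k <- 3  # number of folds (assumed)

blocks <- expand.grid(col = seq_len(n_cols), row = seq_len(n_rows))
n_blocks <- nrow(blocks)

# Systematic: fold ids cycle 1, 2, ..., k across consecutive blocks
blocks$systematic <- rep_len(seq_len(k), n_blocks)

# Random: the same balanced fold ids, shuffled across blocks
set.seed(42)
blocks$random <- sample(rep_len(seq_len(k), n_blocks))

# Checkerboard: two folds that alternate like chessboard squares
blocks$checkerboard <- (blocks$row + blocks$col) %% 2 + 1

# Show the checkerboard layout as a matrix (rows x columns of blocks)
matrix(blocks$checkerboard, nrow = n_rows, byrow = TRUE)
```

Note that the checkerboard scheme always yields exactly two folds, whereas the random and systematic schemes work with any number of folds.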
By default blockCV::spatialBlock() does not allow the creation of multiple repetitions. mlr3spatiotempcv adds support for this when using the range argument for fold creation. When supplying a vector of length(repeats) for argument range, these different settings will be used to create folds which differ among the repetitions.
Multiple repetitions are not possible when using the "rows & cols" approach because the created folds will always be the same.
The 'Description' and 'Details' fields are inherited from the respective upstream function.
For a list of available arguments, please see blockCV::spatialBlock.
mlr3::Resampling -> ResamplingRepeatedSpCVBlock
blocks
sf | list of sf objects
Polygons (sf objects) as returned by blockCV which grouped observations into partitions.
iters
integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.
new()
Create a "spatial block" repeated resampling instance.
For a list of available arguments, please see blockCV::spatialBlock.
ResamplingRepeatedSpCVBlock$new(id = "repeated_spcv_block")
id
character(1)
Identifier for the resampling strategy.
folds()
Translates iteration numbers to fold numbers.
ResamplingRepeatedSpCVBlock$folds(iters)
iters
integer()
Iteration number.
repeats()
Translates iteration numbers to repetition numbers.
ResamplingRepeatedSpCVBlock$repeats(iters)
iters
integer()
Iteration number.
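The translation done by folds() and repeats() can be sketched with integer arithmetic. This sketch assumes that iterations are ordered repetition by repetition (all folds of the first repetition come first); the helper names `iter_to_fold()` and `iter_to_repeat()` are hypothetical and not part of the package.

```r
n_folds <- 3L
n_repeats <- 2L

# Hypothetical helpers: map an iteration number to its fold and repetition
iter_to_fold <- function(iters, n_folds) ((iters - 1L) %% n_folds) + 1L
iter_to_repeat <- function(iters, n_folds) ((iters - 1L) %/% n_folds) + 1L

# Total number of iterations is folds * repeats
iters <- seq_len(n_folds * n_repeats)

data.frame(
  iter = iters,
  fold = iter_to_fold(iters, n_folds),
  rep  = iter_to_repeat(iters, n_folds)
)
```

Under this assumption, iterations 1 to 3 belong to repetition 1 and iterations 4 to 6 to repetition 2, with the fold number cycling from 1 to 3 within each repetition.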
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingRepeatedSpCVBlock$instantiate(task)
task
Task
A task to instantiate.
clone()
The objects of this class are cloneable with this method.
ResamplingRepeatedSpCVBlock$clone(deep = FALSE)
deep
Whether to make a deep clone.
For consistency, all the functions use metres as their unit. In this function, when the input map has a geographic coordinate system (decimal degrees), the block size is calculated by dividing theRange by 111325 (the standard distance of a degree in metres, on the Equator) to convert the unit to degrees. This value is optional and can be changed by the user via the degMetre argument.
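The conversion described above is a single division. The following base-R sketch uses an example range of 5000 m; the variable names are assumptions, and the latitude adjustment at the end is only an illustration of why a user might pick a different degMetre value, not a recommendation taken from blockCV.

```r
the_range <- 5000      # desired block size in metres (example value)
deg_metre <- 111325    # metres per degree at the Equator (default in the text)

# Block size expressed in decimal degrees
block_size_deg <- the_range / deg_metre
block_size_deg

# Illustration only: a degree of longitude shrinks roughly with cos(latitude),
# so at 45 degrees latitude one degree of longitude covers fewer metres
deg_metre_45 <- deg_metre * cos(45 * pi / 180)
the_range / deg_metre_45
```

With the default value, a 5000 m range corresponds to a block size of roughly 0.045 degrees.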
The xOffset and yOffset arguments can be used to change the spatial position of the blocks. They can also be used to assess the sensitivity of analysis results to shifts in the blocking arrangement. These options are available when theRange is defined. By default the region is located in the middle of the blocks; setting the offsets shifts the blocks.
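Why shifting the grid matters can be illustrated in one dimension. This is a sketch of the general idea only: the helper `block_id()` and the example coordinates are hypothetical, and the way blockCV applies xOffset and yOffset internally may differ.

```r
# Hypothetical helper: index of the block containing each coordinate,
# for a grid with the given origin and block size (same units as x)
block_id <- function(x, origin, size) floor((x - origin) / size) + 1

x <- c(100, 4900, 5100, 9900)  # example x coordinates in metres
size <- 5000                   # block size in metres

block_id(x, origin = 0, size = size)      # unshifted grid: 1 1 2 2
block_id(x, origin = -2500, size = size)  # shifted by half a block: 1 2 2 3
```

The points at 4900 m and 5100 m are only 200 m apart, yet the unshifted grid puts them in different blocks, while the shifted grid keeps them together. Comparing results across such shifts is one way to check how sensitive an analysis is to the blocking arrangement.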
Roberts et al. (2017) suggest that blocks should be substantially bigger than the range of spatial
autocorrelation (in model residuals) to obtain realistic error estimates, while a buffer with the size of
the spatial autocorrelation range would result in a good estimation of error. This is because of the so-called
edge effect (O'Sullivan & Unwin, 2014), whereby points located on the edges of the blocks of opposite sets are
not separated spatially. Blocking with a buffering strategy overcomes this issue (see buffering).
Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G (2018). “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv. 10.1101/357798.
# NOT RUN {
if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv) # provides the "diplodia" task and this resampling
  task = tsk("diplodia")

  # Instantiate Resampling: one range value per repetition
  rrcv = rsmp("repeated_spcv_block",
    folds = 3, repeats = 2,
    range = c(5000L, 10000L))
  rrcv$instantiate(task)

  # Iteration bookkeeping:
  rrcv$iters
  rrcv$folds(1:6)
  rrcv$repeats(1:6)

  # Individual sets:
  rrcv$train_set(1)
  rrcv$test_set(1)
  # Train and test sets of the same iteration should not overlap
  intersect(rrcv$train_set(1), rrcv$test_set(1))

  # Internal storage:
  rrcv$instance # table
}
# }