Spatiotemporal cluster partitioning via the vcluster executable of the CLUTO clustering application.
This partitioning method relies on the external CLUTO library. To use it, CLUTO's executables need to be downloaded and installed into this package.
See https://gist.github.com/pat-s/6430470cf817050e27d26c43c0e9be72 for an installation approach that should work on Windows and Linux. macOS is not supported by CLUTO.
Before using this method, please check the restrictive copyright shown below.
CLUTO's copyright is as follows:
The CLUTO package is copyrighted by the Regents of the University of Minnesota. It can be freely used for educational and research purposes by non-profit institutions and US government agencies only. Other organizations are allowed to use CLUTO only for evaluation purposes, and any further uses will require prior approval. The software may not be sold or redistributed without prior approval. One may make copies of the software for their use provided that the copies, are not sold or distributed, are used under the same terms and conditions. As unestablished research software, this code is provided on an “as is” basis without warranty of any kind, either expressed or implied. The downloading, or executing any part of this software constitutes an implicit agreement to these terms. These terms and conditions are subject to change at any time without prior notice.
mlr3::Resampling -> ResamplingRepeatedSptCVCluto
time_var
character The name of the variable which represents the time dimension. Must be of type numeric.
clmethod
character
Name of the clustering method to use within vcluster. See Details for more information.
cluto_parameters
character
Additional parameters to pass to vcluster. Must be given as a single character string, e.g. "param1='value1'param2='value2'".
See the CLUTO documentation for a full list of supported parameters.
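The single-string form described above can be assembled from a named list. The following is a minimal sketch in base R; the helper build_cluto_parameters() is hypothetical and not part of mlr3spatiotempcv, and the names and values are the placeholders from the example string, not real CLUTO options. It reproduces the documented form exactly, without a separator between parameters.

```r
# Hypothetical helper (not part of mlr3spatiotempcv): collapse a named list
# into the single-string form documented for `cluto_parameters`.
build_cluto_parameters = function(params) {
  stopifnot(
    is.list(params),
    !is.null(names(params)),
    all(nzchar(names(params)))
  )
  # Each value must be a single element so that one name maps to one value
  values = vapply(params, as.character, character(1))
  # No separator between parameters, mirroring the documented example string
  paste0(names(params), "='", values, "'", collapse = "")
}

# Placeholder names and values taken from the example string above
build_cluto_parameters(list(param1 = "value1", param2 = "value2"))
# [1] "param1='value1'param2='value2'"
```

The resulting string could then be supplied as the cluto_parameters argument. Consult the CLUTO documentation for the parameter names that vcluster actually accepts.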
verbose
logical
Whether to show vcluster progress and summary output.
iters
integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.
new()
Create a repeated resampling instance using the CLUTO algorithm.
ResamplingRepeatedSptCVCluto$new( id = "repeated_sptcv_cluto", time_var = NULL, clmethod = "direct", cluto_parameters = NULL, verbose = TRUE )
id
character(1)
Identifier for the resampling strategy.
time_var
character The name of the variable which represents the time dimension. Must be of type numeric.
clmethod
character
Name of the clustering method to use within vcluster. See Details for more information.
cluto_parameters
character
Additional parameters to pass to vcluster. Must be given as a single character string, e.g. "param1='value1'param2='value2'".
See the CLUTO documentation for a full list of supported parameters.
verbose
logical
Whether to show vcluster progress and summary output.
folds()
Translates iteration numbers to fold numbers.
ResamplingRepeatedSptCVCluto$folds(iters)
iters
integer()
Iteration number.
repeats()
Translates iteration numbers to repetition numbers.
ResamplingRepeatedSptCVCluto$repeats(iters)
iters
integer()
Iteration number.
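The mapping performed by folds() and repeats() can be illustrated in base R. This is a sketch, not the package's code: it assumes the convention used by repeated cross-validation in mlr3, where iterations are numbered repetition by repetition and the total number of iterations is folds times repeats. The helper names iter_to_fold() and iter_to_repeat() are hypothetical.

```r
# Assumption: iterations 1..folds belong to repetition 1, the next `folds`
# iterations to repetition 2, and so on.
iter_to_fold = function(iters, folds) {
  ((iters - 1L) %% folds) + 1L
}
iter_to_repeat = function(iters, folds) {
  ((iters - 1L) %/% folds) + 1L
}

folds = 3L
repeats = 5L

# Total number of iterations under this assumption
n_iters = folds * repeats
n_iters
# [1] 15

iter_to_fold(1:6, folds)
# [1] 1 2 3 1 2 3
iter_to_repeat(1:6, folds)
# [1] 1 1 1 2 2 2
```

With folds = 3, iteration 4 is therefore the first fold of the second repetition.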
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingRepeatedSptCVCluto$instantiate(task)
task
Task A task to instantiate.
time_var
character The name of the variable which represents the time dimension. Must be of type numeric.
clmethod
character
Name of the clustering method to use within vcluster. See Details for more information.
cluto_parameters
character
Additional parameters to pass to vcluster. Must be given as a single character string, e.g. "param1='value1'param2='value2'".
See the CLUTO documentation for a full list of supported parameters.
verbose
logical
Whether to show vcluster progress and summary output.
clone()
The objects of this class are cloneable with this method.
ResamplingRepeatedSptCVCluto$clone(deep = FALSE)
deep
Whether to make a deep clone.
By default, -clmethod='direct' is passed to the vcluster executable, in contrast to the upstream default -clmethod='rb'.
There is no evidence or research that this method is the best among the available ones ("rb", "rbr", "direct", "agglo", "graph", "bagglo").
Also, various other parameters can be set via argument cluto_parameters to achieve different clustering results.
Parameter -clusterfile is handled by skmeans and cannot be changed.
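To try a clustering method other than the default, the constructor documented above can be called with a different clmethod. The following is a sketch under several assumptions: the installed version of mlr3spatiotempcv still ships ResamplingRepeatedSptCVCluto, skmeans and the CLUTO executables are set up as described at the top of this page, and "Date" (taken from the cookfarm example) is a suitable time variable. The vector cluto_methods is only a local copy of the method names listed above.

```r
# Method names as listed in the Details above
cluto_methods = c("rb", "rbr", "direct", "agglo", "graph", "bagglo")

# Use the upstream CLUTO default instead of this class's default "direct"
clmethod = "rb"
stopifnot(clmethod %in% cluto_methods)

# Only attempt construction when the required packages and the class are
# available; otherwise the sketch does nothing.
if (requireNamespace("skmeans", quietly = TRUE) &&
  requireNamespace("mlr3spatiotempcv", quietly = TRUE) &&
  exists("ResamplingRepeatedSptCVCluto",
    envir = asNamespace("mlr3spatiotempcv"), inherits = FALSE)) {
  cluto_class = get("ResamplingRepeatedSptCVCluto",
    envir = asNamespace("mlr3spatiotempcv"))
  # Arguments follow the constructor signature documented above
  rrcv = cluto_class$new(time_var = "Date", clmethod = clmethod)
}
```

Because no method is known to be best, comparing the resulting partitions for a few values of clmethod on the same task is a reasonable way to choose one.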
Zhao Y, Karypis G (2002). “Evaluation of Hierarchical Clustering Algorithms for Document Datasets.” 11th Conference of Information and Knowledge Management (CIKM), 515-524. http://glaros.dtc.umn.edu/gkhome/node/167.
if (mlr3misc::require_namespaces("skmeans", quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)
  task = tsk("cookfarm")

  # Instantiate Resampling
  # time_var is a constructor argument; instantiate() only takes the task
  rrcv = rsmp("repeated_sptcv_cluto", folds = 3, repeats = 5, time_var = "Date")
  rrcv$instantiate(task)

  # Iteration counts and their fold/repetition numbers:
  rrcv$iters
  rrcv$folds(1:6)
  rrcv$repeats(1:6)

  # Individual sets:
  rrcv$train_set(1)
  rrcv$test_set(1)
  # Should be empty: training and test sets do not overlap
  intersect(rrcv$train_set(1), rrcv$test_set(1))

  # Internal storage:
  rrcv$instance # table
}