Spatiotemporal cluster partitioning via the vcluster executable of the
CLUTO clustering application.
This partitioning method relies on the external CLUTO library. To use it, CLUTO's executables need to be downloaded and installed into this package.
See https://gist.github.com/pat-s/6430470cf817050e27d26c43c0e9be72 for an installation approach that should work on Windows and Linux. macOS is not supported by CLUTO.
Before using this method, please check the restrictive copyright shown below.
CLUTO's copyright is as follows:
The CLUTO package is copyrighted by the Regents of the University of Minnesota. It can be freely used for educational and research purposes by non-profit institutions and US government agencies only. Other organizations are allowed to use CLUTO only for evaluation purposes, and any further uses will require prior approval. The software may not be sold or redistributed without prior approval. One may make copies of the software for their use provided that the copies, are not sold or distributed, are used under the same terms and conditions. As unestablished research software, this code is provided on an “as is” basis without warranty of any kind, either expressed or implied. The downloading, or executing any part of this software constitutes an implicit agreement to these terms. These terms and conditions are subject to change at any time without prior notice.
mlr3::Resampling -> ResamplingRepeatedSptCVCluto
time_var: character
The name of the variable that represents the time dimension. The variable itself must be numeric.
clmethod: character
Name of the clustering method to use within vcluster.
See Details for more information.
cluto_parameters: character
Additional parameters to pass to vcluster.
Must be given as a single character string, e.g.
"param1='value1'param2='value2'".
See the CLUTO documentation for a full list of supported parameters.
verbose: logical
Whether to show vcluster progress and summary output.
iters: integer(1)
Returns the number of resampling iterations, depending on the
values stored in the param_set.
new()
Create a repeated resampling instance using the CLUTO algorithm.
ResamplingRepeatedSptCVCluto$new( id = "repeated_sptcv_cluto", time_var = NULL, clmethod = "direct", cluto_parameters = NULL, verbose = TRUE )
id: character(1)
Identifier for the resampling strategy.
time_var: character
The name of the variable that represents the time dimension. The variable itself must be numeric.
clmethod: character
Name of the clustering method to use within vcluster.
See Details for more information.
cluto_parameters: character
Additional parameters to pass to vcluster.
Must be given as a single character string, e.g.
"param1='value1'param2='value2'".
See the CLUTO documentation for a full list of supported parameters.
verbose: logical
Whether to show vcluster progress and summary output.
folds()
Translates iteration numbers to fold numbers.
ResamplingRepeatedSptCVCluto$folds(iters)
iters: integer()
Iteration numbers.
repeats()
Translates iteration numbers to repetition numbers.
ResamplingRepeatedSptCVCluto$repeats(iters)
iters: integer()
Iteration numbers.
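The translation performed by folds() and repeats() can be illustrated in base R. This is only a sketch: it assumes iterations are numbered consecutively within each repetition (all folds of repetition 1 first, then repetition 2, and so on), and the helper names iter_to_fold and iter_to_repeat are hypothetical, not part of the package.

```r
# Hypothetical helpers, not part of mlr3spatiotempcv.
# Assumption: iterations 1..(folds * repeats) run through all folds of
# repetition 1 first, then all folds of repetition 2, and so on.
iter_to_fold = function(iters, folds) {
  ((iters - 1L) %% folds) + 1L
}

iter_to_repeat = function(iters, folds) {
  ((iters - 1L) %/% folds) + 1L
}

folds = 3L
repeats = 5L

# Total number of resampling iterations under this numbering scheme
n_iters = folds * repeats

# Map the first six iterations to their fold and repetition
fold_ids = iter_to_fold(1:6, folds)
repeat_ids = iter_to_repeat(1:6, folds)

print(n_iters)
print(fold_ids)
print(repeat_ids)
```

Under this assumption, the fold number cycles through 1 to folds while the repetition number increases by one after every full cycle.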
instantiate()
Materializes fixed training and test splits for a given task.
ResamplingRepeatedSptCVCluto$instantiate(task)
task: Task
A task to instantiate.
time_var: character
The name of the variable that represents the time dimension. The variable itself must be numeric.
clmethod: character
Name of the clustering method to use within vcluster.
See Details for more information.
cluto_parameters: character
Additional parameters to pass to vcluster.
Must be given as a single character string, e.g.
"param1='value1'param2='value2'".
See the CLUTO documentation for a full list of supported parameters.
verbose: logical
Whether to show vcluster progress and summary output.
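How cluster labels can turn into fixed training and test splits is sketched below in base R. This is an illustration under stated assumptions, not the package's code: stats::kmeans() stands in for the vcluster executable, the data are simulated, and it is assumed that each cluster serves as the test set of one fold while all remaining rows form the training set.

```r
# Illustration only: stats::kmeans() is a stand-in for CLUTO's vcluster,
# and the data below are simulated.
set.seed(42)
n = 60
obs = data.frame(
  x = runif(n),                           # spatial coordinate
  y = runif(n),                           # spatial coordinate
  time = sample(1:10, n, replace = TRUE)  # numeric time variable
)

folds = 3L

# Cluster the observations in space and time; one cluster per fold
cluster_id = unname(stats::kmeans(scale(obs), centers = folds)$cluster)

# Assumption: cluster k is the test set of fold k, the rest is training data
test_sets = lapply(seq_len(folds), function(k) which(cluster_id == k))
train_sets = lapply(seq_len(folds), function(k) which(cluster_id != k))

# Training and test rows of a fold never overlap
overlap = intersect(train_sets[[1]], test_sets[[1]])
print(length(overlap))
```

Because every row belongs to exactly one cluster, each row appears in exactly one test set across the folds of a repetition.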
clone()
The objects of this class are cloneable with this method.
ResamplingRepeatedSptCVCluto$clone(deep = FALSE)
deep: Whether to make a deep clone.
By default, -clmethod='direct' is passed to the vcluster executable in
contrast to the upstream default -clmethod='rb'.
There is no evidence or research that this method is the best among the
available ones ("rb", "rbr", "direct", "agglo", "graph", "bagglo").
Also, various other parameters can be set via argument cluto_parameters to
achieve different clustering results.
Parameter -clusterfile is handled by skmeans and cannot be
changed.
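The single-string format expected by cluto_parameters can be assembled from a named list, as in the following base R sketch. The helper build_cluto_parameters is hypothetical and not part of the package or of CLUTO; param1 and param2 are the placeholder names used in the argument description, not real vcluster options.

```r
# Hypothetical helper, not part of mlr3spatiotempcv or CLUTO.
# Builds a string of the form "param1='value1'param2='value2'".
build_cluto_parameters = function(params) {
  stopifnot(
    is.list(params),
    !is.null(names(params)),
    all(nzchar(names(params))),
    # -clusterfile is managed internally and cannot be changed
    !"clusterfile" %in% names(params)
  )
  paste0(names(params), "='", unlist(params), "'", collapse = "")
}

# Placeholder names and values, as in the argument description
param_string = build_cluto_parameters(list(param1 = "value1", param2 = "value2"))
print(param_string)
```

The resulting string could then be supplied as the cluto_parameters argument when creating the resampling object.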
Zhao Y, Karypis G (2002). “Evaluation of Hierarchical Clustering Algorithms for Document Datasets.” 11th Conference of Information and Knowledge Management (CIKM), 515-524. http://glaros.dtc.umn.edu/gkhome/node/167.
# NOT RUN {
if (mlr3misc::require_namespaces("skmeans", quietly = TRUE)) {
library(mlr3)
library(mlr3spatiotempcv)
task = tsk("cookfarm")
# Instantiate Resampling
# time_var is set on the resampling object; instantiate() only takes the task
rrcv = rsmp("repeated_sptcv_cluto", folds = 3, repeats = 5, time_var = "Date")
rrcv$instantiate(task)
# Number of iterations, and the fold and repetition of each iteration:
rrcv$iters
rrcv$folds(1:6)
rrcv$repeats(1:6)
# Individual sets:
rrcv$train_set(1)
rrcv$test_set(1)
intersect(rrcv$train_set(1), rrcv$test_set(1))
# Internal storage:
rrcv$instance # table
}
# }