mlr3spatiotempcv (version 1.0.0)

ResamplingRepeatedSptCVCluto: (skmeans) Repeated spatiotemporal clustering resampling

Description

Spatiotemporal cluster partitioning via the vcluster executable of the CLUTO clustering application.

This partitioning method relies on the external CLUTO library. To use it, CLUTO's executables need to be downloaded and installed into this package.

See https://gist.github.com/pat-s/6430470cf817050e27d26c43c0e9be72 for an installation approach that should work on Windows and Linux. macOS is not supported by CLUTO.

Before using this method, please check the restrictive copyright shown below.

Copyright

CLUTO's copyright is as follows:

The CLUTO package is copyrighted by the Regents of the University of Minnesota. It can be freely used for educational and research purposes by non-profit institutions and US government agencies only. Other organizations are allowed to use CLUTO only for evaluation purposes, and any further uses will require prior approval. The software may not be sold or redistributed without prior approval. One may make copies of the software for their use provided that the copies, are not sold or distributed, are used under the same terms and conditions. As unestablished research software, this code is provided on an “as is” basis without warranty of any kind, either expressed or implied. The downloading, or executing any part of this software constitutes an implicit agreement to these terms. These terms and conditions are subject to change at any time without prior notice.

Super class

mlr3::Resampling -> ResamplingRepeatedSptCVCluto

Public fields

time_var

character Name of the variable that represents the time dimension. The variable itself must be of type numeric.

clmethod

character Name of the clustering method to use within vcluster. See Details for more information.

cluto_parameters

character Additional parameters to pass to vcluster. Must be given as a single character string, e.g. "param1='value1'param2='value2'". See the CLUTO documentation for a full list of supported parameters.

verbose

logical Whether to show vcluster progress and summary output.

Active bindings

iters

integer(1) Returns the number of resampling iterations, depending on the values stored in the param_set.

Methods

Public methods

Method new()

Create a repeated resampling instance using the CLUTO algorithm.

Usage

ResamplingRepeatedSptCVCluto$new(
  id = "repeated_sptcv_cluto",
  time_var = NULL,
  clmethod = "direct",
  cluto_parameters = NULL,
  verbose = TRUE
)

Arguments

id

character(1) Identifier for the resampling strategy.

time_var

character Name of the variable that represents the time dimension. The variable itself must be of type numeric.

clmethod

character Name of the clustering method to use within vcluster. See Details for more information.

cluto_parameters

character Additional parameters to pass to vcluster. Must be given as a single character string, e.g. "param1='value1'param2='value2'". See the CLUTO documentation for a full list of supported parameters.

verbose

logical Whether to show vcluster progress and summary output.
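
Because time_var must name a numeric variable, a date-like column may need converting before it can be used. The following is a minimal base R sketch of one way to do that. The data frame `obs` and the column name `time_num` are hypothetical and are not part of the package.

```r
# Hypothetical observations with a Date column (illustrative only)
obs = data.frame(
  Date = as.Date(c("2011-01-01", "2011-01-02", "2011-01-10")),
  value = c(0.31, 0.28, 0.35)
)

# Dates are stored as days since 1970-01-01, so as.numeric() gives a
# numeric time axis that preserves the spacing between observations
obs$time_num = as.numeric(obs$Date)

is.numeric(obs$time_num)
diff(obs$time_num)
```

The numeric column could then be supplied by name, for example time_var = "time_num", assuming it is present as a feature in the task.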

Method folds()

Translates iteration numbers to fold numbers.

Usage

ResamplingRepeatedSptCVCluto$folds(iters)

Arguments

iters

integer() Iteration number.

Method repeats()

Translates iteration numbers to repetition numbers.

Usage

ResamplingRepeatedSptCVCluto$repeats(iters)

Arguments

iters

integer() Iteration number.
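
The mapping performed by folds() and repeats() can be sketched in base R. This sketch assumes the convention used by repeated cross-validation in mlr3, where folds cycle fastest within each repetition. The helper names `iter_to_fold` and `iter_to_repeat` are hypothetical and are not part of the package.

```r
folds = 3L
repeats = 5L

# Total number of resampling iterations
n_iters = folds * repeats

# Assumed convention: iterations 1..folds belong to repetition 1,
# the next `folds` iterations to repetition 2, and so on
iter_to_fold = function(iters, folds) ((iters - 1L) %% folds) + 1L
iter_to_repeat = function(iters, folds) ((iters - 1L) %/% folds) + 1L

iter_to_fold(1:6, folds)   # 1 2 3 1 2 3
iter_to_repeat(1:6, folds) # 1 1 1 2 2 2
```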

Method instantiate()

Materializes fixed training and test splits for a given task.

Usage

ResamplingRepeatedSptCVCluto$instantiate(task)

Arguments

task

Task A task to instantiate.

Method clone()

The objects of this class are cloneable with this method.

Usage

ResamplingRepeatedSptCVCluto$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Details

By default, -clmethod='direct' is passed to the vcluster executable, in contrast to the upstream default -clmethod='rb'. No evidence or research shows that this method is the best among the available ones ("rb", "rbr", "direct", "agglo", "graph", "bagglo"). Various other parameters can be set via the cluto_parameters argument to achieve different clustering results.

Parameter -clusterfile is handled by skmeans and cannot be changed.
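
Whichever clustering method is chosen, the general idea of cluster-based partitioning is that each cluster serves once as the test set while the remaining clusters form the training set. The sketch below illustrates that idea only. It uses stats::kmeans as a stand-in for vcluster, since CLUTO may not be installed, and the simulated data frame `obs` and the list `splits` are hypothetical. It is not the package's implementation.

```r
set.seed(42)

# Simulated observations with two spatial coordinates and a numeric time
n = 60
obs = data.frame(x = runif(n), y = runif(n), time = runif(n))

folds = 3L
repeats = 2L

splits = list()
for (r in seq_len(repeats)) {
  # Stand-in clustering: one cluster per fold, re-run for each repetition
  cl = kmeans(obs[, c("x", "y", "time")], centers = folds)$cluster
  for (k in seq_len(folds)) {
    splits[[length(splits) + 1L]] = list(
      rep = r,
      fold = k,
      test = which(cl == k),  # rows in cluster k are held out
      train = which(cl != k)  # all other rows are used for training
    )
  }
}

length(splits)
intersect(splits[[1]]$train, splits[[1]]$test)
```

Because every row belongs to exactly one cluster per repetition, the training and test sets of each split are disjoint and together cover all rows.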

References

Zhao Y, Karypis G (2002). “Evaluation of Hierarchical Clustering Algorithms for Document Datasets.” 11th Conference of Information and Knowledge Management (CIKM), 515-524. http://glaros.dtc.umn.edu/gkhome/node/167.

Examples

# NOT RUN {
if (mlr3misc::require_namespaces("skmeans", quietly = TRUE)) {
  library(mlr3)
  library(mlr3spatiotempcv)
  task = tsk("cookfarm")

  # Instantiate Resampling
  # time_var is a constructor argument; instantiate() only takes the task
  rrcv = rsmp("repeated_sptcv_cluto", folds = 3, repeats = 5, time_var = "Date")
  rrcv$instantiate(task)

  # Number of iterations and the fold/repetition of each iteration:
  rrcv$iters
  rrcv$folds(1:6)
  rrcv$repeats(1:6)

  # Individual sets:
  rrcv$train_set(1)
  rrcv$test_set(1)
  intersect(rrcv$train_set(1), rrcv$test_set(1))

  # Internal storage:
  rrcv$instance # table
}
# }
