ProblemAlgorithm: Define Problems and Algorithms for Experiments

Description

Problems may consist of up to two parts: a static, immutable part (data in addProblem) and a dynamic, stochastic part (fun in addProblem). For example, in a statistical learning problem a data frame would be the static part, while a resampling function would be the stochastic part that creates a problem instance. This instance is then typically passed to a learning algorithm, such as a wrapper around a statistical model (fun in addAlgorithm).

The functions serialize all components to the file system and register the respective problem or algorithm names in the ExperimentRegistry.

getProblemIds and getAlgorithmIds can be used to retrieve the IDs of already defined problems and algorithms, respectively.

Usage

addProblem(name, data = NULL, fun = NULL, seed = NULL,
  reg = getDefaultRegistry())
removeProblems(name, reg = getDefaultRegistry())
getProblemIds(reg = getDefaultRegistry())
addAlgorithm(name, fun = NULL, reg = getDefaultRegistry())
removeAlgorithms(name, reg = getDefaultRegistry())
getAlgorithmIds(reg = getDefaultRegistry())

Arguments

name

[character(1)] Unique identifier for the problem or algorithm.

data

[ANY] Static problem part. Default is NULL.

fun

[function] For addProblem, the function defining the stochastic problem part. The static part is passed to this function with name “data” and the Job/Experiment is passed as “job”. Therefore, your function must have the formal arguments “job” and “data” (or dots ...).

For addAlgorithm, the algorithm function. The static part is passed as “data”, the generated problem instance is passed as “instance” and the Job/Experiment as “job”. Therefore, your function must have the formal arguments “job”, “data” and “instance” (or dots ...).

If you do not provide a function, it defaults to a function which just returns the data part (Problem) or the instance (Algorithm).
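The calling conventions described above can be illustrated with a small sketch. The function names subsample and majority_class are hypothetical and chosen only for this illustration. The functions are first called by hand with job = NULL to show which arguments they receive; in a real experiment the job object would be supplied by the package. The registration step runs only if batchtools is installed.

```r
# Static part: a plain data frame (the iris data set shipped with base R).
static_data <- iris

# Stochastic problem part: must accept "job" and "data" (or dots).
# Draws a random two-thirds / one-third split of the row indices.
subsample <- function(job, data, ...) {
  n <- nrow(data)
  train <- sample(n, floor(2 * n / 3))
  list(train = train, test = setdiff(seq_len(n), train))
}

# Algorithm: must accept "job", "data" and "instance" (or dots).
# Predicts the most frequent training class and returns the test accuracy.
majority_class <- function(job, data, instance, ...) {
  tab <- table(data$Species[instance$train])
  pred <- names(tab)[which.max(tab)]
  mean(data$Species[instance$test] == pred)
}

# Manual call, only to show how the pieces fit together.
set.seed(1)
instance <- subsample(job = NULL, data = static_data)
accuracy <- majority_class(job = NULL, data = static_data, instance = instance)

# Registration, assuming batchtools is available.
if (requireNamespace("batchtools", quietly = TRUE)) {
  reg <- batchtools::makeExperimentRegistry(file.dir = NA, make.default = FALSE)
  batchtools::addProblem("iris", data = static_data, fun = subsample, reg = reg)
  batchtools::addAlgorithm("majority", fun = majority_class, reg = reg)
}
```

Keeping the data frame in data and only the index split in the instance means the static part is stored once, while each problem instance stays small.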

seed

[integer(1)] Start seed for this problem. This allows the “synchronization” of a stochastic problem across algorithms, so that different algorithms are evaluated on the same stochastic instance. If the problem seed is defined, the seeding mechanism works as follows: (1) Before the dynamic part of a problem is instantiated, the seed of the problem + [replication number] - 1 is set, i.e. the first replication uses the problem seed. (2) The stochastic part of the problem is instantiated. (3) From now on the usual experiment seed of the registry is used, see ExperimentRegistry. If seed is set to NULL (default), the job seed is used to instantiate the problem and different algorithms see different stochastic instances of the same problem.
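The three seeding steps can be simulated in base R. This is only a sketch of the behaviour described above, not the package's internal code: the helper instantiate and the concrete seed values are hypothetical, and resetting the generator with set.seed(job_seed) is an assumption about what "the usual experiment seed is used" means.

```r
# Hypothetical helper mimicking the documented seeding steps.
instantiate <- function(problem_seed, repl, job_seed) {
  # Step 1: with a problem seed, use problem seed + replication number - 1;
  # without one (NULL), fall back to the job seed.
  seed <- if (is.null(problem_seed)) job_seed else problem_seed + repl - 1L
  set.seed(seed)
  # Step 2: create the stochastic instance (three random numbers as a stand-in).
  instance <- runif(3)
  # Step 3: switch to the job seed for everything that follows (assumption).
  set.seed(job_seed)
  instance
}

# Two jobs (for example, two algorithms) with different job seeds, same replication:
a1 <- instantiate(problem_seed = 42L, repl = 1L, job_seed = 1001L)
a2 <- instantiate(problem_seed = 42L, repl = 1L, job_seed = 1002L)
same_instance <- identical(a1, a2)

# The second replication uses seed 43 and therefore sees a different instance.
b1 <- instantiate(problem_seed = 42L, repl = 2L, job_seed = 1003L)

# Without a problem seed, each job draws its own instance.
c1 <- instantiate(problem_seed = NULL, repl = 1L, job_seed = 1001L)
c2 <- instantiate(problem_seed = NULL, repl = 1L, job_seed = 1002L)
```

With a problem seed set, a1 and a2 are the same instance even though the job seeds differ, which is the "synchronization" across algorithms that the seed argument provides.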

reg

[ExperimentRegistry] Registry. If not explicitly passed, uses the default registry (see getDefaultRegistry), which is typically the last created registry.

Value

[Problem]. Object of class “Problem” (invisibly).

Examples

tmp = makeExperimentRegistry(file.dir = NA, make.default = FALSE)
addProblem("p1", fun = function(job, data) data, reg = tmp)
addProblem("p2", fun = function(job, data) job, reg = tmp)
getProblemIds(reg = tmp)

addAlgorithm("a1", fun = function(job, data, instance) instance, reg = tmp)
getAlgorithmIds(reg = tmp)

removeAlgorithms("a1", reg = tmp)
getAlgorithmIds(reg = tmp)