run_experiment: Run a full experiment

Description

Design and run a full experiment - calculate the required number of instances, run the algorithms on each problem instance using the iterative approach based on optimal sample size ratios, and return the results of the experiment. This routine builds upon calc_instances() and calc_nreps2(), so refer to the documentation of these two functions for details.

Usage

run_experiment(Instance.list, Algorithm.list, power, d, sig.level = 0.05,
  alternative = "two.sided", test.type = "t.test", se.max, dif,
  method = "param", nstart = 20, nmax = 1000, seed = NULL,
  boot.R = 999, force.balanced = FALSE, ncpus = 1,
  save.partial.results = FALSE, folder = "./nreps_files")

Arguments

Instance.list

list object containing the definitions of the available instances. this list may (or may not) be exhausted in the experiment. To estimate the number of required instances, see calc_instances(). For more detail on the definition of each instance, see calc_nreps2().

Algorithm.list

list object containing the definitions of the algorithms to be compared. See calc_nreps2() for details.

power

(desired) test power. See calc_instances() for details. Any value equal to or greater than one will force the method to use all available instances in Instance.list.

minimally relevant effect size (MRES), expressed as a standardized effect size, i.e., "deviation from H0" / "standard deviation". See calc_instances() for details.

sig.level

significance level (alpha) for the experiment. See calc_instances() for details.

alternative

type of alternative hypothesis ("two.sided" or "one.sided"). See calc_instances() for details.

test.type

type of test ("t.test", "wilcoxon", "binomial"). See calc_instances() for details.

se.max

desired upper limit for the standard error of the estimated difference between the two algorithms on each instance. See calc_nreps2() for details.

dif

type of difference to be used on each instance. Accepts "perc" (for percent differences) or "simple" (for simple differences). See calc_nreps2() for details.

method

method to use for estimating the standard errors. Accepts "param" (for parametric) or "boot" (for bootstrap). See calc_nreps2() for details.

nstart

initial number of algorithm runs for each algorithm in each instance. See calc_nreps2() for details.

nmax

maximum total allowed sample size in each instance See calc_nreps2() for details.

seed

seed for the random number generator

boot.R

number of bootstrap resamples. See calc_nreps2() for details.

force.balanced

logical flag to force the use of balanced sampling for the algorithms on each instance

ncpus

number of cores to use

save.partial.results

logical flag: should individual instance results be saved to file?

folder

directory to save files (if save.to.file == TRUE)

Value

a list object containing the full input configuration plus the following fields:

data.raw - data frame containing all observations generated
data.summary - data frame summarizing the experiment.
N - number of instances sampled
N.star - number of instances required
instances.sampled - names of the instances sampled
Underpowered - flag: TRUE if N < N.star

Instance List

Parameter Instance.list must contain a list of instance objects, where each field is itself a list, as defined in Section _Instances and Algorithms of the documentation of _calc_nreps2(). In summary, each element of Instance.list is an instance, i.e., a named list containing all relevant parameters that define the problem instance. This list must contain at least the field instance$FUN, with the name of the problem instance function, that is, a routine that calculates y = f(x). If the instance requires additional parameters, these must also be provided as named fields. An additional field, "instance$alias", can be used to provide the instance with a unique identifier (e.g., when using an instance generator).

Algorithms

Parameter Algorithm.list must contain a list of instance objects, where each field is itself a list, as defined in Section Instances and Algorithms of the documentation of calc_nreps2(). In summary, each element of Algorithm.list is an algorithm, i.e., a named list containing all relevant parameters that define the algorithm.

An algorithm must contain a algorithm$FUN field (the name of the function that calls the algorithm) and any other elements/parameters that algorithm$FUN requires (e.g., stop criteria, operator names and parameters, etc.). An additional field, algorithm$alias, can be used to provide the algorithm with a unique identifier (e.g., when comparing two different configurations of the same algorithm).

The function defined by the routine algorithm$FUN must have the following structure: supposing that the list in algorithm has fields algorithm$FUN = myalgo and algorithm$par1 = "a", algorithm$par2 = 5, then:

         myalgo <- function(par1, par2, instance, ...){
               # do stuff
               # ...
               return(results)
         }

That is, it must be able to run if called as:

         # remove '$FUN' and '$alias' from list of arguments
         # and include the problem definition as field 'instance'
         myargs          <- algorithm[names(algorithm) != "FUN"]
         myargs          <- myargs[names(myargs) != "alias"]
         myargs$instance <- instance
# call function
         do.call(algorithm$FUN,
                 args = myargs)

The algorithm$FUN routine must return a list object containing (at least) the performance value of the final solution obtained after a given run, in a field named value (e.g., result$value) .

Initial Number of Observations

In the general case the initial number of observations / algorithm / instance (nstart) should be relatively high. For the parametric case we recommend ~15 if outliers are not expected, ~50 (at least) if that assumption cannot be made. For the bootstrap approach we recommend using at least 15 or 20. However, if some distributional assumptions can be made - particularly low skewness of the population of algorithm results on the test instances), then nstart can in principle be as small as 5 (if the output of the algorithm were known to be normal, it could be 1).

In general, higher sample sizes are the price to pay for abandoning distributional assumptions. Use lower values of nstart with caution.

Types of Differences

Parameter dif informs the type of difference in performance to be used for the estimation (mu1 and mu2 represent the mean performance of each algorithm on the problem instance):

If dif == "perc" it estimates (mu2 - mu1) / mu1.
If dif == "simple" it estimates mu2 - mu1.

Sample Sizes for Nonparametric Methods

If the parameter test.type is set to either Wilcoxon or Binomial, this routine approximates the number of instances using the ARE of these tests in relation to the paired t.test, using the formulas:

n.wilcox = n.ttest / 0.86 = 1.163 * n.ttest
n.binom = n.ttest / 0.637 = 1.570 * n.ttest

References

F. Campelo, F. Takahashi: Sample size estimation for power and accuracy in the experimental comparison of algorithms (submitted, 2017).
P. Mathews. Sample size calculations: Practical methods for engineers and scientists. Mathews Malnar and Bailey, 2010.
A.C. Davison, D.V. Hinkley: Bootstrap methods and their application. Cambridge University Press (1997)
E.C. Fieller: Some problems in interval estimation. Journal of the Royal Statistical Society. Series B (Methodological) 16(2), 175<U+2013>185 (1954)
V. Franz: Ratios: A short guide to confidence limits and proper use (2007). https://arxiv.org/pdf/0710.2024v1.pdf
D.C. Montgomery, C.G. Runger: Applied Statistics and Probability for Engineers, 6th ed. Wiley (2013)

Examples

Run this code

# NOT RUN {
# Example using dummy algorithms and instances. See ?dummyalgo for details.
# In this case all instances are the same, so we expect all cases to return
# a percent difference of approx. phi.j = 1.0 and sample sizes of
# approx. n1 = 31, n2 = 87
algorithm1 <- list(FUN = "dummyalgo", alias = "algo1",
                   distribution.fun = "rnorm",
                   distribution.pars = list(mean = 10, sd = 1))
algorithm2 <- list(FUN = "dummyalgo", alias = "algo2",
                   distribution.fun = "rnorm",
                   distribution.pars = list(mean = 20, sd = 4))
algolist <- list(algorithm1, algorithm2)
instlist <- vector(100, mode = "list")
for (i in 1:100) instlist[[i]] <- list(FUN = "dummyinstance",
                                       alias = paste0("Inst. ", i))

my.results <- run_experiment(Instance.list = instlist,
                             Algorithm.list = algolist,
                             power = 0.8,
                             d = 1,
                             sig.level = 0.01,
                             se.max = 0.05,
                             dif = "perc",
                             nmax   = 200,
                             seed   = 1234,
                             ncpus  = 1)

# Take a look at the summary table
my.results$data.summary

# To perform inference on the results:
t.test(my.results$data.summary$phi.j, conf.level = 0.95)

# Test assumption of normality (of the data)
shapiro.test(my.results$data.summary$phi.j)
# }

Run the code above in your browser using DataLab