calc_nreps2: Determine sample sizes for a pair of algorithms on a problem instance

Description

Iteratively calculates the required sample sizes for two algorithms on a given problem instance, so that the standard error of the estimate of the difference (either simple or percent) in mean performance is controlled at a predefined level.

Usage

calc_nreps2(instance, algorithm1, algorithm2, se.max, dif, method = "param",
  nstart = 20, nmax = 200, seed = NULL, boot.R = 999,
  force.balanced = FALSE, save.to.file = FALSE, folder = "./nreps_files")

Arguments

instance

a list object containing the definitions of the problem instance. See Section Problems and Algorithms for details.

algorithm1

a list object containing the definitions of algorithm 1. See Section Problems and Algorithms for details.

algorithm2

a list object containing the definitions of algorithm 2. See Section Problems and Algorithms for details.

se.max

desired upper limit for the standard error of the estimated difference between the two algorithms. See Section Types of Differences for details.

dif

type of difference to be used. Accepts "perc" (for percent differences) or "simple" (for simple differences)

method

method to use for estimating the standard error. Accepts "param" (for parametric) or "boot" (for bootstrap)

nstart

initial number of algorithm runs for each algorithm. See Section Initial Number of Observations for details.

nmax

maximum total allowed sample size.

seed

seed for the random number generator

boot.R

number of bootstrap resamples

force.balanced

logical flag to force the use of balanced sampling for the algorithms on each instance

save.to.file

logical flag: should the results be saved to a file in the current working directory?

folder

directory to save files (if save.to.file == TRUE)

Value

a list object containing the following items:

x1j - vector of observed performance values for algorithm1
x2j - vector of observed performance values for algorithm2
phi.est - estimated value for the statistic of interest
se - standard error of the estimate
n1j - number of observations generated for algorithm 1
n2j - number of observations generated for algorithm 2
r.opt = n1j / n2j
seed - the seed used for the PRNG
dif - the type of difference used
method - the method used ("param" / "boot")

Instances and Algorithms

Parameters instance, algorithm1 and algorithm2 must each be a list of instance (algorithm) specifications, defined according to the instructions given below.

instance is a named list containing all relevant parameters that define the problem instance. This list must contain at least the field instance$FUN, with the name of the problem instance function, that is, a routine that calculates y = f(x). If the instance requires additional parameters, these must also be provided as named fields.

Similarly, algorithm1 and algorithm2 must each be a named list containing all relevant parameters that define the algorithm to be applied for solving the problem instance. In what follows we use algorithm to refer to both algorithm1 and algorithm2

algorithm must contain a algorithm$FUN field (the name of the function that calls the algorithm) and any other elements/parameters that algorithm$FUN requires (e.g., stop criteria, operator names and parameters, etc.).

The function defined by the routine algorithm$FUN must have the following structure: supposing that the list in algorithm has fields algorithm$FUN = myalgo and algorithm$par1 = "a", algorithm$par2 = 5, then:

         myalgo <- function(par1, par2, instance, ...){
               # do stuff
               # ...
               return(results)
         }

That is, it must be able to run if called as:

         # remove '$FUN' and '$alias' field from list of arguments
         # and include the problem definition as field 'instance'
         myargs          <- algorithm[names(algorithm) != "FUN"]
         myargs          <- myargs[names(myargs) != "alias"]
         myargs$instance <- instance
# call function
         do.call(algorithm$FUN,
                 args = myargs)

The algorithm$FUN routine must return a list containing (at least) the performance value of the final solution obtained, in a field named value (e.g., result$value) after a given run.

Initial Number of Observations

In the general case the initial number of observations / algorithm / instance (nstart) should be relatively high. For the parametric case we recommend ~20 if outliers are not expected, ~50 (at least) if that assumption cannot be made. For the bootstrap approach we recommend using at least 20. However, if some distributional assumptions can be made - particularly low skewness of the population of algorithm results on the test instances), then nstart can in principle be as small as 5 (if the output of the algorithm were known to be normal, it could be 1).

In general, higher sample sizes are the price to pay for abandoning distributional assumptions. Use lower values of nstart with caution.

Types of Differences

Parameter dif informs the type of difference in performance to be used for the estimation (mu1 and mu2 represent the mean performance of each algorithm on the problem instance):

If dif == "perc" it estimates (mu2 - mu1) / mu1.
If dif == "simple" it estimates mu2 - mu1.

References

F. Campelo, F. Takahashi: Sample size estimation for power and accuracy in the experimental comparison of algorithms (submitted, 2017).
P. Mathews. Sample size calculations: Practical methods for engineers and scientists. Mathews Malnar and Bailey, 2010.
A.C. Davison, D.V. Hinkley: Bootstrap methods and their application. Cambridge University Press (1997)
E.C. Fieller: Some problems in interval estimation. Journal of the Royal Statistical Society. Series B (Methodological) 16(2), 175<U+2013>185 (1954)
V. Franz: Ratios: A short guide to confidence limits and proper use (2007). https://arxiv.org/pdf/0710.2024v1.pdf
D.C. Montgomery, C.G. Runger: Applied Statistics and Probability for Engineers, 6th ed. Wiley (2013)

Examples

Run this code

# NOT RUN {
# Uses dummy algorithms and a dummy instance to illustrate the
# use of calc_nreps2
algorithm1 <- list(FUN = "dummyalgo", alias = "algo1",
                   distribution.fun = "rnorm",
                   distribution.pars = list(mean = 10, sd = 1))
algorithm2 <- list(FUN = "dummyalgo", alias = "algo2",
                   distribution.fun = "rnorm",
                   distribution.pars = list(mean = 20, sd = 4))
instance <- list(FUN = "dummyinstance")

# Theoretical results for an SE = 0.5 on the simple difference:
# phi = 10; n1 = 20; n2 = 80
# (using the parametric approach)
my.reps  <- calc_nreps2(instance, algorithm1, algorithm2,
                        se.max = 0.5, dif = "simple", seed = 1234)
cat("n1j   =", my.reps$n1j, "\nn2j   =", my.reps$n2j,
    "\nphi_j =", my.reps$phi.est, "\nse    =", my.reps$se)

# Forcing equal sample sizes:
my.reps  <- calc_nreps2(instance, algorithm1, algorithm2,
                        se.max = 0.5, dif = "simple", seed = 1234,
                        force.balanced = TRUE)
cat("n1j   =", my.reps$n1j, "\nn2j   =", my.reps$n2j,
    "\nphi_j =", my.reps$phi.est, "\nse    =", my.reps$se)

# }
# NOT RUN {
# Using the bootstrap approach
algorithm3 <- list(FUN = "dummyalgo", alias = "algo3",
                   distribution.fun = "rchisq",
                   distribution.pars = list(df = 2, ncp = 3))

my.reps  <- calc_nreps2(instance, algorithm1, algorithm3,
                        se.max = 0.05, dif = "perc",
                        method = "boot", seed = 1234,
                        nstart = 20)
cat("n1j   =", my.reps$n1j, "\nn2j   =", my.reps$n2j,
    "\nphi_j =", my.reps$phi.est, "\nse    =", my.reps$se)
# }
# NOT RUN {

# }
# NOT RUN {
# Example for a 21-city TSP instance using 2 configurations of SANN
algorithm1 <- list(FUN  = "my.SANN", alias = "algo1",
                   Temp = 2000, budget = 10000)
algorithm2 <- list(FUN  = "my.SANN", alias = "algo2",
                   Temp = 4000, budget = 10000)
instance <- list(FUN    = "TSP.dist",
                 mydist = datasets::eurodist)
my.reps  <- calc_nreps2(instance, algorithm1, algorithm2,
                        se.max = 0.01, dif = "perc",
                        method = "param", seed = 1234,
                        nstart = 20)
cat("n1j   =", my.reps$n1j, "\nn2j   =", my.reps$n2j,
    "\nphi_j =", my.reps$phi.est, "\nse    =", my.reps$se)
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab