Learn R Programming

comparer

The goal of comparer is to make it easy to compare the results of different code chunks that are trying to do the same thing. The R package microbenchmark is great for comparing the speed of code, but there’s no way to compare their output to see which is more accurate.

Installation

You can install comparer from GitHub with:

# install.packages("devtools")
# devtools::install_github("CollinErickson/comparer")

mbc

One of the two main functions of this package is mbc, for “model benchmark compare.” It is designed to be similar to the package microbenchmark, allow for fast comparisons except including the output/accuracy of the code evaluated instead of just timing.

Suppose you want to see how the mean and median of a sample of 100 randomly generated data points from an exponential distribution compare. Then, as demonstrated below, you can use the function mbc, with the functions mean and median, and then input=rexp(100). The value of input will be stored as x, so mean(x) will find the mean of that data. It outputs the run times of each, and then the results from the five trials, where five is the default setting for times. The run times aren’t useful because they are all fast. For more precise timing (<0.01 seconds), you should use microbenchmark. The trials all have the same output since there is no randomness, the same data is used for each trial. The “Output summary” shows that the mean is near 1, while the median is near 0.6.

## basic example code
library(comparer)
#> Loading required package: GauPro
#> Loading required package: mixopt
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> Loading required package: ggplot2
#> Loading required package: splitfngr
#> Loading required package: numDeriv
#> Loading required package: rmarkdown
#> Loading required package: tidyr
#> Loading required package: reshape
#> 
#> Attaching package: 'reshape'
#> The following objects are masked from 'package:tidyr':
#> 
#>     expand, smiths
#> The following object is masked from 'package:dplyr':
#> 
#>     rename
#> Loading required package: plyr
#> ------------------------------------------------------------------------------
#> You have loaded plyr after dplyr - this is likely to cause problems.
#> If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
#> library(plyr); library(dplyr)
#> ------------------------------------------------------------------------------
#> 
#> Attaching package: 'plyr'
#> The following objects are masked from 'package:reshape':
#> 
#>     rename, round_any
#> The following objects are masked from 'package:dplyr':
#> 
#>     arrange, count, desc, failwith, id, mutate, rename, summarise,
#>     summarize
#> Loading required package: progress
mbc(mean(x), median(x), input=rexp(100))
#> Run times (sec)
#>    Function        Sort1        Sort2        Sort3        Sort4        Sort5
#> 1   mean(x) 5.006790e-06 5.960464e-06 7.152557e-06 8.106232e-06 5.602837e-05
#> 2 median(x) 2.098083e-05 2.312660e-05 2.479553e-05 4.196167e-05 8.988380e-05
#>           mean           sd neval
#> 1 1.645088e-05 2.215562e-05     5
#> 2 4.014969e-05 2.902476e-05     5
#> 
#> Output summary
#>        Func Stat     Sort1     Sort2     Sort3     Sort4     Sort5      mean sd
#> 1   mean(x)    1 1.0321470 1.0321470 1.0321470 1.0321470 1.0321470 1.0321470  0
#> 2 median(x)    1 0.8087696 0.8087696 0.8087696 0.8087696 0.8087696 0.8087696  0

To get the data to be generated for each trial, use the inputi argument to set a variable that the functions call. The arguments mean(x) and median(x) are captured as expressions. rexp(100) will be stored as x by default. You can see that the values are now different for each trial.

## Regenerate the data each time
mbc(mean(x), median(x), inputi=rexp(100))
#> Run times (sec)
#>    Function        Sort1        Sort2        Sort3        Sort4        Sort5
#> 1   mean(x) 5.960464e-06 5.960464e-06 5.960464e-06 6.914139e-06 1.692772e-05
#> 2 median(x) 2.288818e-05 2.503395e-05 2.503395e-05 2.980232e-05 4.911423e-05
#>           mean           sd neval
#> 1 8.344650e-06 4.815819e-06     5
#> 2 3.037453e-05 1.077721e-05     5
#> 
#> Output summary
#>        Func Stat        V1        V2        V3        V4        V5     mean
#> 1   mean(x)    1 0.9890381 0.9069863 0.8813966 1.2063718 1.0568761 1.008134
#> 2 median(x)    1 0.6836623 0.6488801 0.6404516 0.7901115 0.7825493 0.709131
#>          sd
#> 1 0.1307018
#> 2 0.0723598

The variable name, or multiple variables, can be set in inputi by using braces {} In the example below, values are set for a and b, which can then be called by the expressions to be compared.

mbc(mean(a+b), mean(a-b), inputi={a=rexp(100);b=runif(100)})
#> Run times (sec)
#>      Function        Sort1        Sort2        Sort3        Sort4        Sort5
#> 1 mean(a + b) 5.960464e-06 5.960464e-06 6.198883e-06 6.914139e-06 1.788139e-05
#> 2 mean(a - b) 5.960464e-06 5.960464e-06 6.198883e-06 8.106232e-06 1.001358e-05
#>           mean           sd neval
#> 1 8.583069e-06 5.212596e-06     5
#> 2 7.247925e-06 1.788934e-06     5
#> 
#> Output summary
#>          Func Stat        V1        V2        V3       V4        V5      mean
#> 1 mean(a + b)    1 1.4851116 1.5601898 1.3481168 1.600197 1.4810187 1.4949268
#> 2 mean(a - b)    1 0.5518472 0.5843345 0.4168324 0.628536 0.4586843 0.5280469
#>           sd
#> 1 0.09641584
#> 2 0.08805201

ffexp

The other main function of the package is ffexp, an abbreviation for full-factorial experiment. It will run a function using all possible combinations of input parameters given. It is useful for running experiments that take a long time to complete.

The first arguments given to ffexp$new should give the possible values for each input parameter. In the example below, a can be 1, 2, or 3, and b can “a”, “b”, or “c”. Then eval_func should be given that can operate on these parameters. For example, using eval_func = paste will paste together the value of a with the value of b.

f1 <- ffexp$new(
  a=1:3,
  b=c("a","b","c"),
  eval_func=paste
)

After creating the ffexp object, we can call f1$run_all to run eval_func on every combination of a and b.

f1$run_all()

Now to see the results in a clean format, look at f1$outcleandf.

f1$outcleandf
#>   a b  V1 runtime          start_time            end_time run_number
#> 1 1 a 1 a       0 2024-09-28 11:17:53 2024-09-28 11:17:53          1
#> 2 2 a 2 a       0 2024-09-28 11:17:53 2024-09-28 11:17:53          2
#> 3 3 a 3 a       0 2024-09-28 11:17:53 2024-09-28 11:17:53          3
#> 4 1 b 1 b       0 2024-09-28 11:17:53 2024-09-28 11:17:53          4
#> 5 2 b 2 b       0 2024-09-28 11:17:53 2024-09-28 11:17:53          5
#> 6 3 b 3 b       0 2024-09-28 11:17:53 2024-09-28 11:17:53          6
#> 7 1 c 1 c       0 2024-09-28 11:17:53 2024-09-28 11:17:53          7
#> 8 2 c 2 c       0 2024-09-28 11:17:53 2024-09-28 11:17:53          8
#> 9 3 c 3 c       0 2024-09-28 11:17:53 2024-09-28 11:17:53          9

hype: Hyperparameter Optimization

hype uses Bayesian optimization to find the best parameters/inputs for a function that is slow to evaluate. (If the function can be evaluated quickly, then you can use standard optimization methods.) A common use case is for hyperparameter tuning: when fitting a model that has multiple hyperparameters, you want to find the best values to set the hyperparameters to but can only evaluate a small number of settings since each is slow.

Copy Link

Version

Install

install.packages('comparer')

Monthly Downloads

302

Version

0.2.4

License

GPL-3

Issues

Pull Requests

Stars

Forks

Maintainer

Collin Erickson

Last Published

October 2nd, 2024

Functions in comparer (0.2.4)

R6_hype

Hyperparameter optimization
R6_par_integer

Parameter with uniform distribution over integer range for hyperparameter optimization
hype

Hyperparameter optimization
R6_par_ordered

R6 class for hyperparameter of discrete (factor) variable
R6_par_unif

R6 class for Uniform parameter
R6_par_unordered

R6 class for hyperparameter of discrete (factor) variable
R6_par_log10

R6 class for hyperparameter on log10 scale
R6_par_discretenum

R6 object for discrete numeric
R6_par_hype

Parameter for hyperparameter optimization
ffexp

Full factorial experiment
print.mbc

Print mbc class
par_discretenum

Parameter with uniform distribution for hyperparameter optimization
mbc

Model benchmark compare
par_integer

Parameter with uniform distribution over integer range for hyperparameter optimization
par_log10

Hyperparameter on log10 scale
par_ordered

Hyperparameter of discrete (factor) variable
par_unif

Uniform parameter
plot.mbc

Plot mbc class
par_unordered

Hyperparameter of discrete (factor) variable