bench v1.1.1

High Precision Timing of R Expressions

Tools to accurately benchmark and analyze execution times for R expressions.

The goal of bench is to benchmark code, tracking execution time, memory allocations and garbage collections.

Installation

You can install the release version from CRAN with:

install.packages("bench")

Or you can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("r-lib/bench")

Features

bench::mark() is used to benchmark one or a series of expressions. We feel it has a number of advantages over alternatives:

  • Always uses the highest precision APIs available for each operating system (often nanoseconds).
  • Tracks memory allocations for each expression.
  • Tracks the number and type of R garbage collections per expression iteration.
  • Verifies equality of expression results by default, to avoid accidentally benchmarking inequivalent code.
  • Has bench::press(), which allows you to easily perform and combine benchmarks across a large grid of values.
  • Uses adaptive stopping by default, running each expression for a set amount of time rather than for a specific number of iterations.
  • Expressions are run in batches and summary statistics are calculated after filtering out iterations with garbage collections. This allows you to isolate the performance and effects of garbage collection on running time (for more details see Neal 2014).
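
Several of the defaults listed above can be adjusted through arguments to bench::mark(). The following is a minimal sketch, assuming a small numeric vector x invented purely for illustration: min_time sets the time budget used by adaptive stopping, iterations requests a fixed number of runs instead, and check = FALSE turns off the equality check when the results are expected to differ.

```r
library(bench)

# Hypothetical input, used only for this illustration.
x <- runif(1000)

# Adaptive stopping: keep running each expression for at least 0.1 seconds.
adaptive <- bench::mark(sqrt(x), x^0.5, min_time = 0.1)

# Fixed iteration count instead of a time budget.
fixed <- bench::mark(sqrt(x), x^0.5, iterations = 50)

# mean() and median() return different values, so the equality
# check has to be disabled explicitly to benchmark them together.
unchecked <- bench::mark(mean(x), median(x), check = FALSE, iterations = 50)

fixed
```

The n_itr and n_gc columns of the full result report how many iterations ran without and with a garbage collection, which is what the filtering described in the last bullet is based on.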

The times and memory usage are returned as custom objects which have human readable formatting for display (e.g. 104ns) and comparisons (e.g. x$mem_alloc > "10MB").
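
As a small illustration of these custom objects, the sketch below builds size and time vectors directly with as_bench_bytes() and as_bench_time(). The input values are made up for the example; in practice the objects come from the columns of a bench::mark() result.

```r
library(bench)

# Sizes given in bytes; printing shows them with human readable units.
sizes <- bench::as_bench_bytes(c(512, 2e7))
sizes

# Comparison against a size written as a string, as in x$mem_alloc > "10MB".
is_large <- sizes > "10MB"
is_large

# Durations given in seconds; printing picks a suitable unit for each value.
bench::as_bench_time(c(1.04e-7, 0.25))
```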

There is also full support for plotting with ggplot2 including custom scales and formatting.

Usage

bench::mark()

Benchmarks can be run with bench::mark(), which takes one or more expressions to benchmark against each other.

library(bench)
set.seed(42)
dat <- data.frame(x = runif(10000, 1, 1000), y = runif(10000, 1, 1000))

bench::mark() will throw an error if the results are not equivalent, so you don’t accidentally benchmark inequivalent code.

bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 499), ],
  subset(dat, x > 500))
#> Error: Each result must equal the first result:
#> `dat[dat$x > 500, ]` does not equal `dat[which(dat$x > 499), ]`

Results are easy to interpret, with human readable units.

bnch <- bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  subset(dat, x > 500))
bnch
#> # A tibble: 3 x 6
#>   expression                     min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 dat[dat$x > 500, ]           408µs    467µs     2079.     377KB     6.64
#> 2 dat[which(dat$x > 500), ]    284µs    354µs     2750.     260KB     7.00
#> 3 subset(dat, x > 500)         519µs    589µs     1654.     494KB     6.74

By default the summary uses absolute measures. Relative results can be obtained by passing relative = TRUE to bench::mark() or by calling summary() with relative = TRUE on the results.

summary(bnch, relative = TRUE)
#> # A tibble: 3 x 6
#>   expression                  min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                <dbl>  <dbl>     <dbl>     <dbl>    <dbl>
#> 1 dat[dat$x > 500, ]         1.44   1.32      1.26      1.45     1   
#> 2 dat[which(dat$x > 500), ]  1      1         1.66      1        1.05
#> 3 subset(dat, x > 500)       1.83   1.66      1         1.90     1.02
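
The other route mentioned above, passing relative = TRUE directly to bench::mark(), can be sketched as follows. The data frame is rebuilt here so the block runs on its own; timings will differ from machine to machine, so no output is shown.

```r
library(bench)
set.seed(42)
dat <- data.frame(x = runif(10000, 1, 1000), y = runif(10000, 1, 1000))

# relative = TRUE scales each summary column by its smallest value,
# so the best entry in a column reads as 1.
rel <- bench::mark(
  dat[dat$x > 500, ],
  dat[which(dat$x > 500), ],
  subset(dat, x > 500),
  relative = TRUE)
rel
```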

bench::press()

bench::press() is used to run benchmarks against a grid of parameters. Provide setup and benchmarking code as a single unnamed argument, then define sets of values as named arguments. The full combination of values is expanded, and the resulting benchmarks are pressed together in the result. This allows you to benchmark a set of expressions across a wide variety of input sizes, perform replications, and carry out other useful tasks.

set.seed(42)

create_df <- function(rows, cols) {
  as.data.frame(setNames(
    replicate(cols, runif(rows, 1, 100), simplify = FALSE),
    rep_len(c("x", letters), cols)))
}

results <- bench::press(
  rows = c(1000, 10000),
  cols = c(2, 10),
  {
    dat <- create_df(rows, cols)
    bench::mark(
      min_iterations = 100,
      bracket = dat[dat$x > 500, ],
      which = dat[which(dat$x > 500), ],
      subset = subset(dat, x > 500)
    )
  }
)
#> Running with:
#>    rows  cols
#> 1  1000     2
#> 2 10000     2
#> 3  1000    10
#> 4 10000    10
results
#> # A tibble: 12 x 8
#>    expression  rows  cols      min   median `itr/sec` mem_alloc `gc/sec`
#>    <bch:expr> <dbl> <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#>  1 bracket     1000     2     35µs   40.6µs    23255.   15.84KB     14.0
#>  2 which       1000     2   34.2µs     38µs    25354.    7.91KB     15.2
#>  3 subset      1000     2   59.1µs   64.7µs    14979.    27.7KB     13.1
#>  4 bracket    10000     2   75.3µs   95.2µs    10183.  156.46KB     30.0
#>  5 which      10000     2   63.1µs   79.2µs    12247.   78.23KB     18.6
#>  6 subset     10000     2  140.1µs  197.4µs     4925.  273.79KB     25.0
#>  7 bracket     1000    10   80.1µs     91µs    10447.   47.52KB     17.7
#>  8 which       1000    10   69.5µs   83.7µs    11339.    7.91KB     21.9
#>  9 subset      1000    10    106µs    123µs     7418.   59.38KB     15.1
#> 10 bracket    10000    10  195.9µs  229.1µs     4123.   469.4KB     38.4
#> 11 which      10000    10  108.7µs  131.2µs     7059.   78.23KB     10.9
#> 12 subset     10000    10  302.6µs  344.1µs     2780.  586.73KB     33.4

Plotting

ggplot2::autoplot() can be used to generate an informative default plot. This plot is colored by gc level (0, 1, or 2) and faceted by parameters (if any). By default it generates a beeswarm plot, but you can also specify other plot types (jitter, ridge, boxplot, violin). See ?autoplot.bench_mark for full details.

ggplot2::autoplot(results)

You can also produce fully custom plots by un-nesting the results and working with the data directly.
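
One way to do this is sketched below, under stated assumptions. It flattens the time list column by hand with base R and draws a base graphics boxplot, rather than using a tidyverse un-nesting function and ggplot2. The data frame and the names bracket, which and subset are set up only for this illustration.

```r
library(bench)
set.seed(42)
dat <- data.frame(x = runif(10000, 1, 1000), y = runif(10000, 1, 1000))

bnch <- bench::mark(
  bracket = dat[dat$x > 500, ],
  which = dat[which(dat$x > 500), ],
  subset = subset(dat, x > 500),
  iterations = 50)

# The `time` list column holds one vector of timings per expression.
# Flatten it into one row per iteration, with times in seconds.
times <- data.frame(
  expression = rep(as.character(bnch$expression), lengths(bnch$time)),
  seconds = as.numeric(unlist(bnch$time)))

# Any plotting tool can take it from here; base graphics keeps the sketch small.
boxplot(seconds ~ expression, data = times, ylab = "time (s)")
```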

system_time()

bench also includes system_time(), a higher precision alternative to system.time().

bench::system_time({ i <- 1; while(i < 1e7) i <- i + 1 })
#> process    real 
#>   296ms   296ms
bench::system_time(Sys.sleep(.5))
#> process    real 
#>    97µs   503ms
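
The function index for this version also lists bench_time(), bench_memory() and hires_time(). The sketch below shows how they might be used for one-off measurements. It assumes an R build with memory profiling enabled, which bench_memory() relies on, and the expressions being measured are arbitrary examples.

```r
library(bench)

# Process CPU time and wall-clock time for a single expression.
tm <- bench::bench_time(Sys.sleep(0.1))
tm

# Memory allocated while evaluating an expression.
mem <- bench::bench_memory(numeric(1e6))
mem$mem_alloc

# Manual timing with the high-resolution clock, which reports seconds.
start <- bench::hires_time()
Sys.sleep(0.1)
elapsed <- bench::hires_time() - start
elapsed
```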

Functions in bench

Name                    Description
bench_bytes_trans       Benchmark time transformation
hires_time              Return the current high-resolution real time.
knit_print.bench_mark   Custom printing function for bench_mark objects in knitr documents
scale_bench_time        Position scales for bench_time data
scale_bench_expr        Position and color scales for bench_expr data
workout                 Workout a group of expressions individually
summary.bench_mark      Summarize bench::mark results.
mark                    Benchmark a series of functions
press                   Run setup code and benchmarks across a grid of parameters
bench_process_memory    Retrieve the current and maximum memory from the R process
bench_memory            Measure memory that an expression used.
as_bench_mark           Coerce to a bench mark object
bench_time              Measure Process CPU and real time that an expression used.
as_bench_time           Human readable times
bench_bytes             Human readable memory sizes
bench_time_trans        Benchmark time transformation
autoplot.bench_mark     Autoplot method for bench_mark objects
bench-package           bench: High Precision Timing of R Expressions

Details

License GPL-3
URL https://github.com/r-lib/bench
BugReports https://github.com/r-lib/bench/issues
Encoding UTF-8
LazyData true
RoxygenNote 7.0.2
NeedsCompilation yes
Packaged 2020-01-13 22:17:35 UTC; jhester
Repository CRAN
Date/Publication 2020-01-13 23:10:06 UTC
