evals: Evaluate and Process R Code

Description

This function takes a vector or list of strings containing R code, which is parsed into separate elements. Each element is evaluated in a special environment, and a detailed list of results is returned for each logical part of the R code: a character value with the R code, the resulting R object, its printed output, the class of the resulting R object, any informative/warning/error messages, and anything written to stdout. If a graph is plotted in the given text, the returned object is a string specifying the path to the saved file. Please see Details below.

If the parse option is set to FALSE, the length of the returned list equals the length of the input, as each string is evaluated as separate R code in the same environment. If a nested list of R code or a concatenated string (separated by \n or ;) is provided, like list(c('runif(1)', 'runif(1)')) with parse = FALSE, then everything is evaluated in one run, so the length of the returned list equals one or the length of the provided nested list. See the examples below.
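The effect of parse on the length of the returned list can be checked with a short sketch like the one below. It assumes the pander package is installed; cache = FALSE is passed only to keep the runs independent of each other.

```r
library(pander)

code <- c('runif(1)', 'runif(1)')

# parse = TRUE (default): one result per parsed expression
n_parsed <- length(evals(code, cache = FALSE))

# parse = FALSE with a plain vector: one result per input string
n_strings <- length(evals(code, parse = FALSE, cache = FALSE))

# parse = FALSE with a nested list: the inner vector is evaluated in one run
n_nested <- length(evals(list(code), parse = FALSE, cache = FALSE))

c(parsed = n_parsed, strings = n_strings, nested = n_nested)
```

According to the description above, the nested call should return a list of length one, while the other two return one element per expression or string.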

Usage

evals(txt, parse = TRUE, cache = TRUE,
    cache.mode = c("environment", "disk"),
    cache.dir = ".cache", cache.time = 0.1,
    cache.copy.images = FALSE, showInvisible = FALSE,
    classes = NULL, hooks = NULL, length = Inf,
    output = c("all", "src", "result", "output", "type", "msg", "stdout"),
    env = NULL, graph.unify = evalsOptions("graph.unify"),
    graph.name = "%t", graph.dir = "plots",
    graph.output = c("png", "bmp", "jpeg", "jpg", "tiff", "svg", "pdf"),
    width = 480, height = 480, res = 72, hi.res = FALSE,
    hi.res.width = 960,
    hi.res.height = 960 * (height/width),
    hi.res.res = res * (hi.res.width/width),
    graph.env = FALSE, graph.recordplot = FALSE,
    graph.RDS = FALSE, ...)

Arguments

txt

a character vector containing R code. This could be a list/vector of lines of code or a simple string holding R code separated by ; or \n.

parse

if TRUE the provided txt elements would be merged into one string and parsed to logical chunks. This is useful if you would want to get separate results of your code parts - not just the last returned value, but you are p

cache

cache the results of R calls if set to TRUE. Please note that caching does not work if parse is set to FALSE or if a syntax error is found.

cache.mode

cached results can be stored in an environment in the current R session or kept permanently on disk.

cache.dir

path to a directory holding cache files if cache.mode is set to disk. Defaults to .cache in the current working directory.

cache.time

number of seconds used to limit caching, based on proc.time: only calls that take longer than this are cached. If set to 0, all R commands are cached; if set to Inf, none are (regardless of the cache parameter).

cache.copy.images

copy images to new file names if an image is returned from the disk cache? If set to FALSE (default), the cached path would be returned.

showInvisible

return invisible results?

classes

a vector or list of classes which should be returned. If set to NULL (by default) all R objects will be returned.

hooks

list of hooks to be run for given classes in the form of list(class = fn). To also specify some parameters of the function, provide a list in the form of

list(fn, param1,
  param2=NULL)

etc. So the hook

length

any R object exceeding the specified length will not be returned. The default value (Inf) does not filter out any R objects.

output

a character vector of required returned values. This might be useful if you are only interested in the result, and do not want to save/see e.g. messages or printed output. See examples below.

env

environment where evaluation takes place. If not set (by default), a new temporary environment is created.

graph.unify

should evals try to unify the style of (base, lattice and ggplot2) plots? If set to TRUE, some panderOptions() would apply. By default this is disabled not to freak out

graph.name

set the file name of saved plots which is tempfile by default. A simple character string might be provided where %d would be replaced by the index of the generating txt so

graph.dir

path to a directory where generated images are placed. If the directory does not exist, evals tries to create it. Defaults to plots in the current working directory.

graph.output

set the required file format of saved plots. Currently it can be any of the following grDevices formats: png, bmp, jpeg, jpg, tiff, svg or pdf.

width

width of the generated plot in pixels, even for vector formats

height

height of the generated plot in pixels, even for vector formats

res

nominal resolution in ppi. The height and width of vector images are calculated based on this.

hi.res

also generate high resolution plots? If set to TRUE, each part of the R code that results in an image is run twice.

hi.res.width

width of the generated high resolution plot in pixels, even for vector formats

hi.res.height

height of the generated high resolution plot in pixels, even for vector formats. This value can be left blank to be calculated automatically to match the aspect ratio of the original plot.

hi.res.res

nominal resolution of the high resolution plot in ppi. The height and width of vector plots are calculated based on this. This value can be left blank to be calculated automatically to fit the scales of the original plot.

graph.env

save the environments in which plots were generated to distinct files (based on graph.name) with env extension?

graph.recordplot

save the plot via recordPlot to distinct files (based on graph.name) with recordplot extension?

graph.RDS

save the raw R object returned (usually with lattice or ggplot2) while generating the plots to distinct files (based on graph.name) with RDS extension?

...

optional parameters passed to graphics device (e.g. bg, pointsize etc.)

Value

a list of parsed elements, each containing: src (the command run), result (R object: NULL if nothing is returned, path to the image file if a plot was generated), printed output, type (class of the returned object, if any), informative/warning and error messages (if any were raised by the command run, otherwise set to NULL) and a possible stdout value. See Details below.

Details

As evals tries to grab the plots internally, please do not run commands that set the graphics device or call dev.off. E.g. running

evals(c('png("/tmp/x.png")', 'plot(1:10)',
  'dev.off()'))

would fail. Printing of lattice and ggplot2 objects is not needed; evals deals with that automatically.

The generated image file(s) of the plots can be fine-tuned by some specific options; please check out the graph.output, width, height, res, hi.res, hi.res.width, hi.res.height and hi.res.res parameters. Most of these options are best left untouched; see the parameter descriptions above.
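As a minimal sketch of the plot handling described above (assuming the pander package is installed and a png device is available), the call below renders a base plot without opening or closing any device by hand. Writing to tempdir() is a choice made for this illustration only.

```r
library(pander)

# Render a base plot as a 640x480 PNG; evals manages the graphics device itself.
res <- evals('plot(1:10)', width = 640, height = 480,
             graph.dir = tempdir(), cache = FALSE)[[1]]

res$type                                # "image" when a plot was generated
plot_path <- as.character(res$result)   # path to the saved image file
file.exists(plot_path)
```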

Returned result values: a list with the following elements:

src - character vector of the specified R code.
result - result of the evaluation. NULL if nothing is returned. If any R code returned an R object while evaluating, then the last R object is returned as a raw R object. If a graph is plotted in the given text, the returned object is a string (with class set to image) specifying the path to the saved image file. If the graphics device was touched, then no other R objects are returned.
output - character vector of the printed version (capture.output) of result
type - class of the generated output. "NULL" if nothing is returned, "error" if some error occurred.
msg - possible messages grabbed while evaluating the specified R code, with the following structure:
- messages - character vector of possible diagnostic message(s)
- warnings - character vector of possible warning message(s)
- errors - character vector of possible error message(s)
stdout - character vector of texts possibly printed to standard output (console)
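The elements listed above can be inspected on a single result. The sketch below assumes the pander package is installed and uses an expression that both returns a value and raises a warning.

```r
library(pander)

# as.numeric("a") returns NA and raises a coercion warning.
res <- evals('as.numeric("a")', cache = FALSE)[[1]]

names(res)          # elements of one result
res$src             # the evaluated code
res$result          # the raw R object
res$type            # class of the result
res$msg$warnings    # captured warning message(s)
```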

By default evals tries to cache results. This means that if the evaluation of some R commands takes too much time (as specified by the cache.time parameter), evals saves the results in a file and returns them from there the next time exactly the same R code is evaluated. The caching algorithm tries to be smart: it checks not only the passed R source, but also all variables referenced in it, and saves their hashes.

Technical details of the caching algorithm:

Each passed R chunk is parsed to single commands.
Each part of a parsed command (be it a function, variable, constant etc.) is evaluated (as a name) separately to a list. This list describes the unique structure and content of the passed R commands, and has some IMHO really great benefits (see the examples below).
A hash is computed for each list element and cached too in pander's local environments. This is useful if you are using large data frames; just imagine: the caching algorithm would otherwise have to compute the hash for the same data frame each time it is touched! This way the hash is recomputed only if the R object with the given name has changed.
The list is serialized and an SHA-1 hash is computed for it, which is unique for practical purposes: there is no real risk of collision.
If evals can find the cached results in a file named after the computed hash, they are returned on the spot.
Otherwise the call is evaluated and the results are optionally saved to the cache (e.g. if cache is active, if the proc.time() of the evaluation is higher than defined in cache.time, etc.).
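The idea behind the steps above can be illustrated with a simplified sketch. This is not pander's actual implementation: describe_call and call_hash are hypothetical helpers written for this illustration, the digest package is assumed to be installed, and edge cases (such as empty arguments) and the per-object hash cache are ignored.

```r
library(digest)  # assumed to be installed; provides digest()

# Hypothetical helper: replace every name in a call by the object it refers to,
# keeping the nesting of the call as a nested list.
describe_call <- function(expr, env) {
  if (is.call(expr)) {
    lapply(as.list(expr), describe_call, env = env)
  } else if (is.name(expr) && exists(as.character(expr), envir = env)) {
    get(as.character(expr), envir = env)
  } else {
    expr  # constants and unknown names are kept as they are
  }
}

# Hypothetical helper: SHA-1 hash of the structure and content of some R code.
call_hash <- function(code, env = parent.frame()) {
  force(env)
  exprs <- parse(text = code)
  digest(lapply(exprs, describe_call, env = env), algo = "sha1")
}

x <- mtcars$hp
y <- x
f <- mean

# Same function and same data under different names: the hashes should match.
same_content <- identical(call_hash("mean(x)"), call_hash("f(y)"))

# Changing the content of a variable changes the hash of the same source text.
before <- call_hash("mean(x)")
x <- x + 1
changed <- !identical(before, call_hash("mean(x)"))
```

Because the hash is built from the evaluated parts rather than from the source text, renaming a function or variable does not invalidate the key, while changing the data behind a name does.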

This is a quite secure way of caching, but if you encounter any issues, just set cache to FALSE or tweak the other cache parameters. When setting cache.dir, please think about what you are doing and move your graph.dir accordingly, as evals might otherwise return an image file path that no longer exists on your file system!

Also, if you have generated a plot and rendered it to e.g. png before, and later try to get e.g. pdf, it will fail with the cache on. Similarly, you cannot render a high resolution image of a cached image; you have to (temporarily) disable caching.

The default evals options could be set globally with evalsOptions, e.g. to switch off the cache just run evalsOptions('cache', FALSE).

Please check the examples carefully below to get a detailed overview of evals.

Examples

# parsing several lines of R code
txt <- readLines(textConnection('x <- rnorm(100)
  runif(10)
  warning("Lorem ipsum foo-bar-foo!")
  plot(1:10)
  qplot(rating, data = movies, geom = "histogram")
  y <- round(runif(100))
  cor.test(x, y)
  crl <- cor.test(runif(10), runif(10))
  table(mtcars$am, mtcars$cyl)
  ggplot(mtcars) + geom_point(aes(x = hp, y = mpg))'))
evals(txt)

## parsing a list of commands
txt <- list('df <- mtcars',
 c('plot(mtcars$hp, pch = 19)','text(mtcars$hp, label = rownames(mtcars), pos = 4)'),
 'ggplot(mtcars) + geom_point(aes(x = hp, y = mpg))')
evals(txt)

## the same commands in one string but also evaluating the `plot` with `text` (note the leading "+" on the beginning of `text...` line)
txt <- 'df <- mtcars
 plot(mtcars$hp, pch = 19)
 +text(mtcars$hp, label = rownames(mtcars), pos = 4)
 ggplot(mtcars) + geom_point(aes(x = hp, y = mpg))'
evals(txt)
## but it would fail without parsing
evals(txt, parse = FALSE)

## handling messages
evals('message(20)')
evals('message(20);message(20)', parse = FALSE)

## adding a caption to a plot
evals('set.caption("FOO"); plot(1:10)')
## `plot` is started with a `+` to eval the codes in the same chunk (no extra chunk with NULL result)
evals('set.caption("FOO"); +plot(1:10)')

## handling warnings
evals('chisq.test(mtcars$gear, mtcars$hp)')
evals(list(c('chisq.test(mtcars$gear, mtcars$am)', 'pi', 'chisq.test(mtcars$gear, mtcars$hp)')), parse = FALSE)
evals(c('chisq.test(mtcars$gear, mtcars$am)', 'pi', 'chisq.test(mtcars$gear, mtcars$hp)'))

## handling errors
evals('runiff(20)')
evals('Old MacDonald had a farm\\dots')
evals('## Some comment')
evals(c('runiff(20)', 'Old MacDonald had a farm?'))
evals(list(c('runiff(20)', 'Old MacDonald had a farm?')), parse = FALSE)
evals(c('mean(1:10)', 'no.R.function()'))
evals(list(c('mean(1:10)', 'no.R.function()')), parse = FALSE)
evals(c('no.R.object', 'no.R.function()', 'very.mixed.up(stuff)'))
evals(list(c('no.R.object', 'no.R.function()', 'very.mixed.up(stuff)')), parse = FALSE)
evals(c('no.R.object', 'Old MacDonald had a farm\\dots', 'pi'))
evals('no.R.object;Old MacDonald had a farm\\dots;pi', parse = FALSE)
evals(list(c('no.R.object', 'Old MacDonald had a farm\\dots', 'pi')), parse = FALSE)

## graph options
evals('plot(1:10)')
evals('plot(1:10);plot(2:20)')
evals('plot(1:10)', graph.output = 'jpg')
evals('plot(1:10)', height = 800)
evals('plot(1:10)', height = 800, hi.res = TRUE)
evals('plot(1:10)', graph.output = 'pdf', hi.res = TRUE)
evals('plot(1:10)', res = 30)
evals('plot(1:10)', graph.name = 'myplot')
evals(list('plot(1:10)', 'plot(2:20)'), graph.name = 'myplots-%d')
evals('plot(1:10)', graph.env = TRUE)
evals('x <- runif(100);plot(x)', graph.env = TRUE)
evals(c('plot(1:10)', 'plot(2:20)'), graph.env = TRUE)
evals(c('x <- runif(100)', 'plot(x)','y <- runif(100)', 'plot(y)'), graph.env = TRUE)
evals(list(c('x <- runif(100)', 'plot(x)'), c('y <- runif(100)', 'plot(y)')), graph.env = TRUE, parse = FALSE)
evals('plot(1:10)', graph.recordplot = TRUE)
## unprinted lattice plot
evals('histogram(mtcars$hp)', graph.recordplot = TRUE)

## caching
system.time(evals('plot(mtcars)'))
system.time(evals('plot(mtcars)'))                   # running again to see the speed-up :)
system.time(evals('plot(mtcars)', cache = FALSE))    # cache disabled

## caching mechanism does check what's inside a variable:
x <- mtcars
evals('plot(x)')
x <- cbind(mtcars, mtcars)
evals('plot(x)')
x <- mtcars
system.time(evals('plot(x)'))

## stress your CPU - only once!
evals('x <- sapply(rep(mtcars$hp, 1e3), mean)')      # run it again!

## play with cache
require(lattice)
evals('histogram(rep(mtcars$hp, 1e5))')
## now run the below call - which would return the cached version of the above call :)
f <- histogram
g <- rep
A <- mtcars$hp
B <- 1e5
evals('f(g(A, B))')

## or switch off cache globally:
evalsOptions('cache', FALSE)
## and switch on later
evalsOptions('cache', TRUE)

## returning only a few classes
txt <- readLines(textConnection('rnorm(100)
  list(x = 10:1, y = "Godzilla!")
  c(1,2,3)
   matrix(0,3,5)'))
evals(txt, classes = 'numeric')
evals(txt, classes = c('numeric', 'list'))

## hooks
txt <- 'runif(1:4); matrix(runif(25), 5, 5); 1:5'
hooks <- list('numeric' = round, 'matrix' = pander.return)
evals(txt, hooks = hooks)
## using pander's default hook
evals(txt, hooks = list('default' = pander.return))
evals('22/7', hooks = list('numeric' = round))
evals('matrix(runif(25), 5, 5)', hooks = list('matrix' = round))

## setting default hook
evals(c('runif(10)', 'matrix(runif(9), 3, 3)'), hooks = list('default'=round))
## round all values except for matrices
evals(c('runif(10)', 'matrix(runif(9), 3, 3)'), hooks = list(matrix = 'print', 'default' = round))

# advanced hooks
hooks <- list('numeric' = list(round, 2), 'matrix' = list(round, 1))
evals(txt, hooks = hooks)

# return only returned values
evals(txt, output = 'result')

# return only messages (for checking syntax errors etc.)
evals(txt, output = 'msg')

# check the length of returned values and do not return looong R objects
evals('runif(10)', length = 5)

# note the following will not be filtered!
evals('matrix(1,1,1)', length = 1)

# if you do not want to let such things be eval-ed in the middle of a string use it with other filters :)
evals('matrix(1,1,1)', length = 1, classes = 'numeric')

# hooks & filtering
evals('matrix(5,5,5)', hooks = list('matrix' = pander.return), output = 'result')

# eval-ing chunks in given environment
myenv <- new.env()
evals('x <- c(0,10)', env = myenv)
evals('mean(x)', env = myenv)
rm(myenv)
# note: if you had not specified 'myenv', the second 'evals' would have failed
evals('x <- c(0,10)')
evals('mean(x)')