unitizerState: Tests and Session State

Description

While R generally adheres to a "functional" programming style, there are several aspects of session state that can affect the results of code evaluation (e.g. global environment, search path). unitizer provides functionality to increase test reproducibility by controlling session state so that it is the same every time a test is run. This functionality is turned off by default to comply with CRAN requirements. You can permanently enable the recommended state tracking level by adding options(unitizer.state='recommended') to your .Rprofile, but before you do so be sure to read the “CRAN non-compliance” section.
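As a minimal sketch of the .Rprofile approach, the lines below set the option and then read it back. The option name and value are the ones given on this page; reading it back with getOption is only there for illustration.

```r
# In ~/.Rprofile: make "recommended" the default unitizer state preset.
# Read the "CRAN non-compliance" section before enabling this.
options(unitizer.state = "recommended")

# Read the option back to confirm what will be picked up as the default
current.state <- getOption("unitizer.state")
print(current.state)
```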

Usage

state(par.env, search.path, options, working.directory, random.seed,
  namespaces)
in_pkg(package = NULL)

Arguments

par.env

NULL to use the special unitizer parent environment, or an environment to use as the parent environment, or the name of a package as a character string to use that package's namespace as the parent environment, or a unitizerInPkg object as produced by in_pkg; assumes .GlobalEnv if unspecified

search.path

one of 0:2; uses the default value corresponding to getOption("unitizer.state"), which is 0 in the default unitizer state of “off”.

options

same as search.path

working.directory

same as search.path

random.seed

same as search.path

namespaces

same as search.path

package

character(1L) or NULL; if NULL will tell unitize to attempt to identify if the test file is inside an R package folder structure and if so run tests in that package's namespace. This should work with R CMD check tests as well as in normal usage. If character will take the value to be the name of the package to use the namespace of as the parent environment. Note that in_pkg does not retrieve the environment, it just tells unitize to do so.

Value

for state a unitizerStateRaw object, for in_pkg a unitizerInPkg object, both of which are suitable as values for the state parameter for unitize or as values for the “unitizer.state” global option.

CRAN non-compliance

In the default state management mode, this package fully complies with CRAN policies. In order to implement advanced state management features we must lightly trace some base functions to alert unitizer each time the search path is changed by a test expression. The traced function behavior is completely unchanged other than for the side effect of notifying unitizer each time they are called. Additionally, the functions are only traced during unitize evaluation and are untraced on exit. Unfortunately this tracing is against CRAN policies, which is why it is disabled by default.

For more details see the reproducible tests vignette with: vignette(package='unitizer', 'unitizer_reproducible_tests')

Overview

You can control how unitizer manages state via the state argument to unitize or by setting the “unitizer.state” option. This help file discusses state management with unitizer, and also documents two functions that, in conjunction with unitize or unitize_dir allow you to control state management.

Note: most of what is written in this page about unitize applies equally to unitize_dir.

unitizer provides functionality to insulate test code from variability in the following elements of state. Note the “can be” wording below: by default these elements are not managed.

Workspace / Parent Environment: all tests can be evaluated in environments that are children of a special environment that does not inherit from .GlobalEnv. This prevents objects that are lying around in your workspace from interfering with your tests.
Random Seed: can be set to a specific value at the beginning of each test file so that tests using random values get the same value at every test iteration. This only sets the seed at the beginning of each test file, so changes in order or number of functions that generate random numbers in your test file will affect subsequent tests. The advantage of doing this over just setting the seed directly in the test files is that unitizer tracks the value of the seed and will tell you the seed changed for any given test (e.g. because you added a test in the middle of the file that uses the random seed).
Working Directory: can be set to the tests directory inside the package directory if the test files appear to be inside the folder structure of a package. This mimics R CMD check behavior. If test files are not inside a package directory structure then it can be set to the test files' directory.
Search Path: can be set to what you would typically find in a freshly loaded vanilla R session. This means any non-default packages that are loaded when you run your tests are unloaded prior to running your tests. If you want to use the same libraries across multiple tests you can load them with the pre argument to unitize or unitize_dir.
Options: same as search path
Namespaces: same as search path; this option is only made available to support options since many namespaces set options onLoad, and as such it is necessary to unload and re-load them to ensure default options are set.
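The workspace isolation described in the first item above can be sketched in base R. This is an illustration of the idea only, not unitizer's implementation; the names stray.obj, zero.env and test.env are hypothetical.

```r
# An object left lying around in the workspace
assign("stray.obj", 42, envir = globalenv())

# An environment whose parent is the parent of .GlobalEnv, so that
# variable lookups bypass the workspace entirely
zero.env <- new.env(parent = parent.env(globalenv()))

# Tests would be evaluated in children of that environment
test.env <- new.env(parent = zero.env)

# The stray object is visible from the workspace but not from test.env
in.workspace <- exists("stray.obj", envir = globalenv())
in.test.env <- exists("stray.obj", envir = test.env)

# Functions from attached packages and base are still reachable
sum.visible <- exists("sum", envir = test.env)

rm("stray.obj", envir = globalenv())  # clean up the workspace
```

Because the chain of enclosures never passes through .GlobalEnv, nothing defined interactively can leak into test evaluation, while attached packages remain available.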

In the “recommended” state tracking mode, parent environment, random seed, working directory, and search path are all managed to level 2, which approximates what you would find in a fresh session (see "Custom Control" section below). For example, with the working directory managed, each test file will start evaluation with the working directory set to the tests folder of your package. All these settings are returned to their original values when unitizer exits.

You can modify what aspects of state are managed by using the state parameter to unitize. If you are satisfied with basic default settings you can just use the presets described in the next section. If you want more control you can use the return values of the state and in_pkg functions as the values for the state parameter for unitize.

State is reset after running each test file when running multiple test files with unitize_dir, which means state changes in one test file will not affect the next one.

State Presets

For convenience unitizer provides several state management presets that you can specify via the state parameter to unitize. The simplest method is to specify the preset name as a character value:

"recommended":
- Use special (non .GlobalEnv) parent environment
- Manage search path
- Manage random seed (and set it to be of type "Wichmann-Hill" for space considerations).
- Manage working directory
- Leave namespace and options untouched
"safe": like recommended, but turns off tracking for search path in addition to namespaces and options. These settings, particularly the last two, are the most likely to cause compatibility problems.
"pristine": implements the highest level of state tracking and control
"basic": keeps all tracking, but at a less aggressive level; state is reset between each test file to the state before you started unitizing so that no single test file affects another, but the state of your workspace, search path, etc. when you launch unitizer will affect all the tests (see the Custom Control section).
"off" (default): state tracking is turned off

Custom Control

If you want to customize each aspect of state control you can pass a unitizerState object as the state argument. The simplest way to do this is by using the state constructor function. Look at the examples for how to do this.

For convenience unitize allows you to directly specify a parent environment if all you want to change is the parent evaluation environment but are otherwise satisfied with the defaults. You can even use the in_pkg function to tell unitizer to use the namespace associated with your current project, assuming it is an R package. See examples for details.

If you do choose to modify specific aspects of state control, here is a guide to what the various parameter values for state do:

For par.env: any of the following:
- NULL to use the special unitizer parent environment as the parent environment; this environment has as its parent the parent of .GlobalEnv, so any tests evaluated therein will not be affected by objects in .GlobalEnv (see vignette("unitizer_reproducible_state")).
- an environment to use as the parent evaluation environment
- the name of a package to use that package's namespace environment as the parent environment
- the return value of in_pkg; used primarily to autodetect what package namespace to use based on package directory structure
For all other slots, the settings are in 0:2 and mean:
- 0 turn off state tracking
- 1 track, but start with state as it was when unitize was called.
- 2 track and set state to what you would typically find in a clean R session, with the exception of random.seed, which is set to getOption("unitizer.seed") (of kind "Wichmann-Hill" as that seed is substantially smaller than the R default seed).

If you choose to use level 1 for the random seed you should consider picking a small random seed type such as "Wichmann-Hill" before you start unitizer, as the seed will be recorded each time it changes.
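The size difference between seed types can be checked directly in base R. The sketch below compares the length of .Random.seed under R's default "Mersenne-Twister" generator and under "Wichmann-Hill", then restores the original generator settings. It uses only base R and makes no claim about how unitizer stores seeds.

```r
# Remember the current RNG settings so they can be restored afterwards
old.kind <- RNGkind()

# R's default generator: the seed is a long integer vector
set.seed(1, kind = "Mersenne-Twister")
len.default <- length(get(".Random.seed", envir = globalenv()))

# Wichmann-Hill: the seed is only a handful of integers
set.seed(1, kind = "Wichmann-Hill")
len.wh <- length(get(".Random.seed", envir = globalenv()))

# Restore the generator settings that were in effect before
do.call(RNGkind, as.list(old.kind))

print(c(default = len.default, wichmann.hill = len.wh))
```

Since the seed is recorded each time it changes, a seed of a few integers keeps stored test data much smaller than the default one.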

Permanently Setting State Tracking

You can permanently change the default state by setting the “unitizer.state” option to the name of one of the state presets above or to a state settings object generated with state as described in the previous section.

Avoiding .GlobalEnv

For the most part avoiding .GlobalEnv leads to more robust and reproducible tests since the tests are not influenced by objects in the workspace that may well be changing from test to test. There are some potential issues when dealing with functions that expect .GlobalEnv to be on the search path. For example, setClass uses topenv to find a default environment to assign S4 classes to. Typically this will be the package environment, or .GlobalEnv. However, when you are in unitizer this becomes the next environment on the search path, which is typically locked, which will cause setClass to fail. For those types of functions you should specify them with an environment directly, e.g. setClass("test", slots=c(a="integer"), where=environment()).
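The topenv behavior described above can be observed in base R without unitizer. The sketch below builds one environment chain that passes through .GlobalEnv and one that bypasses it, and compares what topenv finds. The environment names are hypothetical, and the chain is only an approximation of what unitizer sets up.

```r
# A plain child of the workspace: topenv() resolves to .GlobalEnv
workspace.child <- new.env(parent = globalenv())
top.from.workspace <- topenv(workspace.child)

# A chain that bypasses .GlobalEnv, similar in spirit to the special
# unitizer parent environment
skip.global <- new.env(parent = parent.env(globalenv()))
test.env <- new.env(parent = skip.global)
top.from.test <- topenv(test.env)

# From the second chain topenv() lands on some environment further up
# the search path instead of .GlobalEnv
resolves.to.global <- identical(top.from.workspace, globalenv())
bypasses.global <- !identical(top.from.test, globalenv())
print(c(resolves.to.global, bypasses.global))
```

Passing where=environment() explicitly, as in the setClass call above, sidesteps this lookup altogether.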

Namespaces and Options

Options and namespace state management require the ability to fully unload any non-default packages and namespaces, and there are some packages that cannot be unloaded, or should not be unloaded (e.g. data.table). If you know the packages you typically load in your sessions can be unloaded, you can turn this functionality on by setting options(unitizer.state="pristine") either in your session, in your .Rprofile file, or using state="pristine" in each call to unitize or unitize_dir. If you have packages that cannot be unloaded, but you still want to enable these features, see the "Search Path and Namespace State Options" section of unitizer.opts docs.

If you run unitizer with options and namespace tracking and you run into a namespace that cannot be unloaded, or should not be unloaded because it is listed in getOption("unitizer.namespace.keep"), unitizer will turn off options state tracking from that point onwards.

Additionally, note that warn and error options are always set to 1 and NULL respectively during test evaluation, irrespective of what option state tracking level you select.
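The general pattern of forcing option values during evaluation and restoring them afterwards can be sketched in base R as follows. The helper with_test_options is hypothetical and written for illustration; it is not a unitizer function and does not claim to mirror unitizer's internals.

```r
# Hypothetical helper: run `fun` with warn = 1 and error = NULL,
# then put the previous option values back
with_test_options <- function(fun) {
  old <- options(warn = 1, error = NULL)  # options() returns the old values
  on.exit(options(old), add = TRUE)       # restore even if `fun` fails
  fun()
}

warn.before <- getOption("warn")
warn.during <- with_test_options(function() getOption("warn"))
warn.after <- getOption("warn")

print(list(before = warn.before, during = warn.during, after = warn.after))
```

With warn set to 1, warnings are printed as they occur instead of being deferred, which keeps them associated with the expression that produced them.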

Known Untracked State Elements

system time: tests involving functions such as date will inevitably fail
locale: is not tracked because it is so specific to the system and so unlikely to be changed by user action; if you have tests that depend on locale be sure to set the locale via the pre argument to unitize, and also to reset it to the original value in post.
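A minimal base R sketch of the save, set, and restore steps for the locale is shown below. How these lines would be divided between the code supplied through pre and post is an assumption made here for illustration; consult the unitize documentation for how those arguments are actually specified.

```r
# Pre-test step (assumed): remember the collation locale, then fix it
old.collate <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")

# In the "C" locale strings sort by byte value, so upper case comes first
sorted.in.c <- sort(c("b", "A", "a", "B"))
print(sorted.in.c)

# Post-test step (assumed): put the original collation locale back
Sys.setlocale("LC_COLLATE", old.collate)
```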

Examples

# NOT RUN {
## In these examples we use `...` to denote other arguments to `unitize` that
## you should specify.  All examples here apply equally to `unitize_dir`

## Run with recommended state tracking settings
unitize(..., state="recommended")
## Manage as much of state as possible
unitize(..., state="pristine")

## No state management, but evaluate with custom env as parent env
my.env <- new.env()
unitize(..., state=my.env)
## use custom environment, and turn on search.path tracking
## here we must use the `state` function to construct a state object
unitize(..., state=state(par.env=my.env, search.path=2))

## Specify a namespace to run in by name
unitize(..., state="stats")
unitize(..., state=state(par.env="stats")) # equivalent to previous

## Let `unitizer` figure out the namespace from the test file location;
## assumes test file is inside package folder structure
unitize("mytests.R", state=in_pkg()) # assuming mytests.R is part of a pkg
unitize("mytests.R", state=in_pkg("mypkg")) # also works
# }
