drake_cache_log: Get a table that represents the state of the cache.

Description

This function is similar to make(..., cache_log_file = TRUE), but it is separate from make() and more customizable. Hopefully, it is a step toward better data versioning tools.

Usage

drake_cache_log(path = getwd(), search = TRUE,
  cache = drake::get_cache(path = path, search = search, verbose =
  verbose), verbose = drake::default_verbose(), jobs = 1,
  targets_only = FALSE)

Arguments

path

Root directory of the drake project, or if search is TRUE, either the project root or a subdirectory of the project. Ignored if a cache is supplied.

logical. If TRUE, search parent directories to find the nearest drake cache. Otherwise, look in the current working directory only. Ignored if a cache is supplied.

cache

drake cache. See new_cache(). If supplied, path and search are ignored.

verbose

logical or numeric, controls printing to the console. Use pkgconfig to set the default value of verbose for your R session: for example, pkgconfig::set_config("drake::verbose" = 2).

0 or FALSE: print nothing.
1 or TRUE: print only targets to build.
2: also print checks and cache info.
3: also print any potentially missing items.
4: also print imports and writes to the cache.

jobs

number of jobs/workers for parallel processing.

targets_only

logical, whether to output information only on the targets in your workflow plan data frame. If targets_only is FALSE, the output will include the hashes of both targets and imports.

Value

Data frame of the hash keys of the targets and imports in the cache.

Details

A hash is a fingerprint of an object's value. Together, the hash keys of all your targets and imports represent the state of your project. Use drake_cache_log() to generate a data frame with the hash keys of all the targets and imports stored in your cache.

This function is particularly useful if you store your drake project in a version control repository. The cache contains many small files, so you should not put it under version control. Instead, save the output of drake_cache_log() as a text file after each make(), and put that text file under version control. That way, you have a changelog of your project's results. See the examples below for details.

Depending on your project's history, the targets in the cache may differ from the ones in your workflow plan data frame. Also, the keys depend on the short hash algorithm of your cache (default: default_short_hash_algo()).

Examples


# NOT RUN {
test_with_dir("Quarantine side effects.", {
# Load drake's canonical example.
load_mtcars_example() # Get the code with drake_example()
# Run the project, build all the targets.
make(my_plan)
# Get a data frame of all the hash keys.
# If you want a changelog, be sure to do this after every make().
cache_log <- drake_cache_log()
head(cache_log)
# Save the hash log as a flat text file.
write.table(
  x = cache_log,
  file = "drake_cache.log",
  quote = FALSE,
  row.names = FALSE
)
# At this point, put drake_cache.log under version control
# (e.g. with 'git add drake_cache.log') alongside your code.
# Now, every time you run your project and commit,
# the history of drake_cache.log is a changelog of the project's results.
# It shows which targets and imports changed on every commit.
# Tracking your results this way would be extremely difficult
# if you put the raw '.drake/' cache itself under version control.
})
# }
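Once drake_cache.log is under version control, you can also compare two versions of it directly in R. The code below is a minimal sketch in base R, not part of drake: changed_entries() is a hypothetical helper, and it assumes each log has name and hash columns (check names(cache_log) in your version of drake). The toy data frames are made up purely to show the shape of the result.

```r
# Hypothetical helper (not part of drake): list the entries whose hash
# differs between two cache logs, including entries that were added
# or removed. Assumes both logs have `name` and `hash` columns.
changed_entries <- function(old_log, new_log) {
  merged <- merge(
    old_log[, c("name", "hash")],
    new_log[, c("name", "hash")],
    by = "name",
    all = TRUE,                    # keep names found in only one log
    suffixes = c("_old", "_new")
  )
  # NA on either side means the entry was added or removed.
  changed <- is.na(merged$hash_old) | is.na(merged$hash_new) |
    merged$hash_old != merged$hash_new
  merged[changed, , drop = FALSE]
}

# Usage sketch with real logs (assumes the file was written as above):
# old_log <- read.table("drake_cache.log", header = TRUE, stringsAsFactors = FALSE)
# new_log <- drake_cache_log()
# changed_entries(old_log, new_log)

# Toy data, invented for illustration only.
old_log <- data.frame(
  name = c("x", "y", "z"), hash = c("a1", "b2", "c3"),
  stringsAsFactors = FALSE
)
new_log <- data.frame(
  name = c("w", "x", "y"), hash = c("d4", "a1", "b9"),
  stringsAsFactors = FALSE
)
# Returns the rows for w (added), y (changed) and z (removed).
changed_entries(old_log, new_log)
```

Joining on name rather than comparing rows by position keeps the comparison correct even when targets are added, removed, or reordered between two runs of make().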
