Note: a newer version (2.1.2) of this package is available.

reproducible

A set of tools for R that enhance reproducibility for data analytics and forecasting. This package aims to provide high-level, robust, machine- and OS-independent tools for creating deeply reproducible and reusable content in R.

News

See updates from the latest CRAN and development versions. Note that versions 1.0.0 and later are not compatible with previous versions. The current version can be much faster and creates smaller repository files (each with specific options set using Suggests packages). It also allows for different backends for the database (e.g., RPostgres); the saved files, however, are still saved locally.

Reproducible workflows

A reproducible workflow is a series of code steps (e.g., in a script) that, when run, produce the same output from the same inputs every time. The big challenge with such a workflow is that many steps are so time-consuming that a scientist tends not to re-run each step every time. After many months of work, it is often unclear whether the code will actually run from the start. Is the original dataset still there? Have the packages that were used been updated? Are some of the steps missing because there was some "pointing and clicking"?

The best way to maintain reproducibility is to have all the code re-run all the time. That way, errors are detected early and can be fixed. The challenge is how to make all the steps fast enough that it becomes convenient to re-run everything from scratch each time.

Cache

Caching is the principal tool to achieve this reproducible workflow. There are many existing tools that support some notion of caching. The main tool here, Cache, can be nested hierarchically, which makes it very powerful for the data science developer who is regularly working at many levels of an analysis.

rnorm(1) # give a random number
Cache(rnorm, 1) # generates a random number
Cache(rnorm, 1) # recovers the previous random number because call is identical
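
To make the "identical call recovers the previous result" idea concrete, here is a minimal base-R sketch of call-keyed caching. It is only an illustration under stated assumptions: simpleCache and cacheStore are hypothetical names, the key is a plain deparse of the function and its arguments, and nothing here reflects how Cache is actually implemented in the package.

```r
# Hypothetical toy cache (NOT reproducible::Cache): results are stored in an
# environment, keyed on the deparsed function and its arguments.
cacheStore <- new.env()

simpleCache <- function(FUN, ...) {
  args <- list(...)
  # Same function + same arguments -> same key
  key <- paste(deparse(list(FUN, args)), collapse = "")
  if (!exists(key, envir = cacheStore, inherits = FALSE)) {
    # First time this call is seen: compute and store the result
    assign(key, do.call(FUN, args), envir = cacheStore)
  }
  get(key, envir = cacheStore, inherits = FALSE)
}

a <- simpleCache(rnorm, 1)  # computed and stored
b <- simpleCache(rnorm, 1)  # recovered, because the call is identical
d <- simpleCache(rnorm, 2)  # different argument, so a new computation
identical(a, b)
```

A real cache needs a more robust key than deparsed text (for example, a hash of the objects) and persistent storage; the sketch only shows the lookup-before-compute pattern.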

prepInputs

A common data problem is starting from a raw (spatial) dataset and getting it into shape for an analysis. Often, copies of a dataset are haphazardly placed in ad hoc local file systems. This makes it particularly difficult to share the workflow. The solution is to use a canonical location (e.g., cloud storage, permalink to original data provider, etc.) and tools that are smart enough to download only once.

Get a geospatial dataset. It will be checksummed (locally), meaning that if the file is already in place locally, it will not be downloaded again.

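library(reproducible)
dPath <- file.path(tempdir(), "inputs") # assumed download directory; dPath is not defined in the original example

# Using dlFun -- a custom download function -- passed to preProcess
test1 <- prepInputs(targetFile = "GADM_2.8_LUX_adm0.rds", # must specify currently
                    dlFun = "raster::getData", name = "GADM", country = "LUX", level = 0,
                    path = dPath)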
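
The "download only once" behaviour described above can be sketched in base R as a checksum test before fetching. This is a rough illustration, not the package's mechanism: fetchOnce and fakeFetch are hypothetical names, the checksum is MD5 via tools::md5sum, and the "download" is a stand-in that writes a small local file so that the example runs without network access.

```r
# Hypothetical helper (not part of reproducible): fetch a file only when the
# local copy is missing or its MD5 checksum differs from the expected one.
fetchOnce <- function(destfile, expectedMD5, fetch) {
  if (file.exists(destfile) &&
      identical(unname(tools::md5sum(destfile)), expectedMD5)) {
    return(invisible(FALSE))  # local copy verified; nothing to fetch
  }
  fetch(destfile)             # in practice, e.g., a wrapper around utils::download.file
  invisible(TRUE)
}

# Stand-in for a real download: writes a small file locally
fakeFetch <- function(dest) writeLines("raw data", dest)

# Record the expected checksum once, then start from a clean state
f <- tempfile(fileext = ".txt")
fakeFetch(f)
md5 <- unname(tools::md5sum(f))
unlink(f)

first  <- fetchOnce(f, md5, fakeFetch)  # file absent, so it is fetched
second <- fetchOnce(f, md5, fakeFetch)  # checksum matches, so the fetch is skipped
```

Here first is TRUE and second is FALSE: the second call finds a local file whose checksum matches and does no work.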

Cache with prepInputs

Putting these tools together allows for very rich data flows. For example, with prepInputs and using the fun argument or passing a studyArea, a raw dataset can be downloaded, loaded into R, and post-processed -- all potentially very time-consuming steps resulting in a clean, often much smaller dataset. Wrapping all of these in a Cache call can make repeated runs very quick.

test1 <- Cache(prepInputs, targetFile = "GADM_2.8_LUX_adm0.rds", # must specify currently
               dlFun = "raster::getData", name = "GADM", country = "LUX", level = 0,
               path = dPath)
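
Why wrapping a slow download-and-process step in a cache pays off can be seen in a small base-R sketch that stores the result on disk. The assumptions are loud ones: diskCache and slowPrep are hypothetical names, the slow step is simulated with Sys.sleep, and the cache is keyed only on a file name, so unlike Cache it will not notice when the inputs change.

```r
# Hypothetical helper (not part of reproducible): cache a slow step on disk so
# that later runs, even in a new R session, read the stored result instead.
diskCache <- function(cacheFile, expr) {
  if (file.exists(cacheFile)) {
    return(readRDS(cacheFile))  # fast path: expr is never evaluated
  }
  result <- expr                # lazy evaluation: the slow code runs only here
  saveRDS(result, cacheFile)
  result
}

slowPrep <- function() {
  Sys.sleep(1)                  # stand-in for download + load + post-process
  data.frame(id = 1:3, value = c(10, 20, 30))
}

cacheFile <- tempfile(fileext = ".rds")
t1 <- system.time(x1 <- diskCache(cacheFile, slowPrep()))[["elapsed"]]  # slow: runs slowPrep
t2 <- system.time(x2 <- diskCache(cacheFile, slowPrep()))[["elapsed"]]  # fast: reads the file
```

The second call returns the same data frame without running slowPrep at all, which is the effect the Cache(prepInputs, ...) call above aims for on real data.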

See vignettes and help files for many more real-world examples.

Installation

Current release (on CRAN)

Install from CRAN:

install.packages("reproducible")

Install from GitHub:

#install.packages("devtools")
library("devtools")
install_github("PredictiveEcology/reproducible", dependencies = TRUE) 

Development version

Install from GitHub:

#install.packages("devtools")
library("devtools")
install_github("PredictiveEcology/reproducible", ref = "development", dependencies = TRUE) 

Contributions

Please see CONTRIBUTING.md for information on how to contribute to this project.

Monthly Downloads: 1,998
Version: 1.2.11
License: GPL-3
Maintainer: Eliot J B
Last Published: November 9th, 2022

Functions in reproducible (1.2.11)

.addTagsToOutput: Add tags to object
createCache: Create a new cache
.cacheMessage: Create a custom cache message by class
Checksums: Calculate checksum
checkAndMakeCloudFolderID: Check for presence of checkFolderID (for Cache(useCloud))
assessDataType: Assess the appropriate raster layer data type
.checkForAuxiliaryFiles: Check a neededFile for commonly needed auxiliary files
.checkGitConfig: Check global git config file
.checkCacheRepo: Check for cache repository info in ...
basename2: A version of base::basename that is NULL resistant
compareNA: NA-aware comparison of two vectors
archiveExtractBinary: Tests if unrar or 7zip exist
getFunctionName: A set of helpers for Cache
cloudUploadFromCache: Upload a file to cloud directly from local cacheRepo
.requireNamespace: Provide standard messaging for missing package dependencies
.removeCacheAtts: Remove attributes that are highly varying
checkoutVersion: Clone, fetch, and checkout from GitHub.com repositories
clearStubArtifacts: Clear erroneous archivist artifacts
convertPaths: Change the absolute path of a file
fastMask: Faster operations on rasters (DEPRECATED as terra::mask is fast)
.digest: Calculate the hashes of multiple files
cloudUpload: Upload to cloud, if necessary
dlGeneric: Download file from generic source url
cloudDownload: Download from cloud, if necessary
downloadFile: A wrapper around a set of downloading functions
cloudSyncCacheOld: Sync cloud with local Cache
file.move: Move a file to a new location
extractFromArchive: Extract files from archive
cloudCheckOld: Basic tool for using cloud-based caching
cloudWriteOld: Basic tool for using cloud-based caching
linkOrCopy: Hardlink, symlink, or copy a file
.listFilesInArchive: List files in either a .zip or .tar file
.prefix: Add a prefix or suffix to the basename part of a file path
.grepSysCalls: Grep system calls
messageDF: Use message to print a clean square data structure
mergeCache: Merge two cache repositories together
cropInputs: Crop a Spatial* or Raster* object
guessAtTarget: Try to pick a file to load
copySingleFile: Copy a file using robocopy on Windows and rsync on Linux/macOS
prepInputs: Download and optionally post-process files
paddedFloatToChar: Convert numeric to character with padding
pipe: A cache-aware pipe (currently not working)
.getTargetCRS: Hierarchically get crs from Raster*, Spatial*
isWindows: Test whether system is Windows
isInteractive: Alternative to interactive() for unit testing
postProcessTerra: Transform a GIS dataset so it has the properties (extent, projection, mask) of another
.setSubAttrInList: Set subattributes within a list by reference
dlGoogle: Download file from Google Drive
.formalsNotInCurrentDots: Identify which formals to a function are not in the current ...
unrarPath: The known path for unrar or 7z
makeMemoisable: Generic method to make or unmake objects memoisable
.sortDotsUnderscoreFirst: Sort or order any named object with dotted names and underscores first
preProcessParams: Download, Checksum, Extract files
.preDigestByClass: Any miscellaneous things to do before .robustDigest and after FUN call
maskInputs: Mask module inputs
searchFull: Search up the full scope for functions
.robustDigest: Create reproducible digests of objects in R
.prepareFileBackedRaster: Copy the file-backing of a file-backed Raster* object
writeOutputs: Write module inputs on disk
determineFilename: Determine filename, either automatically or manually
updateFilenameSlots: A helper function to change the filename slot of Raster* objects
.debugCache: Attach debug info to return for Cache
clearCache: Examining and modifying the cache
writeFuture: Write to cache repository, using future::future
projectInputs: Project Raster* or Spatial* or sf objects
reproducible-package: The reproducible package
reexports: Objects exported from other packages
.prepareOutput: Make any modifications to object recovered from cacheRepo
fixErrors: Do some minor error fixing
fixErrorsTerra: Fix common errors in GIS layers, using terra
.purge: Purge individual line items from checksums file
.tagsByClass: Add extra tags to an archive based on class
movedCache: Deal with moved cache issues
postProcess: Generic function to post process objects
testForArchiveExtract: Returns unrar path and creates a shortcut as .unrarPath. Was not incorporated in previous function so it can be used in the tests
objSize: Wrapper around lobstr::obj_size
studyAreaName: Get a unique name for a given study area
spatialClasses-class: The spatialClasses class
retry: A wrapper around try that retries on failure
reproducibleOptions: reproducible options
.pkgEnv: The reproducible package environment
CacheDBFile: A collection of low level tools for Cache
Cache: Cache method that accommodates environments, S4 methods, Rasters, & nested caching
Filenames: Return the filename(s) from a Raster* object
CacheDigest: The exact digest function that Cache uses
Path-class: Coerce a character string to a class "Path"
Copy: Recursive copying of nested environments, and other "hard to copy" objects
cloudCache: Deprecated
.addChangedAttr: Add an attribute to an object indicating which named elements change