# Rclean v1.0.0

## A Tool for Writing Cleaner, More Transparent Code

To help create clearer, more concise code, this toolbox helps coders isolate the essential parts of a script that produce a chosen result, such as an object, tables and figures written to disk, and even warnings and errors. This work was funded by US National Science Foundation grant SSI-1450277 for applications of End-to-End Data Provenance.

• Have you ever written a long script in R that conducts oodles of analyses and wished that someone would come along and make it all easier to understand and use?
• Well, you’re not alone.
• A recent survey of over 1500 scientists reported a crisis of reproducibility, with "selective reporting" being the most cited contributing factor and 80% saying code availability is playing a role.
• We created Rclean to help scientists more easily write "cleaner" code.
• Rclean provides a simple way to get the code you need to produce a specific result.
• Rclean uses data provenance to capture what your code actually does when it’s running and then allows you to pull out the essential code that produces specific outputs.
• By focusing on the specific results you want, Rclean lets you spend more energy on your science and less time figuring out your code.

# Install and Setup

library(devtools)
install_github("ProvTools/cleanR")
install_github("ProvTools/provR")
install.packages("jsonlite")
install.packages("igraph")
install.packages("formatR")


Then, prior to use, load the following packages:

library(cleanR)
library(provR)
library("jsonlite")
library(igraph)
library(formatR)


# Usage

Once you have your script and workspace set up, you can use Rclean to get clean chunks of a larger script that produce the specific results you want. We'll use the micro.R script, which can be found inside the package repo in the exec directory. The following example assumes that your current working directory is exec.

First, you'll need to record information about the script you would like to parse. Rclean uses data provenance to determine which lines of code depend on each other within the larger script. We can use the provR package to generate provenance. The next bit of code runs our script and saves the provenance to memory, which we then pass to the options function so that Rclean has access to it:

prov.capture("micro.R")
options(prov.json = prov.json())
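The hand-off through `options` can be tried in isolation with base R alone. The sketch below uses a made-up placeholder string (`fake_prov`, hypothetical) in place of real provenance. It only illustrates how a value stored under the `prov.json` option can be read back later, not how Rclean itself consumes it.

```r
# Placeholder standing in for real PROV-JSON text (hypothetical content).
fake_prov <- '{"entity": {"d1": {"name": "x"}}}'

# Store it in a session-wide option; options() returns the previous values.
old <- options(prov.json = fake_prov)

# Any code running later in the session can retrieve it by name.
stored <- getOption("prov.json")
identical(stored, fake_prov)

# Restore the previous option state when done.
options(old)
```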


Or, if you have provenance saved as a text file, you can load it in like this:

options(prov.json = readLines("prov_micro.json"))
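As a small aside on what `readLines` hands back, the base R sketch below writes a placeholder JSON file (hypothetical content, not real Rclean provenance) and reads it in again. It shows only that the result is a character vector with one element per line, which can be collapsed into a single string if needed.

```r
# Write a small placeholder JSON document to a temporary file
# (hypothetical content, not real Rclean provenance).
path <- tempfile(fileext = ".json")
writeLines(c("{", '  "entity": {}', "}"), path)

# readLines() returns a character vector with one element per line.
prov_lines <- readLines(path)
length(prov_lines)

# Collapsing gives a single string, should a consumer expect one.
prov_text <- paste(prov_lines, collapse = "\n")
```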


Now that we have the provenance loaded, we can start cleaning. Rclean will give us a list of possible values we can get code for:

clean()


You can then pick and choose from among these results and get the essential code to produce the output, like so:

clean(x)


Notice that the `clean` function doesn't require you to quote your results; it interprets all inputs as names of results.
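For readers curious how an R function can accept an unquoted name, here is a minimal base R sketch. The helper `result_name` is hypothetical and not part of Rclean, and Rclean's own implementation may differ; the sketch only demonstrates the general `substitute`/`deparse` idiom.

```r
# Hypothetical helper, not part of Rclean: returns the name the caller typed.
result_name <- function(result) {
  # substitute() captures the unevaluated argument expression;
  # deparse() turns that expression into a character string.
  deparse(substitute(result))
}

# The argument is never evaluated, so no object with that name needs to exist.
name_one <- result_name(x)
name_two <- result_name(my.table)
```

Because the argument is captured rather than evaluated, `name_one` holds the string `"x"` whether or not an object called `x` exists in the workspace.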

In many cases, it's handy just to take a look at the isolated code, but you can also save the code for later use or sharing:

my.code <- clean(x)
write.code(my.code, file = "x.R")
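One way to check that saved code stands on its own is to run it in a fresh environment. The base R sketch below assumes, for illustration only, that the cleaned code is a character vector of lines. It uses `writeLines` and a temporary file rather than `write.code`, and the variable names are hypothetical.

```r
# Stand-in for the output of clean(x): a character vector of code lines
# (an assumption about the shape of the result, for illustration only).
my_code <- c("a <- 1", "x <- a + 10")

# Write the lines to a temporary file using base R.
path <- tempfile(fileext = ".R")
writeLines(my_code, path)

# Re-run the saved code in a fresh environment to confirm it stands alone.
env <- new.env()
source(path, local = env)
result <- env$x
```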


If you would like to copy your code to the clipboard, you can do that by not specifying a file path.

write.code(my.code)


Happy cleaning!

## Functions in Rclean

| Name | Description |
| --- | --- |
| write.code | `write.code`: Write code to disk. OUTPUT = Writes out code from an object to a specified file. |
| parse.graph | `.parse.graph`: Parses the PROV-JSON formatted output. OUTPUT = A symmetric matrix of provenance entity relationships. |
| parse.info | `.parse.info`: Parse node information from PROV-JSON. OUTPUT = A matrix of node information. |
| clean | `clean`: Produces more transparent code. OUTPUT = The essential code needed to produce a result. |
| read.prov | `read.prov`: Read and parse provenance from a JSON file. OUTPUT = Returns a dataframe containing the provenance. |
| get.spine | `.get.spine`: Find the minimal path through the provenance necessary to produce a result. |
| prov_json | Provenance data from micro.R |
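
To give a feel for what "the minimal path through the provenance necessary to produce a result" could mean, here is a toy base R sketch. It is not Rclean's implementation: the script lines, the dependency matrix `deps`, and the helper `minimal_lines` are all hypothetical, and the dependencies are written by hand rather than derived from provenance.

```r
# Toy script: each element is one line of code (hypothetical example).
script <- c(
  "a <- 1",
  "b <- 2",
  "x <- a + 10",
  "y <- b * 2",
  "z <- x + 1"
)

# deps[i, j] is TRUE when line i uses a value created on line j.
deps <- matrix(FALSE, nrow = 5, ncol = 5)
deps[3, 1] <- TRUE  # x uses a
deps[4, 2] <- TRUE  # y uses b
deps[5, 3] <- TRUE  # z uses x

# Walk backwards from a target line, collecting every line it depends on.
minimal_lines <- function(deps, target) {
  keep <- target
  frontier <- target
  while (length(frontier) > 0) {
    parents <- which(deps[frontier[1], ])
    new <- setdiff(parents, keep)
    keep <- c(keep, new)
    frontier <- c(frontier[-1], new)
  }
  sort(keep)
}

# Lines needed to produce z (line 5): lines 1, 3 and 5.
spine <- minimal_lines(deps, target = 5)
script[spine]
```

Lines 2 and 4 are dropped because nothing on the path to `z` depends on `b` or `y`, which is the kind of pruning the function descriptions above suggest.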