Learn R Programming

⚠️There's a newer version (1.7.1) of this package.Take me there.

Getting started with workflowr

The goal of the workflowr package is to make it easier for researchers to organize their projects and share their results with colleagues. If you are already writing R code to analyze data, and know the basics of Git and GitHub, you can start taking advantage of workflowr immediately. In a matter of minutes, you can create a research website like this.

Setup

This tutorial assumes you have already R and Git installed, have a global user.name and user.email set (you can run git config -l in the shell to confirm), and have an account on GitHub. Rstudio is also convenient, especially if you have a version greater than 1.0, but is not required. If you need instructions to install these, Software Carpentry maintains high-quality installation instructions for use in its workshops. Follow the instructions for your particular operating system to install R, install Git, and configure Git. You can obtain a GitHub account at https://github.com/.

workflowr is hosted in a GitHub repository. Install it with devtools.

# install.packages("devtools")
devtools::install_github("jdblischak/workflowr", build_vignettes = TRUE)

To see the available vignettes, run browseVignettes("workflowr"). This will open a webpage with a link to each vignette. Currently the following vignettes are available (you are currently reading the first vignette):

Vignette titleTo open from R
Getting started with workflowrvignette("getting-started", "workflowr")
Customize your research websitevignette("customization", "workflowr")
How the workflowr package worksvignette("how-it-works", "workflowr")

As you use workflowr, if you find any unexpected behavior or think of an additional feature that would be nice to have, please open an Issue here. When writing your bug report or feature request, please note the version of workflowr you are using (which you can obtain by running packageVersion("workflowr")).

Start the project

To start a new project, open R (or RStudio), load the workflowr package, and run the function start_project. This creates a directory called new-project/ that contains all the files to get started, and also initializes a Git repository with the inital commit already made.

library("workflowr")
start_project("A new project", "new-project")

If you're using RStudio, you can choose Open Project... and select the file new-project.Rproj. This will set the correct working directory in the R console, switch the file navigator to the project, and configure the Git and Build panes. It will also reset the R environment, so you'll need to re-run library("workflowr"). If you're not using RStudio, set the working directory to the root of the project directory.

The project directory contains multiple files and subdirectories. The goal is for these to help organize your project.

  • Put raw data files in data/
  • Put processed data files in output/
  • Put long running scripts in code/. Ideally these steps can be automated using GNU Make or a similar tool
  • Put R Markdown analysis files in analysis/. These should load data using relative paths, e.g. d <- read.table("../output/clean-data.txt")
  • The rendered HTML files and other website code will automatically be put in docs/ (described in the next section)

Build the website

To build the website, run the function render_site in the R console, making sure that the argument input points to analysis/.

render_site(input = "analysis/")

This is much simpler if you are using RStudio version 1.0 or greater because you do not have to worry about the current working directory. Click "Build Website" in the Build pane (or use the keyboard shortcut Ctrl+Shift+B) to have RStudio properly run render_site and load a preview of the website in the Viewer pane.

You now have a functioning website on your local computer. All the website files are located in docs/. Update the files index.Rmd and about.Rmd in analysis/ and re-build the site to see your changes. If you aren't using RStudio, open docs/index.html to view the website in your computer's web browser and refresh your browser after running render_site to see your changes.

Add a new analysis file

To start a new analysis, run the following command:

open_rmd("first-analysis.Rmd")

This performs multiple actions:

  1. Creates a new file analysis/first-analysis.Rmd based on the workflowr R Markdown template (it doesn't overwrite the file if it already exists)
  2. Sets the working directory to the analysis/ directory
  3. If you are using RStudio, opens the file for editing

Now you are ready to start writing! At the top of the file, edit the author, title, and date. If you are using RStudio, press the Knit button to render the file and see a preview in the Viewer pane. From the R console, you could run render_site again (equivalent to clicking "Build Website" in RStudio), but this would re-render every R Markdown file. To only render those R Markdown files that are new or have been updated, use the workflowr function make_site.

make_site()

Check out your new file docs/first-analysis.html. Near the top you will see a line that says "Code version:" followed by an alphanumeric character string. This informs you which version of the code was used to create the file (this is explained more below in the section Version the website).

In order to make it easier to navigate to your new file, you can include a link to it on the main index page. For example, open analysis/index.Rmd and paste the following line. This uses the Markdown syntax for creating a hyperlink (for a quick reference guide in RStudio click "Help" -> "Markdown Quick Reference"). You specify the HTML version of the file since this is what comprises the website.

Click on this [link](first-analysis.html) to see my results.

Click Knit (or run make_site again) to render the file.

Version the website

One of the main goals of workflowr is to help make your research more transparent and reproducible. An important component of this is using the version control system Git. By including the unique identifier that Git assigns a snapshot (or "commit" as Git calls it) at the top of each of your rendered HTML files, you always know which version of the code produced the results.

To get started with workflowr, the main Git commands you will need to use are git add and git commit. There are many ways to use Git: in the shell, in the RStudio Git pane, or another Git graphical user interface (GUI) (see here for GUI options). This vignette will display the Git commands to run in the shell (you can quickly open the Terminal to the correct working directoy from RStudio by choosing "Tools" -> "Shell...").

When you first started the project with start_project, workflowr automatically made the first commit of all the project files. Since then you have updated some of the R Markdown files, added a new file first-analysis.Rmd, and built the website in docs/. In order for the version of the code displayed at the top of the file to be useful, it needs to refer to the exact state of the R Markdown files when the HTML was rendered. To acheive this, you'll need to commit the R Markdown files and corresponding HTML in separate steps. First add and commit the R Markdown files in analysis/.

# Run in the shell
git add analysis/*Rmd
git commit -m "Started my first analysis"

Second you'll need to re-render the corresponding HTML files so that they have the latest commit identifier at the top of the file. However you can't use make_site for this like you did during testing, because the R Markdown files haven't changed. To re-build only those HTML files that correspond to the R Markdown files in the previous commit, run the workflowr function commit_site.

commit_site()

commit_site will render the R Markdown files and create a new commit with the message "Build site.".

This is the general workflow.

  1. Write code in the R Markdown files (For RStudio users: to quickly develop the code I recommend executing the code in the R console via Ctrl-Enter to send one line or Ctrl-Alt-C to execute the entire code chunk)
  2. Run make_site to view the results as they will appear on the website
  3. Go back to step 1 until you are satisfied with the result
  4. Commit the R Markdown files in analysis
  5. Run commit_site to build and commit the website

You only need to run render_site, which re-renders every single R Markdown file, if you are updating the aesthetics of your website (e.g. anytime you make edits to _analysis/_site.yml).

Deploy the website

At this point you have built a version-controlled website that exists on your local computer. The next step is to put your code on GitHub so that it can serve your website online. To do this, login to your account on GitHub and create a new repository following these instructions. Make sure you do not add an automatically-generated README, .gitignore, or license (these are important, but workflowr already creates them for you). Next you will copy the Git commands under the heading "…or push an existing repository from the command line". Using the hypothetical GitHub username "myname" and hypothetical repository name "myproject"^[The name of the repository on GitHub does not need to be identical to the directory name of your local Git repo; however, it is convenient to have them match since this is the default behavior of git clone when copying your repo to a another computer], the two lines will look like the following:

# Run in the shell
git remote add origin https://github.com/myname/myproject.git
git push -u origin master

The first line creates the alias "origin" that points to your remote repository on GitHub^["origin" is the convential name, but could be anything you wanted]. The second pushes your current commit history to GitHub. You will be prompted to enter your GitHub username and password for authentication after you run git push.

Now that your code is on GitHub, you need to tell GitHub that you want the files in docs/ to be published as a website. Go to Settings -> GitHub Pages and choose "master branch docs/ folder" as the Source (instructions). Using the hypothetical names above, the repository would be hosted at the URL https://myname.github.io/myproject/^[It may take a few minutes for the site to be rendered.]. If you scroll back down to the GitHub Pages section of the Settings page, you can click on the URL there.

Note that even if you create a private GitHub repository, the website it creates is still public. If you're not ready to publish your results online, you can always wait and activate GitHub Pages later. In the meantime, you'll still have a version-controlled, organized set of results for your project. However, the risk that someone that doesn't have the link to your website is able to find it is very low. Search engines prioritize the results by how many other sites link to a site, so your website will not be high in the results even if you search for very specific terms. Thus if you only share the URL to your results with your close collaborators, and request that they not share it widely, your website is effectively private. That being said, being truly scooped in science is rare (at best) and openly sharing your work will help establish your expertise in the field (and furthermore establishes your priority), so you should consider keeping both your code and website public.

Further reading

  • For advice on using R Markdown files to organize your analysis, read the

chapter R Markdown workflow in the book R for Data Science by Garrett Grolemund and Hadley Wickham

Copy Link

Version

Install

install.packages('workflowr')

Monthly Downloads

499

Version

0.2.0

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

John Blischak

Last Published

August 23rd, 2023

Functions in workflowr (0.2.0)

start_project

Start a new workflowr project.
create_results

Create a results file
decide_to_render

Decide with files to render and commit
commit_site

Commit the website files
create_gitignore

Create a default .gitignore file
obtain_files_in_commit_root

Obtain the files updated in the root commit
open_rmd

Open an R Markdown analysis file
make_site

Render only the updated R Markdown files
obtain_files_in_commit

Obtain the files updated in a commit