rchallenge-package: A Simple Data Science Challenge System
Description
A simple data science challenge system using R Markdown and Dropbox.
It requires no network configuration, does not depend on external platforms
such as Kaggle, and can be easily installed on a personal computer.
Installation
Install the R package from CRAN repositories:
    install.packages("rchallenge")
or install the latest development version from GitHub:
    # install.packages("devtools")
    devtools::install_github("adrtod/rchallenge")
A recent version of pandoc (>= 1.12.3) is also required.
See the pandoc installation instructions
for details on installing pandoc for your platform.
Getting started
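Before creating a challenge, it can help to confirm that the pandoc requirement stated in the installation notes is met. The following is a minimal sketch in base R, assuming pandoc is reachable through the PATH; the helpers parse_pandoc_version and pandoc_is_recent are hypothetical names introduced here and are not part of rchallenge.

```r
# Extract the version number from the first line printed by `pandoc --version`,
# e.g. "pandoc 2.19.2". Hypothetical helper, not part of rchallenge.
parse_pandoc_version <- function(version_line) {
  found <- regmatches(version_line, regexpr("[0-9]+(\\.[0-9]+)+", version_line))
  if (length(found) == 0) return(NULL)
  numeric_version(found)
}

# Return TRUE if a pandoc executable found on the PATH is at least `minimum`.
# Hypothetical helper, not part of rchallenge.
pandoc_is_recent <- function(minimum = "1.12.3") {
  pandoc_path <- Sys.which("pandoc")
  if (!nzchar(pandoc_path)) return(FALSE)
  version_line <- system2(pandoc_path, "--version", stdout = TRUE)[1]
  version <- parse_pandoc_version(version_line)
  !is.null(version) && version >= minimum
}

if (pandoc_is_recent()) {
  message("pandoc >= 1.12.3 found")
} else {
  message("pandoc >= 1.12.3 not found on the PATH")
}
```

A pandoc bundled with RStudio is usually not on the PATH, so this sketch may report it as missing; rmarkdown::pandoc_available("1.12.3") performs a similar check and also looks in the RStudio location.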
Install a new challenge in Dropbox/mychallenge:
    setwd("~/Dropbox/mychallenge")
    library(rchallenge)
    new_challenge()
or for a French version:
    new_challenge(template = "fr")
You will obtain a ready-to-use challenge in the folder Dropbox/mychallenge containing:
- challenge.rmd: Template R Markdown script for the webpage.
- data: Directory containing the data_train and data_test datasets.
- submissions: Directory of the submissions. It will contain one subdirectory per team, where the team can upload its submissions. The subdirectories are shared with Dropbox.
- history: Directory where the submissions history is stored.
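The layout listed above can be verified with a few lines of base R. This is only a sketch: check_challenge_layout is a hypothetical helper, not an rchallenge function, and it assumes the default file and directory names given in the list.

```r
# Report which of the expected challenge entries exist under `root`.
# Hypothetical helper, not part of rchallenge; assumes the default names.
check_challenge_layout <- function(root) {
  expected <- c("challenge.rmd", "data", "submissions", "history")
  present <- file.exists(file.path(path.expand(root), expected))
  names(present) <- expected
  present
}

layout <- check_challenge_layout("~/Dropbox/mychallenge")
missing_entries <- names(layout)[!layout]
if (length(missing_entries) > 0) {
  message("Missing: ", paste(missing_entries, collapse = ", "))
} else {
  message("All expected files and directories are present.")
}
```

If the challenge was created with different names, the expected vector would need to be adjusted accordingly.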
The default challenge provided is a binary classification problem on the German Credit Card dataset. You can easily customize the challenge in two ways:
- During the creation of the challenge: by using the options of the
new_challenge function.
- After the creation of the challenge: by manually replacing the data files in the
data subdirectory and the baseline predictions in submissions/baseline and by customizing the template challenge.rmd as needed.
Next steps
To complete the installation:
1. Create and share subdirectories in submissions for each team:
    new_team("team_foo", "team_bar")
2. Render the HTML page:
    publish()
   Use the output_dir argument to change the output directory.
   Make sure the output HTML file is served as a web page, e.g. using GitHub Pages.
3. Give the URL of your challenge.html file to the participants.
4. Refresh the webpage by repeating step 2 on a regular basis. See below for automating this step.
From now on, a fully autonomous challenge system is set up requiring no further
administration. With each update, the program automatically performs the following
tasks using the functions available in our package:
Automating the updates on Unix/OSX
For step 4, you can add the following line to your crontab
using crontab -e (mind the quotes):
    0 * * * * Rscript -e 'rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd")'
This will render an HTML webpage every hour.
Use the output_dir argument to change the output directory.
You might have to add the path to Rscript and pandoc at the beginning of your crontab:
    PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
Depending on your system or pandoc version, you might also have to explicitly add the encoding option to the command:
    0 * * * * Rscript -e 'rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd", encoding = "utf8")'
Automating the updates on Windows
You can use the Task Scheduler
to create a new task with a "Start a program" action and the following settings (mind the quotes):
- Program/script: Rscript.exe
- options: -e rchallenge::publish('~/Dropbox/mychallenge/challenge.rmd')
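Both the crontab line and the Task Scheduler settings above pass R code through a quoted command line, which is why the quotes need care. One possible way to sidestep this, sketched here under assumptions, is to put the call in a small script file and have the scheduler run Rscript on that file instead. The file name update_challenge.R and the helper publish_safely are hypothetical; only rchallenge::publish and the challenge path come from the text above.

```r
# update_challenge.R (hypothetical file name)

# Call publish() on `rmd_path` and report failures as messages instead of
# stopping, so a scheduled run leaves a readable trace in its log.
# Hypothetical helper, not part of rchallenge.
publish_safely <- function(rmd_path, publish_fun = rchallenge::publish) {
  rmd_path <- path.expand(rmd_path)
  if (!file.exists(rmd_path)) {
    message("File not found: ", rmd_path)
    return(invisible(FALSE))
  }
  ok <- tryCatch({
    publish_fun(rmd_path)
    TRUE
  }, error = function(e) {
    message("publish() failed: ", conditionMessage(e))
    FALSE
  })
  invisible(ok)
}

publish_safely("~/Dropbox/mychallenge/challenge.rmd")
```

The scheduler entry then only needs the script path as its argument to Rscript, with no nested quotes. The publish_fun argument exists so the wrapper can be tried out with a stand-in function before pointing it at a real challenge.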