rchallenge-package: A Simple Data Science Challenge System
Description
A simple data science challenge system using R Markdown and Dropbox.
It requires no network configuration, does not depend on external platforms
such as Kaggle, and can be easily installed on a personal computer.
Installation
Install the R package from the CRAN repositories:
install.packages("rchallenge")
or install the latest development version from GitHub:
# install.packages("devtools")
devtools::install_github("adrtod/rchallenge")
A recent version of pandoc (>= 1.12.3) is also required.
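The version requirement can be checked from a terminal before going further. The following is a minimal sketch, assuming a POSIX shell and a `sort` that supports the `-V` (version sort) option; `version_ge`, `installed_pandoc_version` and `check_pandoc` are hypothetical helper names, not part of rchallenge.

```shell
#!/bin/sh
# Minimum pandoc version stated in the requirement above.
REQUIRED="1.12.3"

# version_ge A B: succeeds when version A >= version B.
# Assumption: `sort -V` is available (GNU coreutils and recent BSD sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# installed_pandoc_version: prints the second field of the first line of
# `pandoc --version`, or nothing if pandoc is not on the PATH.
installed_pandoc_version() {
    command -v pandoc >/dev/null 2>&1 || return 0
    pandoc --version | head -n 1 | awk '{print $2}'
}

# check_pandoc: reports whether the installed pandoc meets the requirement.
check_pandoc() {
    found="$(installed_pandoc_version)"
    if [ -z "$found" ]; then
        echo "pandoc not found on PATH"
        return 1
    elif version_ge "$found" "$REQUIRED"; then
        echo "pandoc $found is recent enough (>= $REQUIRED)"
    else
        echo "pandoc $found is older than $REQUIRED"
        return 1
    fi
}
```

After sourcing the file, running `check_pandoc` prints one of the three messages and returns a non-zero status when pandoc is missing or too old.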
See the pandoc installation instructions for details on installing pandoc for your platform.
Getting started
Install a new challenge in Dropbox/mychallenge:
setwd("~/Dropbox/mychallenge")
library(rchallenge)
new_challenge()
or, for a French version:
new_challenge(template = "fr")
You will obtain a ready-to-use challenge in the folder Dropbox/mychallenge
containing:
- challenge.rmd: Template R Markdown script for the webpage.
- data: Directory of the data, containing the data_train and data_test datasets.
- submissions: Directory of the submissions. It will contain one subdirectory per team, where the team can place its submissions. The subdirectories are shared with Dropbox.
- history: Directory where the submissions history is stored.
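The layout above can be sanity-checked from a terminal once the challenge has been created. This is a minimal sketch, assuming the default file and directory names listed above; `check_challenge_layout` is a hypothetical helper, not an rchallenge function.

```shell
#!/bin/sh
# check_challenge_layout DIR: verifies that DIR contains challenge.rmd and
# the data, submissions and history directories (default names assumed).
# Prints one line per missing entry and returns non-zero if any is missing.
check_challenge_layout() {
    dir="$1"
    status=0
    [ -f "$dir/challenge.rmd" ] || { echo "missing file: challenge.rmd"; status=1; }
    for sub in data submissions history; do
        [ -d "$dir/$sub" ] || { echo "missing directory: $sub"; status=1; }
    done
    return "$status"
}
```

For example, `check_challenge_layout "$HOME/Dropbox/mychallenge"` prints nothing and succeeds when all four entries are present.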
The default challenge provided is a binary classification problem on the German Credit Card dataset. You can easily customize the challenge in two ways:
- During the creation of the challenge: by using the options of the new_challenge function.
- After the creation of the challenge: by manually replacing the data files in the data subdirectory and the baseline predictions in submissions/baseline, and by customizing the template challenge.rmd as needed.
Next steps
To complete the installation:
1. Create and share subdirectories in submissions for each team:
   new_team("team_foo", "team_bar")
2. Render the HTML page:
   publish()
   Use the output_dir argument to change the output directory. Make sure the output HTML file is rendered as a webpage, e.g. using GitHub Pages.
3. Give the URL of your challenge.html file to the participants.
4. Refresh the webpage by repeating step 2 on a regular basis. See below for automating this step.
From now on, a fully autonomous challenge system is set up, requiring no further
administration. With each update, the program performs its tasks automatically
using the functions available in the package.
Automating the updates on Unix/OSX
For step 4, you can add the following line to your crontab using crontab -e (mind the quotes):
0 * * * * Rscript -e 'rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd")'
This will render an HTML webpage every hour.
Use the output_dir argument to change the output directory.
You might have to add the path to Rscript and pandoc at the beginning of your crontab:
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
Depending on your system or pandoc version, you might also have to explicitly add the encoding option to the command:
0 * * * * Rscript -e 'rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd", encoding = "utf8")'
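As an alternative to editing the crontab interactively, the hourly entry can be generated and appended from a script. This is a minimal sketch, assuming a cron implementation whose `crontab` command reads a new table from standard input when given `-`; `cron_line` and `install_cron_line` are hypothetical helper names.

```shell
#!/bin/sh
# Path to the challenge script, as in the example above. Single quotes keep
# the shell from expanding "~"; the path is passed to R unchanged.
RMD='~/Dropbox/mychallenge/challenge.rmd'

# cron_line FILE: prints the hourly crontab entry for the given file.
cron_line() {
    printf "0 * * * * Rscript -e 'rchallenge::publish(\"%s\")'\n" "$1"
}

# install_cron_line LINE: appends LINE to the current user's crontab unless
# an identical line is already present. Assumes `crontab -` reads stdin.
install_cron_line() {
    current="$(crontab -l 2>/dev/null || true)"
    if printf '%s\n' "$current" | grep -Fqx -- "$1"; then
        echo "entry already present"
    else
        {
            if [ -n "$current" ]; then printf '%s\n' "$current"; fi
            printf '%s\n' "$1"
        } | crontab -
    fi
}
```

Running `install_cron_line "$(cron_line "$RMD")"` adds the entry once; repeated runs leave the crontab unchanged. A PATH line, if needed, still has to be added at the top of the crontab by hand.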
Automating the updates on Windows
You can use the Task Scheduler to create a new task with a "Start a program" action and the following settings (mind the quotes):
- Program/script: Rscript.exe
- options: -e rchallenge::publish('~/Dropbox/mychallenge/challenge.rmd')