rchallenge
The rchallenge R package provides a simple data science competition system using R Markdown and Dropbox with the following features:
- No network configuration required.
- Does not depend on external platforms such as Kaggle.
- Can be easily installed on a personal computer.
- Provides a customizable template in English and French.
Further documentation is available in the Reference manual.
Please report bugs and other problems, or start a discussion, on the Issues tracker. Contributions to improve the package are welcome.
Installation
Install the R package from CRAN repositories
install.packages("rchallenge")
or install the latest development version from GitHub
# install.packages("devtools")
devtools::install_github("adrtod/rchallenge")
A recent version of pandoc (>= 1.12.3) is also required. See the pandoc installation instructions for details on installing pandoc for your platform.
Getting started
Install a new challenge in Dropbox/mychallenge:
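A minimal sketch of this step, assuming the challenge folder is ~/Dropbox/mychallenge and that calling new_challenge() with its default arguments from inside that folder installs the English template there. The folder path and the requireNamespace() guard are illustrative choices, not part of the package.

```r
# Sketch only: the folder path and the use of default arguments are assumptions.
challenge_dir <- "~/Dropbox/mychallenge"

if (requireNamespace("rchallenge", quietly = TRUE)) {
  # Create the challenge folder if needed and work from inside it.
  dir.create(challenge_dir, recursive = TRUE, showWarnings = FALSE)
  old_wd <- setwd(challenge_dir)

  # Install the challenge files with the default (English) template.
  rchallenge::new_challenge()

  # Restore the previous working directory.
  setwd(old_wd)
} else {
  message("rchallenge is not installed; see the Installation section.")
}
```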
or, for a French version:
new_challenge(template = "fr")
You will obtain a ready-to-use challenge in the folder Dropbox/mychallenge containing:
Name | Description |
---|---|
challenge.rmd | Template R Markdown script for the webpage. |
data | Directory of the data containing data_train and data_test datasets. |
submissions | Directory of the submissions. It will contain one subdirectory per team, where the team can place its submission files. The subdirectories are shared with Dropbox. |
history | Directory where the submissions history is stored. |
The default challenge provided is a binary classification problem on the German Credit Card dataset.
You can easily customize the challenge in two ways:
- During the creation of the challenge: by using the options of the new_challenge function.
- After the creation of the challenge: by manually replacing the data files in the data subdirectory and the baseline predictions in submissions/baseline, and by customizing the template challenge.rmd as needed.
Next steps
To complete the installation:
1. Create and share a subdirectory of submissions for each team.
2. Publish the HTML page in Dropbox/Public. Before doing so, make sure you have enabled your Public Dropbox folder.
3. Give the public link to your Dropbox/Public/challenge.html file to the participants.
4. Refresh the webpage by repeating step 2 on a regular basis. See below for automating this step.
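The folder layout produced by the first step can be sketched in base R. This is only an illustration of the resulting structure: the team names are hypothetical, a temporary folder stands in for Dropbox/mychallenge so the snippet runs anywhere, and the package may provide its own helper for this step. Sharing each subdirectory with its team is still done through Dropbox itself.

```r
# Hypothetical team names; replace with your own.
teams <- c("team_foo", "team_bar")

# Stand-in for the challenge folder (assumption: a temporary folder is used
# here instead of ~/Dropbox/mychallenge so the sketch has no side effects).
challenge_dir <- file.path(tempdir(), "mychallenge")
submissions_dir <- file.path(challenge_dir, "submissions")

# Create one subdirectory per team under submissions/.
team_dirs <- file.path(submissions_dir, teams)
for (d in team_dirs) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Confirm that every team folder now exists.
print(dir.exists(team_dirs))
```

The publishing step can then be run with the same call that appears in the crontab line of this document, rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd").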
From now on, a fully autonomous challenge system is set up requiring no further administration. With each update, the program automatically performs the following tasks using the functions available in our package:
Name | Description |
---|---|
store_new_submissions | Reads submitted files and saves new files in the history. |
print_readerr | Displays any read errors. |
compute_metrics | Calculates the scores for each submission in the history. |
get_best | Gets the highest score per team. |
print_leaderboard | Displays the leaderboard. |
plot_history | Plots a chart of score evolution per team. |
plot_activity | Plots a chart of activity per team. |
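To make the "highest score per team" and leaderboard steps concrete, here is a small base-R sketch of the idea. It is not the package's implementation: the history data frame, its column names, and the assumption that a higher score is better are all hypothetical.

```r
# Toy submission history with hypothetical teams and scores
# (assumption: a higher score is better).
history <- data.frame(
  team  = c("team_foo", "team_foo", "team_bar", "team_bar", "team_bar"),
  score = c(0.71, 0.78, 0.69, 0.74, 0.73),
  stringsAsFactors = FALSE
)

# Highest score per team.
best <- aggregate(score ~ team, data = history, FUN = max)

# Leaderboard: teams ordered by their best score, best first.
leaderboard <- best[order(-best$score), ]
leaderboard$rank <- seq_len(nrow(leaderboard))
rownames(leaderboard) <- NULL

print(leaderboard)
```

For a metric where lower is better, such as an error rate, min would replace max and the ordering would be reversed.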
Automating the updates
Unix/OSX
You can add the following line to your crontab using crontab -e (mind the quotes):
0 * * * * Rscript -e 'rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd")'
This will publish an HTML webpage in your Dropbox/Public folder every hour.
You might have to add the path to Rscript and pandoc at the beginning of your crontab:
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
Depending on your system or pandoc version you might also have to explicitly add the encoding option to the command:
0 * * * * Rscript -e 'rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd", encoding = "utf8")'
Windows
You can use the Task Scheduler to create a new task with a "Start a program" action and the following settings (mind the quotes):
- Program/script: Rscript.exe
- options: -e rchallenge::publish('~/Dropbox/mychallenge/challenge.rmd')
Examples
My own challenge (in French) given to Master students at the University of Bordeaux.
A classification and variable selection problem (in French) given by Robin Genuer (Bordeaux).
Please contact me to add yours.
Copyright
Copyright (C) 2014-2015 Adrien Todeschini.
Contributions from Robin Genuer.
Design inspired by Datascience.net, a French platform for data science challenges.
The rchallenge package is licensed under the GPLv2 (https://www.gnu.org/licenses/gpl-2.0.html).
To do list
- common leaderboard for several metrics
- do not take baseline into account in ranking
- examples, tests, vignettes
- interactive plots with ggvis
- check arguments
- interactive webpage using Shiny