rchallenge

The rchallenge R package provides a simple data science competition system using R Markdown and Dropbox with the following features:

  • No network configuration required.
  • Does not depend on external platforms such as Kaggle.
  • Can be easily installed on a personal computer.
  • Provides a customizable template in English and French.

Further documentation is available in the Reference manual.

Please report bugs, problems, or questions on the Issues tracker. Contributions to improve the package are welcome.

Installation

Install the R package from CRAN repositories

install.packages("rchallenge")

or install the latest development version from GitHub

# install.packages("devtools")
devtools::install_github("adrtod/rchallenge")

A recent version of pandoc (>= 1.12.3) is also required. See the pandoc installation instructions for details on installing pandoc for your platform.
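
To check the pandoc requirement from R, something like the following base R sketch may help. The helper names find_pandoc_version and meets_minimum are hypothetical and not part of rchallenge. The sketch only looks for pandoc on the PATH, so a copy bundled with an IDE such as RStudio may not be detected.

```r
# Return the version of the pandoc found on the PATH, or NULL if none is found.
find_pandoc_version <- function() {
  exe <- Sys.which("pandoc")
  if (!nzchar(exe)) return(NULL)
  first_line <- system2(exe, "--version", stdout = TRUE)[1]
  if (is.na(first_line)) return(NULL)
  # Extract the first dotted number, e.g. "2.19.2" from "pandoc 2.19.2".
  m <- regmatches(first_line, regexpr("[0-9]+(\\.[0-9]+)+", first_line))
  if (length(m) == 0) return(NULL)
  numeric_version(m)
}

# TRUE when `version` is at least `minimum`. Components are compared
# numerically, so 1.9 counts as older than 1.12.3.
meets_minimum <- function(version, minimum = "1.12.3") {
  !is.null(version) && version >= numeric_version(minimum)
}

meets_minimum(find_pandoc_version())
```

A result of FALSE means that pandoc was not found on the PATH or is older than 1.12.3.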

Getting started

Install a new challenge in Dropbox/mychallenge:

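# The target folder must already exist, otherwise setwd() fails.
setwd("~/Dropbox/mychallenge")
library(rchallenge)
?new_challenge   # open the help page listing the available options
new_challenge()  # install the default challenge in the working directory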

or, for a French version:

new_challenge(template = "fr")

You will obtain a ready-to-use challenge in the folder Dropbox/mychallenge containing:

Name | Description
--- | ---
challenge.rmd | Template R Markdown script for the webpage.
data | Directory of the data containing data_train and data_test datasets.
submissions | Directory of the submissions. It will contain one subdirectory per team where they can submit their submissions. The subdirectories are shared with Dropbox.
history | Directory where the submissions history is stored.
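
The layout listed above can be verified with a short base R check. This is only a sketch: check_challenge_layout is a hypothetical helper, not a function of the package, and it assumes the challenge was created with the default file and directory names.

```r
# Report which of the expected challenge entries exist under `path`.
# The names assume the defaults listed in the table above.
check_challenge_layout <- function(path) {
  expected <- c("challenge.rmd", "data", "submissions", "history")
  present <- file.exists(file.path(path, expected))
  names(present) <- expected
  present
}

# Example call (the path is the one used in this guide):
# check_challenge_layout("~/Dropbox/mychallenge")
```

Any entry reported as FALSE is missing from the folder.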

The default challenge is a binary classification problem on the German Credit data set (shipped with the package as german).

You can easily customize the challenge in two ways:

  • During the creation of the challenge: by using the options of the new_challenge function.
  • After the creation of the challenge: by manually replacing the data files in the data subdirectory and the baseline predictions in submissions/baseline and by customizing the template challenge.rmd as needed.
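
If you replace the data files yourself, you first need a training set and a test set. The base R sketch below shows one simple way to produce them. split_train_test is a hypothetical helper, not the package's data_split function, and iris is used only as a stand-in data set. How the resulting data frames must be stored in the data subdirectory is not shown here; see the Reference manual.

```r
# Randomly split a data.frame into training and test parts.
# p_test is the proportion of rows assigned to the test set.
# Note: set.seed() changes the state of the global random number generator.
split_train_test <- function(data, p_test = 0.2, seed = 1) {
  stopifnot(is.data.frame(data), p_test > 0, p_test < 1)
  set.seed(seed)
  n <- nrow(data)
  test_idx <- sample(seq_len(n), size = round(n * p_test))
  # A logical mask stays correct even when no row is selected for the test set.
  is_test <- seq_len(n) %in% test_idx
  list(train = data[!is_test, , drop = FALSE],
       test  = data[is_test, , drop = FALSE])
}

parts <- split_train_test(iris, p_test = 0.2)
nrow(parts$train)
nrow(parts$test)
```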

Next steps

To complete the installation:

  1. Create and share subdirectories in submissions for each team:

    ?new_team
    new_team("team_foo", "team_bar")
  2. Render the html page:

    ?publish
    publish()

    Use the output_dir argument to change the output directory. Make sure the output HTML file is served as a web page, e.g. using GitHub Pages.

  3. Give the participants the URL of your challenge.html file.

  4. Refresh the webpage by repeating step 2 on a regular basis. See below for automating this step.

The challenge system is now fully set up and requires no further administration. With each update, the program automatically performs the following tasks using functions from the package:

Name | Description
--- | ---
store_new_submissions | Reads submitted files and saves new files in the history.
print_readerr | Displays any read errors.
compute_metrics | Calculates the scores for each submission in the history.
get_best | Gets the highest score per team.
print_leaderboard | Displays the leaderboard.
plot_history | Plots a chart of score evolution per team.
plot_activity | Plots a chart of activity per team.
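
To give an idea of what "highest score per team" means, here is a toy base R sketch. It is not the package's implementation: best_per_team is a hypothetical helper, the team and score columns are made up for illustration, and it assumes that a higher score is better.

```r
# Toy submission history: one row per scored submission.
history <- data.frame(
  team  = c("team_foo", "team_foo", "team_bar", "team_bar", "team_bar"),
  score = c(0.71, 0.78, 0.80, 0.74, 0.79),
  stringsAsFactors = FALSE
)

# Keep the best score of each team, then rank teams by decreasing score.
best_per_team <- function(history) {
  best <- aggregate(score ~ team, data = history, FUN = max)
  best <- best[order(-best$score), , drop = FALSE]
  best$rank <- seq_len(nrow(best))
  rownames(best) <- NULL
  best
}

leaderboard <- best_per_team(history)
print(leaderboard)
```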

Automating the updates

Unix/OSX

You can add the following line to your crontab using crontab -e (mind the quotes):

0 * * * * Rscript -e 'rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd")'

This renders an HTML webpage every hour. Use the output_dir argument to change the output directory.

You might have to add the path to Rscript and pandoc at the beginning of your crontab:

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

Depending on your system or pandoc version you might also have to explicitly add the encoding option to the command:

0 * * * * Rscript -e 'rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd", encoding = "utf8")'
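
Scheduled jobs tend to fail silently, so it can be useful to record the outcome of each update. The following is a minimal sketch, assuming you put it in a script of your own and call that script with Rscript from the crontab. run_logged and the log file location are hypothetical; only the publish call comes from this guide.

```r
# Run `task` (a function without arguments), catch any error, and append a
# timestamped status line to `log_file`. Returns the status invisibly.
run_logged <- function(task, log_file) {
  status <- tryCatch({
    task()
    "OK"
  }, error = function(e) paste("ERROR:", conditionMessage(e)))
  line <- paste(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), status)
  cat(line, "\n", sep = "", file = log_file, append = TRUE)
  invisible(status)
}

# Example use in the scheduled script (paths are placeholders):
# run_logged(
#   function() rchallenge::publish("~/Dropbox/mychallenge/challenge.rmd"),
#   "~/rchallenge_publish.log"
# )
```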

Windows

You can use the Task Scheduler to create a new task with a "Start a program" action and the following settings (mind the quotes):

  • Program/script: Rscript.exe
  • Add arguments: -e rchallenge::publish('~/Dropbox/mychallenge/challenge.rmd')
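
Quoting differs between a Unix shell and the Windows command line, which is why both sections say to mind the quotes. The base R sketch below builds the command for either case with shQuote. rscript_command is a hypothetical helper, and it assumes that the path itself contains no quote characters.

```r
# Build an "Rscript -e ..." command that calls rchallenge::publish on rmd_path.
# type = "sh" quotes for Unix shells, type = "cmd" for the Windows command line.
rscript_command <- function(rmd_path, type = c("sh", "cmd")) {
  type <- match.arg(type)
  # Use the other kind of quote inside the R expression than outside it.
  inner <- if (type == "sh") "\"" else "'"
  expr <- paste0("rchallenge::publish(", inner, rmd_path, inner, ")")
  paste("Rscript -e", shQuote(expr, type = type))
}

cat(rscript_command("~/Dropbox/mychallenge/challenge.rmd", "sh"), "\n")
cat(rscript_command("~/Dropbox/mychallenge/challenge.rmd", "cmd"), "\n")
```

The first command uses single quotes around the expression, as in the crontab line above. The second wraps the expression in double quotes, which also protects it if the path contains spaces.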

Issues

  • Dropbox will stop rendering HTML content on 3 October 2016 for Basic users and on 1 September 2017 for Pro and Business users. See https://www.dropbox.com/help/16. Alternatively, GitHub Pages provides an easy HTML publishing solution via a simple GitHub repository.

Examples

Please contact me to add yours.

Copyright

Copyright (C) 2014-2015 Adrien Todeschini.

Contributions from Robin Genuer.

Design inspired by Datascience.net, a French platform for data science challenges.

The rchallenge package is licensed under the GPLv2 (https://www.gnu.org/licenses/gpl-2.0.html).

To do list

  • common leaderboard for several metrics
  • do not take baseline into account in ranking
  • examples, tests, vignettes
  • interactive plots with ggvis
  • check arguments
  • interactive webpage using Shiny

Version: 1.2.0

License: GPL-2

Maintainer: Adrien Todeschini

Last Published: October 5th, 2016

Functions in rchallenge (1.2.0)

  • get_best: Get the best submissions per team and per metric.
  • german: German Credit Data.
  • glyphicon: Path to glyphicon image file.
  • html_img: HTML code for an image.
  • get_data: Get dataset value.
  • data_split: Split a data.frame into training and test sets.
  • last_update: Formatted last update date before deadline.
  • data_partition: Data partitioning function adapted from the caret package.
  • print_leaderboard: Format the leaderboard in Markdown.
  • new_challenge: Install a new challenge.
  • new_team: Create new teams submission folders in your challenge.
  • store_new_submissions: Store new submission files.
  • plot_activity: Plot the density of submissions over time.
  • print_readerr: Print read errors.
  • str_rank: String displayed for the rank.
  • publish: Render your challenge R Markdown script to an HTML page.
  • update_rank_diff: Update the rank differences of the teams.
  • plot_history: Plot the history of the scores of each team over time.
  • rchallenge-package: A Simple Data Science Challenge System.
  • countdown: Countdown before deadline.
  • compute_metrics: Compute metrics of the submissions in the history.