Ruigi

Ruigi is a pipeline specialist for R, much like its Python counterpart, Luigi.

How to use

Ruigi has two simple concepts that you need to understand in order to use it. The first one is the concept of a target.

A target is an abstraction of the output of a computation that also encloses methods for reading and writing. For example, a .csv file can be a valid target. To use such a target, create it with a path and call its methods:

target <- CSVtarget$new("~/Desktop/output.csv")
target$exists() # [1] FALSE
target$write(iris)
target$exists() # [1] TRUE
identical(iris, target$read()) # [1] TRUE
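
To make the idea concrete, here is a minimal base-R sketch of what a target-like object could look like. It is not Ruigi's implementation: `make_csv_target` is a hypothetical helper, and the object is a plain list of closures rather than Ruigi's own class.

```r
# Hypothetical helper, not part of Ruigi: a target modelled as a list of
# closures that all share the same file path.
make_csv_target <- function(path) {
  list(
    exists = function() file.exists(path),
    read   = function() utils::read.csv(path, stringsAsFactors = FALSE),
    write  = function(obj) utils::write.csv(obj, path, row.names = FALSE)
  )
}

path <- tempfile(fileext = ".csv")
target <- make_csv_target(path)

before <- target$exists()   # nothing has been written yet
target$write(data.frame(x = 1:3, y = c("a", "b", "c")))
after <- target$exists()    # the file is now on disk
round_trip <- target$read()
```

The point of the sketch is that the caller only ever touches `exists`, `read` and `write`, so the same calling code would work for any storage backend that provides those three methods.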

The second abstraction is a task. A task is an abstraction of a confined computation module, which can later become a part of a big computation pipeline. When you define a pipeline, Ruigi will automatically determine the optimal order of execution for the tasks, discover the dependencies, and perform checks to see if you have any cyclic dependencies.
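
One common way to derive an execution order and detect cycles is a depth-first topological sort. The sketch below illustrates that general technique in base R; it is an assumption about how such ordering can be done, not a description of Ruigi's internals. The `tasks` list and `execution_order` are hypothetical names.

```r
# Hypothetical task descriptions: each task names its one output
# and the inputs it needs.
tasks <- list(
  writer = list(output = "output.csv",   inputs = "titanic_data"),
  reader = list(output = "titanic_data", inputs = "titanic.csv")
)

# Depth-first topological sort. Stops with an error when a task
# depends on itself through a chain of other tasks.
execution_order <- function(tasks) {
  outputs <- vapply(tasks, function(t) t$output, character(1))
  state <- stats::setNames(rep("new", length(tasks)), names(tasks))
  sorted <- character(0)

  visit <- function(name) {
    if (state[[name]] == "done") return(invisible(NULL))
    if (state[[name]] == "active") stop("cyclic dependency at task: ", name)
    state[[name]] <<- "active"
    # Tasks whose output is one of this task's inputs must run first.
    for (dep in names(outputs)[outputs %in% tasks[[name]]$inputs]) visit(dep)
    state[[name]] <<- "done"
    sorted <<- c(sorted, name)
    invisible(NULL)
  }

  for (name in names(tasks)) visit(name)
  sorted
}

run_order <- execution_order(tasks)
```

Here `reader` is placed before `writer` even though `writer` is listed first, because `writer` consumes the output that `reader` produces. Inputs that no task produces, such as `titanic.csv`, are treated as already available.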

A task is defined by its input targets, the output target, and the computation that needs to be performed on those targets. Note that a task can have 0, 1 or many inputs, but it has to have exactly one output. If the output target for a task exists, the computation will not be run again, saving you time.
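
The skip-if-output-exists rule can be sketched in a few lines of base R. `run_if_missing` is a hypothetical helper written for illustration and is not part of Ruigi; it assumes the output is a .csv file on disk.

```r
# Hypothetical helper, not part of Ruigi: run the computation only
# when the output file does not exist yet.
run_if_missing <- function(output_path, compute) {
  if (file.exists(output_path)) return("skipped")
  utils::write.csv(compute(), output_path, row.names = FALSE)
  "ran"
}

out <- tempfile(fileext = ".csv")
calls <- 0
compute <- function() {
  calls <<- calls + 1   # count how often the computation actually runs
  data.frame(x = 1:3)
}

first  <- run_if_missing(out, compute)   # output missing, so compute() runs
second <- run_if_missing(out, compute)   # output present, so it is skipped
```

In this sketch the check looks only at whether the output exists, not at whether the inputs have changed, so a stale output would have to be deleted before the computation runs again.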

By defining separate abstractions for computation modules (tasks), and inspectable outputs (targets), you can have your own library of data processing steps that you can combine into pipelines for different use cases.

Example

# These can be logically organized into folders and then `source`'d
# prior to defining the pipeline.
reader <- ruigi_task$new(
  requires = list(CSVtarget$new("~/Desktop/titanic.csv")),
  target = Rtarget$new("titanic_data"),
  name = "I will read a .csv file and store it on .ruigi_env",
  runner = function(requires, target) {
    out <- requires[[1]]$read()
    target$write(out)
  }
)

writer <- ruigi_task$new(
  requires = list(Rtarget$new("titanic_data")),
  target = CSVtarget$new("~/Desktop/output.csv"),
  name = "I will read a file from RAM and store it in a .csv",
  runner = function(requires, target) {
    out <- requires[[1]]$read()
    target$write(out)
  }
)

# Dependencies will be determined and the tasks will be run.
ruigi::pipeline(list(writer, reader))

# Running task:  I will read a .csv file and store it on .ruigi_env ✓
# Running task:  I will read a file from RAM and store it in a .csv ✓

# No need to run the tasks again, the results already exist
ruigi::pipeline(list(reader, writer))
# Skipping:  I will read a .csv file and store it on .ruigi_env
# Skipping:  I will read a file from RAM and store it in a .csv

Inspiration

  1. Luigi. A very powerful and widely used Python package.
  2. Make. Classic.
  3. Remake. A make alternative for R.

If you prefer to write R code as opposed to one-liners in .yml configs, you might enjoy using Ruigi!

Version

0.1.2.9000

License

MIT

Maintainer

Kirill Sevastyanenko

Last Published

February 15th, 2017

Functions in ruigi (0.1.2.9000)

pipeline

Define the pipeline and watch it get executed.

ruigi

Your friendly neighborhood pipeline specialist.

local_scheduler

Local scheduler. Linearly executes the tasks (if needed).

ruigi_task

Minimal unit of computation in Ruigi. A task is a bundle that contains dependencies, a target (every node should produce one and only one target), and a function that maps the dependencies to this target. Ruigi will organize and determine the right order of computation of multiple nodes formed together into one pipeline.

CSVtarget

CSV target specifies a .csv file stored on your local drive.

Rtarget

R target specifies an object with a given name in .ruigi_env.

ruigi_target

A minimal deliverable in Ruigi. A target is an interface to an object (file, RAM address, database table, S3 bucket, etc.), with read and write methods.

S3target

S3 target specifies an object with a given path in your default s3mpi bucket.