Learn R Programming

⚠️There's a newer version (1.3.3) of this package.Take me there.

rqdatatable is an implementation of the rquery piped Codd-style relational algebra hosted on data.table. rquery allow the expression of complex transformations as a series of relational operators and rqdatatable implements the operators using data.table.

A Python version of rquery/rqdatatable is under initial development as data_algebra.

For example scoring a logistic regression model (which requires grouping, ordering, and ranking) is organized as follows. For more on this example please see “Let’s Have Some Sympathy For The Part-time R User”.

library("rqdatatable")
## Loading required package: wrapr

## Loading required package: rquery
# data example
dL <- build_frame(
   "subjectID", "surveyCategory"     , "assessmentTotal" |
   1          , "withdrawal behavior", 5                 |
   1          , "positive re-framing", 2                 |
   2          , "withdrawal behavior", 3                 |
   2          , "positive re-framing", 4                 )
scale <- 0.237

# example rquery pipeline
rquery_pipeline <- local_td(dL) %.>%
  extend_nse(.,
             probability :=
               exp(assessmentTotal * scale))  %.>% 
  normalize_cols(.,
                 "probability",
                 partitionby = 'subjectID') %.>%
  pick_top_k(.,
             k = 1,
             partitionby = 'subjectID',
             orderby = c('probability', 'surveyCategory'),
             reverse = c('probability', 'surveyCategory')) %.>% 
  rename_columns(., c('diagnosis' = 'surveyCategory')) %.>%
  select_columns(., c('subjectID', 
                      'diagnosis', 
                      'probability')) %.>%
  orderby(., cols = 'subjectID')

We can show the expanded form of query tree.

cat(format(rquery_pipeline))
mk_td("dL", c(
  "subjectID",
  "surveyCategory",
  "assessmentTotal")) %.>%
 extend(.,
  probability := exp(assessmentTotal * 0.237)) %.>%
 extend(.,
  probability := probability / sum(probability),
  partitionby = c('subjectID'),
  orderby = c(),
  reverse = c()) %.>%
 extend(.,
  row_number := row_number(),
  partitionby = c('subjectID'),
  orderby = c('probability', 'surveyCategory'),
  reverse = c('probability', 'surveyCategory')) %.>%
 select_rows(.,
   row_number <= 1) %.>%
 rename_columns(.,
  c('diagnosis' = 'surveyCategory')) %.>%
 select_columns(., 
    c('subjectID', 'diagnosis', 'probability')) %.>%
 order_rows(.,
  c('subjectID'),
  reverse = c(),
  limit = NULL)

And execute it using data.table.

ex_data_table(rquery_pipeline)
##   subjectID           diagnosis probability
## 1         1 withdrawal behavior   0.6706221
## 2         2 positive re-framing   0.5589742

One can also apply the pipeline to new tables.

build_frame(
   "subjectID", "surveyCategory"     , "assessmentTotal" |
   7          , "withdrawal behavior", 5                 |
   7          , "positive re-framing", 20                ) %.>%
  rquery_pipeline
##   subjectID           diagnosis probability
## 1         7 positive re-framing   0.9722128

Initial bench-marking of rqdatatable is very favorable (notes here).

To install rqdatatable please use install.packages("rqdatatable").

Some related work includes:

Note rqdatatable has an “immediate mode” which allows direct application of pipelines stages without pre-assembling the pipeline. “Immediate mode” is a convenience for ad-hoc analyses, and has some negative performance impact, so we encourage users to build pipelines for most work. Some notes on the issue can be found here.

rqdatatable implements the rquery grammar in the style of a “Turing or Cook reduction” (implementing the result in terms of multiple oracle calls to the related system).

rqdatatable is intended for “simple column names”, in particular as rqdatatable often uses eval() to work over data.table escape characters such as “\” and “\\” are not reliable in column names. Also rqdatatable does not support tables with no columns.

Copy Link

Version

Install

install.packages('rqdatatable')

Monthly Downloads

1,433

Version

1.2.9

License

GPL-2 | GPL-3

Issues

Pull Requests

Stars

Forks

Maintainer

John Mount

Last Published

October 17th, 2020

Functions in rqdatatable (1.2.9)

ex_data_table_step.relop_orderby

Reorder rows.
reexports

Objects exported from other packages
rq_df_funciton_node

Helper to build data.table capable non-sql nodes.
ex_data_table_step.relop_extend

Implement extend/assign operator.
ex_data_table_step.relop_unionall

Bind tables together by rows.
set_rqdatatable_as_executor

Set rqdatatable package as default rquery executor
ex_data_table_step.relop_theta_join

Theta join (database implementation).
ex_data_table_step.relop_select_columns

Implement drop columns.
ex_data_table_step.relop_rename_columns

Rename columns.
ex_data_table_step.relop_select_rows

Select rows by condition.
layout_to_blocks_data_table

Map a data records from row records to block records with one record row per columnsToTakeFrom value.
ex_data_table_step.relop_set_indicator

Implement set_indicatoroperator.
layout_to_rowrecs_data_table

Map data records from block records that have one row per measurement value to row records.
make_dt_lookup_by_column

Lookup by column function factory.
ex_data_table_step.relop_table_source

Build a data source description.
ex_data_table_step.relop_sql

Direct sql node.
rbindlist_data_table

rbindlist
rq_df_grouped_funciton_node

Helper to build data.table capable non-sql nodes.
ex_data_table_step.relop_project

Implement projection operator.
rqdatatable

rqdatatable: Relational Query Generator for Data Manipulation Implemented by data.table
ex_data_table_step.relop_null_replace

Replace NAs.
ex_data_table_step.relop_natural_join

Natural join.
ex_data_table_step.relop_drop_columns

Implement drop columns.
ex_data_table

Execute an rquery pipeline with data.table sources.
ex_data_table_step.relop_order_expr

Order rows by expression.
ex_data_table_parallel

Execute an rquery pipeline with data.table in parallel.
ex_data_table_step.relop_non_sql

Direct non-sql (function) node, not implemented for data.table case.
ex_data_table_step

Execute an rquery pipeline with data.table sources.
ex_data_table_step.default

default non-impementation.