Learn R Programming

⚠️There's a newer version (0.11.2) of this package.Take me there.

tidytable

Why tidytable?

  • tidyverse-like syntax with data.table speed
  • rlang compatibility
  • Includes functions that dtplyr is missing, including many tidyr functions

Note: tidytable functions do not use data.table’s modify-by-reference, and instead use the copy-on-modify principles followed by the tidyverse and base R.

Installation

Install the released version from CRAN with:

install.packages("tidytable")

Or install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("markfairbanks/tidytable")

General syntax

tidytable uses verb.() syntax to replicate tidyverse functions:

library(tidytable)

test_df <- data.table(x = c(1,2,3), y = c(4,5,6), z = c("a","a","b"))

test_df %>%
  select.(x, y, z) %>%
  filter.(x < 4, y > 1) %>%
  arrange.(x, y) %>%
  mutate.(double_x = x * 2,
          double_y = y * 2)
#> # tidytable [3 × 5]
#>       x     y z     double_x double_y
#>   <dbl> <dbl> <chr>    <dbl>    <dbl>
#> 1     1     4 a            2        8
#> 2     2     5 a            4       10
#> 3     3     6 b            6       12

A full list of functions can be found here.

Using “group by”

Group by calls are done from inside any function that has group by functionality (such as summarize.() & mutate.())

  • A single column can be passed with .by = z
  • Multiple columns can be passed with .by = c(y, z)
test_df %>%
  summarize.(avg_x = mean(x),
             count = n.(),
             .by = z)
#> # tidytable [2 × 3]
#>   z     avg_x count
#>   <chr> <dbl> <int>
#> 1 a       1.5     2
#> 2 b       3       1

.by vs. group_by()

A key difference between tidytable/data.table & dplyr is that dplyr can have multiple functions operate “by group” with a single group_by() call.

We’ll start with an example dplyr pipe chain that utilizes group_by() and then rewrite it in tidytable. The goal is to grab the first two rows of each group using slice(), then add a row number column using mutate():

library(dplyr)

test_df <- tibble(x = c("a", "a", "a", "b", "b"))

test_df %>%
  group_by(x) %>%
  slice(1:2) %>%
  mutate(group_row_num = row_number()) %>%
  ungroup()
#> # A tibble: 4 x 2
#>   x     group_row_num
#>   <chr>         <int>
#> 1 a                 1
#> 2 a                 2
#> 3 b                 1
#> 4 b                 2

In this case both slice() and mutate() will operate “by group”. This happens until you call ungroup() at the end of the chain.

However data.table doesn’t “remember” groups between function calls. So in tidytable you need to call .by in each function you want to operate “by group”, and you don’t need to call ungroup() at the end:

library(tidytable)

test_df %>%
  slice.(1:2, .by = x) %>%
  mutate.(group_row_num = row_number.(), .by = x)
#> # tidytable [4 × 2]
#>   x     group_row_num
#>   <chr>         <int>
#> 1 a                 1
#> 2 a                 2
#> 3 b                 1
#> 4 b                 2

tidyselect support

tidytable allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect package in the background.

Normal selection can be mixed with all tidyselect helpers: everything(), starts_with(), ends_with(), any_of(), where(), etc.

test_df <- data.table(
  a = c(1,2,3),
  b1 = c(4,5,6),
  b2 = c(7,8,9),
  c = c("a","a","b")
)

test_df %>%
  select.(a, starts_with("b"))
#> # tidytable [3 × 3]
#>       a    b1    b2
#>   <dbl> <dbl> <dbl>
#> 1     1     4     7
#> 2     2     5     8
#> 3     3     6     9

To drop columns use a - sign:

test_df %>%
  select.(-a, -starts_with("b"))
#> # tidytable [3 × 1]
#>   c    
#>   <chr>
#> 1 a    
#> 2 a    
#> 3 b

These same ideas can be used whenever selecting columns in tidytable functions - for example when using count.(), drop_na.(), mutate_across.(), pivot_longer.(), etc.

A full overview of selection options can be found here.

Using tidyselect in .by

tidyselect helpers also work when using .by:

test_df <- data.table(
  a = c(1,2,3),
  b = c(4,5,6),
  c = c("a","a","b"),
  d = c("a","a","b")
)

test_df %>%
  summarize.(avg_b = mean(b), .by = where(is.character))
#> # tidytable [2 × 3]
#>   c     d     avg_b
#>   <chr> <chr> <dbl>
#> 1 a     a       4.5
#> 2 b     b       6

rlang compatibility

rlang can be used to write custom functions with tidytable functions. The embracing shortcut {{ }} works, or you can use enquo() with !! if you prefer.

df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

add_one <- function(data, add_col) {
  data %>%
    mutate.(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
#> # tidytable [3 × 4]
#>       x     y z     new_col
#>   <dbl> <dbl> <chr>   <dbl>
#> 1     1     1 a           2
#> 2     1     1 a           2
#> 3     1     1 b           2

Auto-conversion

All tidytable functions automatically convert data.frame and tibble inputs to a data.table:

library(dplyr)
library(data.table)

test_df <- tibble(x = c(1,2,3), y = c(4,5,6), z = c("a","a","b"))

test_df %>%
  mutate.(double_x = x * 2) %>%
  is.data.table()
#> [1] TRUE

dt() helper

The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:

df <- data.table(x = c(1,2,3), y = c(4,5,6), z = c("a", "a", "b"))

df %>%
  dt(, list(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, list(avg_x = mean(x)), by = z)
#> # tidytable [2 × 2]
#>   z     avg_x
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3

Speed Comparisons

For those interested in performance, speed comparisons can be found here.

Copy Link

Version

Install

install.packages('tidytable')

Monthly Downloads

2,490

Version

0.5.7

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Mark Fairbanks

Last Published

December 16th, 2020

Functions in tidytable (0.5.7)

arrange_across.

Arrange by a selection of variables
crossing.

Create a data.table from all unique combinations of inputs
case.

Improved data.table::fcase()
bind_cols.

Bind data.tables by row and column
complete.

Complete a data.table with missing combinations of data
count.

Count observations by group
arrange.

Arrange/reorder rows
case_when.

Case when
as_dt

Deprecated tidytable conversion
as_tidytable

Coerce an object to a data.table/tidytable
get_dummies.

Convert character and factor columns to dummy variables
expand.

Expand a data.table to use all combinations of values
expand_grid.

Create a data.table from all combinations of inputs
%notin%

notin operator
n.

Number of observations in each group
mutate_if.

Deprecated mutate helpers
knit_print.tidytable

knit_print method for tidytables
lags.

Get lagging or leading values
dt

Pipeable data.table call
drop_na.

Drop rows containing missing values
separate_rows.

Separate a collapsed column into multiple rows
uncount.

Uncount a data.table
slice.

Choose rows in a data.table
replace_na.

Replace missing values
relocate.

Relocate a column to a new position
group_split.

Split data frame by groups
ifelse.

Fast ifelse
top_n.

Select top (or bottom) n rows (by value)
transmute.

Add new variables and drop all others
row_number.

Return row number
desc.

Descending order
rename.

Rename variables by name
distinct.

Select distinct/unique rows
unite.

Unite multiple columns by pasting strings together
mutate.

Add/modify/delete columns
%>%

Pipe operator
nest_by.

Nest data.tables
summarize.

Aggregate data using summary statistics
is_tidytable

Test if the object is a tidytable
inv_gc

Run invisible garbage collection
summarize_across.

Summarize multiple columns
rename_all.

Deprecated rename helpers
mutate_across.

Mutate multiple columns simultaneously
pivot_wider.

Pivot data from long to wide
pivot_longer.

Pivot data from wide to long
rename_with.

Rename multiple columns
unnest.

Unnest a nested data.table
filter.

Filter rows on one or more conditions
fill.

Fill in missing values with previous or next value
left_join.

Join two data.tables together
map.

Apply a function to each element of a vector or list
reexports

Objects exported from other packages
pull.

Pull out a single variable
select.

Select or drop columns
tidytable-vctrs

Internal vctrs methods
separate.

Separate a character column into multiple columns
tidytable

Build a data.table/tidytable