tidytable

Why tidytable?

  • tidyverse-like syntax with data.table speed
  • rlang compatibility
  • Includes functions that dtplyr is missing, including many tidyr functions

Installation

Install the released version from CRAN with:

install.packages("tidytable")

Or install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("markfairbanks/tidytable")

General syntax

tidytable functions are named after their tidyverse counterparts with a trailing dot (verb.() syntax), so select() becomes select.():

library(tidytable)

test_df <- data.table(x = 1:3, y = 4:6, z = c("a","a","b"))

test_df %>%
  select.(x, y, z) %>%
  filter.(x < 4, y > 1) %>%
  arrange.(x, y) %>%
  mutate.(double_x = x * 2,
          double_y = y * 2)
#> # A tidytable: 3 × 5
#>       x     y z     double_x double_y
#>   <int> <int> <chr>    <dbl>    <dbl>
#> 1     1     4 a            2        8
#> 2     2     5 a            4       10
#> 3     3     6 b            6       12

A full list of functions is given under “Functions in tidytable (0.6.2)” at the end of this page.

Using “group by”

Grouping is done through the .by argument, which is available in every function that has “by group” functionality.

  • A single column can be passed with .by = z
  • Multiple columns can be passed with .by = c(y, z)

test_df %>%
  summarize.(avg_x = mean(x),
             count = n(),
             .by = z)
#> # A tidytable: 2 × 3
#>   z     avg_x count
#>   <chr> <dbl> <int>
#> 1 a       1.5     2
#> 2 b       3       1
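
The multi-column form of .by can be sketched in the same way. This is a minimal illustration, assuming the verb.() syntax of this release; the data set multi_df and its values are made up for the example and are not from the package documentation:

```r
library(tidytable)

# Hypothetical data: y and z together define three distinct groups
multi_df <- tidytable(
  x = 1:4,
  y = c("u", "u", "v", "v"),
  z = c("a", "a", "a", "b")
)

# One summary row per distinct (y, z) combination
by_two <- multi_df %>%
  summarize.(avg_x = mean(x),
             count = n(),
             .by = c(y, z))

by_two
```

The result should have one row for each of the three (y, z) combinations present in the data.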

.by vs. group_by()

tidytable follows data.table semantics where .by must be called each time you want a function to operate “by group”.

The tidytable example below uses .by and is followed by its dplyr equivalent for comparison. The goal is to take the first two rows of each group using slice.(), then add a within-group row number column using mutate.():

library(tidytable)

test_df <- data.table(x = c("a", "a", "a", "b", "b"))

test_df %>%
  slice.(1:2, .by = x) %>%
  mutate.(group_row_num = row_number(), .by = x)
#> # A tidytable: 4 × 2
#>   x     group_row_num
#>   <chr>         <int>
#> 1 a                 1
#> 2 a                 2
#> 3 b                 1
#> 4 b                 2

Note how .by is called in both slice.() and mutate.().

Compare this with a dplyr pipe chain that uses group_by(), where each function operates “by group” until ungroup() is called:

library(dplyr)

test_df <- tibble(x = c("a", "a", "a", "b", "b"))

test_df %>%
  group_by(x) %>%
  slice(1:2) %>%
  mutate(group_row_num = row_number()) %>%
  ungroup()
#> # A tibble: 4 x 2
#>   x     group_row_num
#>   <chr>         <int>
#> 1 a                 1
#> 2 a                 2
#> 3 b                 1
#> 4 b                 2

Note that the ungroup() call is unnecessary in tidytable.
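
Because grouping does not persist between calls, leaving .by out of a later step makes that step run over the whole table. The sketch below illustrates this, assuming row_number() behaves without .by as it does with it; the names sliced, grouped and ungrouped are introduced only for this example:

```r
library(tidytable)

test_df <- tidytable(x = c("a", "a", "a", "b", "b"))

# First two rows of each group
sliced <- test_df %>%
  slice.(1:2, .by = x)

# With .by: numbering restarts within each group
grouped <- sliced %>%
  mutate.(row_num = row_number(), .by = x)

# Without .by: the same call numbers rows across the whole table
ungrouped <- sliced %>%
  mutate.(row_num = row_number())

grouped$row_num
ungrouped$row_num
```

Under these assumptions the first vector restarts at 1 for group b, while the second simply counts the four remaining rows.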

tidyselect support

tidytable allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect package in the background.

Normal selection can be mixed with all tidyselect helpers: everything(), starts_with(), ends_with(), any_of(), where(), etc.

test_df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a","a","b")
)

test_df %>%
  select.(a, starts_with("b"))
#> # A tidytable: 3 × 3
#>       a    b1    b2
#>   <int> <int> <int>
#> 1     1     4     7
#> 2     2     5     8
#> 3     3     6     9

To drop columns use a - sign:

test_df %>%
  select.(-a, -starts_with("b"))
#> # A tidytable: 3 × 1
#>   c    
#>   <chr>
#> 1 a    
#> 2 a    
#> 3 b

These same ideas can be used whenever selecting columns in tidytable functions - for example when using count.(), drop_na.(), across.(), pivot_longer.(), etc.
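
As a rough illustration of selection helpers in functions other than select.(), the sketch below uses drop_na.() and where(). The data set na_df is hypothetical, and the example assumes drop_na.() accepts tidyselect helpers in its column arguments as described above:

```r
library(tidytable)

# Hypothetical data with missing values in different columns
na_df <- tidytable(
  a  = c(1, NA, 3),
  b1 = c(NA, 5, 6),
  b2 = c(7, 8, 9),
  c  = c("a", "a", "b")
)

# Only the b columns are checked for NA, so the NA in `a` is ignored
kept <- na_df %>%
  drop_na.(starts_with("b"))

# where() picks columns by a predicate, here every numeric column
numeric_only <- na_df %>%
  select.(where(is.numeric))

kept
names(numeric_only)
```

Only the first row has a missing value in a b column, so it is the only row that should be dropped.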

A full overview of selection options can be found here.

Using tidyselect in .by

tidyselect helpers also work when using .by:

test_df <- data.table(
  a = 1:3,
  b = 4:6,
  c = c("a","a","b"),
  d = c("a","a","b")
)

test_df %>%
  summarize.(avg_b = mean(b), .by = where(is.character))
#> # A tidytable: 2 × 3
#>   c     d     avg_b
#>   <chr> <chr> <dbl>
#> 1 a     a       4.5
#> 2 b     b       6

rlang compatibility

rlang can be used to write custom functions with tidytable functions. The embracing shortcut {{ }} works, or you can use enquo() with !! if you prefer.

df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

add_one <- function(data, add_col) {
  data %>%
    mutate.(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <dbl> <chr>   <dbl>
#> 1     1     1 a           2
#> 2     1     1 a           2
#> 3     1     1 b           2
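
The enquo() with !! alternative mentioned above might look like the following. This is a sketch rather than the package's own example: the function name add_one_quo is made up, and enquo() is called through rlang:: explicitly so the example does not depend on which rlang functions tidytable reexports:

```r
library(tidytable)

df <- tidytable(x = c(1, 1, 1), y = c(1, 1, 1), z = c("a", "a", "b"))

add_one_quo <- function(data, add_col) {
  # Capture the unevaluated column name as a quosure
  add_col <- rlang::enquo(add_col)

  data %>%
    mutate.(new_col = !!add_col + 1)  # unquote it inside the verb
}

res <- df %>%
  add_one_quo(x)

res
```

This should give the same new_col as the {{ }} version, which is usually the shorter choice when the argument is only passed through once.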

dt() helper

The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#>   z     avg_x
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3
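
To show the two styles mixed in a single chain, here is a small sketch that alternates tidytable verbs with a dt() call. The variable name mixed and the total column are introduced only for this illustration:

```r
library(tidytable)

df <- tidytable(x = 1:3, y = 4:6, z = c("a", "a", "b"))

mixed <- df %>%
  filter.(x < 3) %>%                 # tidytable verb
  dt(, double_x := x * 2) %>%        # data.table j expression
  summarize.(total = sum(double_x),  # back to a tidytable verb
             .by = z)

mixed
```

After the filter only the two rows of group a remain, so the summary should contain a single row.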

Compatibility

Compatibility with dplyr (and other tidyverse packages)

dplyr functions that have not yet been implemented in tidytable can be used directly on a tidytable, for example dplyr::add_count():

library(tidytable)
library(dplyr)

test_df <- tidytable(x = 1:3, y = c("a", "a", "b"))

test_df %>%
  mutate.(double_x = x * 2) %>%
  add_count()
#> # A tidytable: 3 × 4
#>       x y     double_x     n
#>   <int> <chr>    <dbl> <int>
#> 1     1 a            2     3
#> 2     2 a            4     3
#> 3     3 b            6     3

Compatibility with data.table

data.table functions can also be used on a tidytable. If you use any of data.table’s “set” operations (such as setorder() below), it is recommended to first convert the object to a data.table with as.data.table() to prevent issues with data.table’s modify-by-reference.

library(tidytable)
library(data.table)

test_df <- tidytable(x = 3:1, y = c("c", "b", "a"))

new_df <- test_df %>%
  mutate.(double_x = x * 2)

new_df <- as.data.table(new_df)

setorder(new_df, y)[]
#>    x y double_x
#> 1: 1 a        2
#> 2: 2 b        4
#> 3: 3 c        6
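
Once the “set” operations are done, the object can be converted back to continue with tidytable verbs. The sketch below assumes as_tidytable() and is_tidytable() behave as their descriptions in the function list suggest; the names plain_dt and back are illustrative:

```r
library(tidytable)
library(data.table)

# Start from a plain data.table, as recommended for "set" operations
plain_dt <- as.data.table(tidytable(x = 3:1, y = c("c", "b", "a")))

# setorder() sorts in place, so no assignment is needed
setorder(plain_dt, y)

# Convert back to a tidytable for further verb.() calls
back <- as_tidytable(plain_dt)

is_tidytable(back)

back %>%
  mutate.(double_x = x * 2)
```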

Speed Comparisons

For those interested in performance, speed comparisons can be found here.

Version: 0.6.2
License: MIT + file LICENSE
Maintainer: Mark Fairbanks
Last Published: May 18th, 2021
Monthly Downloads: 4,409

Functions in tidytable (0.6.2)

  • arrange_across. - Arrange by a selection of variables
  • complete. - Complete a data.table with missing combinations of data
  • count. - Count observations by group
  • %notin% - notin operator
  • drop_na. - Drop rows containing missing values
  • arrange. - Arrange/reorder rows
  • across. - Apply a function across a selection of columns
  • dt - Pipeable data.table call
  • replace_na. - Replace missing values
  • group_split. - Split data frame by groups
  • row_number. - Return row number
  • top_n. - Select top (or bottom) n rows (by value)
  • inv_gc - Run invisible garbage collection
  • n. - Number of observations in each group
  • n_distinct. - Count the number of unique values in a vector
  • is_tidytable - Test if the object is a tidytable
  • transmute. - Add new variables and drop all others
  • uncount. - Uncount a data.table
  • rename. - Rename variables by name
  • rename_with. - Rename multiple columns
  • between. - Do the values from x fall between the left and right bounds?
  • nest. - Nest data.tables
  • cur_group_id. - Current group context
  • crossing. - Create a data.table from all unique combinations of inputs
  • bind_cols. - Bind data.tables by row and column
  • desc. - Descending order
  • mutate_across. - Mutate multiple columns simultaneously
  • distinct. - Select distinct/unique rows
  • filter. - Filter rows on one or more conditions
  • get_dummies. - Convert character and factor columns to dummy variables
  • nest_by. - Nest data.tables
  • mutate_rowwise. - Add/modify columns by row
  • unite. - Unite multiple columns by pasting strings together
  • summarize. - Aggregate data using summary statistics
  • summarize_across. - Summarize multiple columns
  • pull. - Pull out a single variable
  • pivot_wider. - Pivot data from long to wide
  • slice. - Choose rows in a data.table
  • separate_rows. - Separate a collapsed column into multiple rows
  • case_when. - Case when
  • case. - data.table::fcase() with vectorized default
  • c_across. - Combine values from multiple columns
  • expand. - Expand a data.table to use all combinations of values
  • if_all. - Create conditions on a selection of columns
  • expand_grid. - Create a data.table from all combinations of inputs
  • ifelse. - Fast ifelse
  • map. - Apply a function to each element of a vector or list
  • extract. - Extract a character column into multiple columns using regex
  • fill. - Fill in missing values with previous or next value
  • coalesce. - Coalesce missing values
  • mutate. - Add/modify/delete columns
  • lags. - Get lagging or leading values
  • left_join. - Join two data.tables together
  • %>% - Pipe operator
  • reexports - Objects exported from other packages
  • pivot_longer. - Pivot data from wide to long
  • tidytable-vctrs - Internal vctrs methods
  • tidytable - Build a data.table/tidytable
  • relocate. - Relocate a column to a new position
  • separate. - Separate a character column into multiple columns
  • select. - Select or drop columns
  • unnest. - Unnest a nested data.table
  • as_tidytable - Coerce an object to a data.table/tidytable