Learn R Programming

⚠️There's a newer version (0.11.2) of this package.Take me there.

tidytable

tidytable is a data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.

Installation

Install the released version from CRAN with:

install.packages("tidytable")

Or install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("markfairbanks/tidytable")

General syntax

tidytable replicates tidyverse syntax but uses data.table in the background. In general you can simply use library(tidytable) to replace your existing dplyr and tidyr code with the faster tidytable equivalents.

A full list of implemented functions can be found here.

library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
#> # A tidytable: 3 × 5
#>       x     y z     double_x x_plus_y
#>   <int> <int> <chr>    <dbl>    <int>
#> 1     1     4 a            2        5
#> 2     2     5 a            4        7
#> 3     3     6 b            6        9

Applying functions by group

You can use the normal tidyverse group_by()/ungroup() workflow, or you can use .by syntax to reduce typing. Using .by in a function is shorthand for df %>% group_by() %>% fn() %>% ungroup().

  • A single column can be passed with .by = z
  • Multiple columns can be passed with .by = c(y, z)
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

All functions that can operate by group have a .by argument built in. (mutate(), filter(), summarize(), etc.)

The above syntax is equivalent to:

df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Both options are available for users, so you can use the syntax that you prefer.

tidyselect support

tidytable allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect package in the background.

Normal selection can be mixed with all tidyselect helpers: everything(), starts_with(), ends_with(), any_of(), where(), etc.

df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
#> # A tidytable: 3 × 3
#>       a    b1    b2
#>   <int> <int> <int>
#> 1     1     4     7
#> 2     2     5     8
#> 3     3     6     9

A full overview of selection options can be found here.

Using tidyselect in .by

tidyselect helpers also work when using .by:

df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Tidy evaluation compatibility

Tidy evaluation can be used to write custom functions with tidytable functions. The embracing shortcut {{ }} works, or you can use enquo() with !! if you prefer:

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a           2
#> 2     1     5 a           2
#> 3     1     6 b           2

The .data and .env pronouns also work within tidytable functions:

var <- 10

df %>%
  mutate(new_col = .data$x + .env$var)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a          11
#> 2     1     5 a          11
#> 3     1     6 b          11

A full overview of tidy evaluation can be found here.

dt() helper

The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#>   z     avg_x
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3

Speed Comparisons

For those interested in performance, speed comparisons can be found here.

verb.() syntax

For backwards compatibility tidytable exports verb.() versions of functions. This will also allow users to more easily combine dplyr and tidytable functions in one script:

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  mutate.(double_x = x * 2)
#> # A tidytable: 3 × 4
#>       x     y z     double_x
#>   <int> <int> <chr>    <dbl>
#> 1     1     4 a            2
#> 2     2     5 a            4
#> 3     3     6 b            6

Acknowledgements

tidytable is only possible because of the great contributions to R by the data.table and tidyverse teams. data.table is used as the main data frame engine in the background, while tidyverse packages like rlang, vctrs, and tidyselect are heavily relied upon to give users an experience similar to dplyr and tidyr.

Copy Link

Version

Install

install.packages('tidytable')

Monthly Downloads

3,101

Version

0.9.2

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Mark Fairbanks

Last Published

January 13th, 2023

Functions in tidytable (0.9.2)

add_count.

Add a count column to the data frame
add_count

Add a count column to the data frame
bind_cols.

Bind data.tables by row and column
c_across

Combine values from multiple columns
as_tidytable

Coerce an object to a data.table/tidytable
arrange_across.

Arrange by a selection of variables
case

data.table::fcase() with vectorized default
between

Do the values from x fall between the left and right bounds?
between.

Do the values from x fall between the left and right bounds?
bind_rows.

Bind data.tables by row and column
coalesce

Coalesce missing values
consecutive_id

Generate a unique id for consecutive values
complete.

Complete a data.table with missing combinations of data
complete

Complete a data.table with missing combinations of data
case_match

Vectorized switch()
consecutive_id.

Generate a unique id for consecutive values
case.

data.table::fcase() with vectorized default
c_across.

Combine values from multiple columns
context

Context functions
bind_cols

Bind data.tables by row and column
case_match.

Vectorized switch()
case_when

Case when
cur_group_rows.

Context functions
desc.

Descending order
enframe

Convert a vector to a data.table/tidytable
cross_join

Cross join
expand

Expand a data.table to use all combinations of values
count.

Count observations by group
crossing

Create a data.table from all unique combinations of inputs
case_when.

Case when
first.

Extract the first, last, or nth value from a vector
enframe.

Convert a vector to a data.table/tidytable
dt

Pipeable data.table call
drop_na

Drop rows containing missing values
expand.

Expand a data.table to use all combinations of values
coalesce.

Coalesce missing values
crossing.

Create a data.table from all unique combinations of inputs
cur_column.

Context functions
distinct

Select distinct/unique rows
first

Extract the first, last, or nth value from a vector
get_dummies

Convert character and factor columns to dummy variables
expand_grid

Create a data.table from all combinations of inputs
drop_na.

Drop rows containing missing values
expand_grid.

Create a data.table from all combinations of inputs
extract.

Extract a character column into multiple columns using regex
get_dummies.

Convert character and factor columns to dummy variables
fread

Read/write files
fread.

Read/write files
group_vars

Get the grouping variables
group_cols

Selection helper for grouping columns
full_join.

Join two data.tables together
extract

Extract a character column into multiple columns using regex
is_grouped_df

Check if the tidytable is grouped
group_vars.

Get the grouping variables
group_split

Split data frame by groups
if_all

Create conditions on a selection of columns
group_split.

Split data frame by groups
if_all.

Create conditions on a selection of columns
is_grouped_df.

Check if the tidytable is grouped
is_tidytable

Test if the object is a tidytable
left_join

Join two data.tables together
if_any.

Create conditions on a selection of columns
lag

Get lagging or leading values
map2_df.

Apply a function to each element of a vector or list
map.

Apply a function to each element of a vector or list
filter

Filter rows on one or more conditions
count

Count observations by group
if_else

Fast if_else
if_else.

Fast if_else
leads.

Get lagging or leading values
filter.

Filter rows on one or more conditions
map2_int.

Apply a function to each element of a vector or list
fwrite.

Read/write files
map2_dfr.

Apply a function to each element of a vector or list
left_join.

Join two data.tables together
map

Apply a function to each element of a vector or list
inv_gc

Run invisible garbage collection
inner_join.

Join two data.tables together
map2_lgl.

Apply a function to each element of a vector or list
mutate_across.

Mutate multiple columns simultaneously
map2_dfc.

Apply a function to each element of a vector or list
map_chr.

Apply a function to each element of a vector or list
n_distinct

Count the number of unique values in a vector
na_if.

Convert values to NA
mutate_rowwise.

Add/modify columns by row
map_dbl.

Apply a function to each element of a vector or list
map2.

Apply a function to each element of a vector or list
desc

Descending order
fill

Fill in missing values with previous or next value
cur_data.

Context functions
nesting.

Expand a data.table to use all combinations of values
fill.

Fill in missing values with previous or next value
cur_group_id.

Context functions
distinct.

Select distinct/unique rows
na_if

Convert values to NA
nest.

Nest data.tables
map_lgl.

Apply a function to each element of a vector or list
map_int.

Apply a function to each element of a vector or list
new_tidytable

Create a tidytable from a list
pick

Selection version of across()
nth.

Extract the first, last, or nth value from a vector
nest_join

Nest join
nest_by

Nest data.tables
pivot_longer

Pivot data from wide to long
pivot_wider.

Pivot data from long to wide
map_df.

Apply a function to each element of a vector or list
pmap_dbl.

Apply a function to each element of a vector or list
pmap_chr.

Apply a function to each element of a vector or list
n

Number of observations in each group
group_by

Grouping
n_distinct.

Count the number of unique values in a vector
rename.

Rename variables by name
group_by.

Grouping
map_dfc.

Apply a function to each element of a vector or list
map_dfr.

Apply a function to each element of a vector or list
row_number

Ranking functions
pull

Pull out a single variable
nest

Nest data.tables
rename_with.

Rename multiple columns
pmap_int.

Apply a function to each element of a vector or list
recode

Recode values
pmap_dfr.

Apply a function to each element of a vector or list
rename

Rename variables by name
rowwise

Convert to a rowwise tidytable
rowwise.

Convert to a rowwise tidytable
reexports

Objects exported from other packages
nest_by.

Nest data.tables
separate_wider_delim

Separate a character column into multiple columns
separate_wider_regex

Separate a character column into multiple columns using regex patterns
pivot_wider

Pivot data from long to wide
row_number.

Ranking functions
summarize_across.

Summarize multiple columns
separate.

Separate a character column into multiple columns
right_join.

Join two data.tables together
rename_with

Rename multiple columns
tally.

Count observations by group
semi_join.

Join two data.tables together
separate_rows

Separate a collapsed column into multiple rows
separate_rows.

Separate a collapsed column into multiple rows
slice_tail.

Choose rows in a data.table
%in%

Fast %in% and %notin% operators
mutate

Add/modify/delete columns
mutate_rowwise

Add/modify columns by row
mutate.

Add/modify/delete columns
ifelse.

Fast ifelse
map2_chr.

Apply a function to each element of a vector or list
lags.

Get lagging or leading values
map2_dbl.

Apply a function to each element of a vector or list
last.

Extract the first, last, or nth value from a vector
summarise.

Aggregate data using summary statistics
slice.

Choose rows in a data.table
summarize.

Aggregate data using summary statistics
top_n

Select top (or bottom) n rows (by value)
top_n.

Select top (or bottom) n rows (by value)
summarize

Aggregate data using summary statistics
relocate.

Relocate a column to a new position
unnest_longer

Unnest a list-column of vectors into regular columns
relocate

Relocate a column to a new position
replace_na

Replace missing values
tidytable-vctrs

Internal vctrs methods
replace_na.

Replace missing values
slice_sample.

Choose rows in a data.table
ungroup.

Grouping
pmap.

Apply a function to each element of a vector or list
unnest_wider.

Unnest a list-column of vectors into a wide data frame
slice_head

Choose rows in a data.table
tidytable

Build a data.table/tidytable
unnest_wider

Unnest a list-column of vectors into a wide data frame
transmute

Add new variables and drop all others
%>%

Pipe operator
slice_min.

Choose rows in a data.table
unite.

Unite multiple columns by pasting strings together
transmute.

Add new variables and drop all others
n.

Number of observations in each group
walk.

Apply a function to each element of a vector or list
unite

Unite multiple columns by pasting strings together
unnest.

Unnest list-columns
pivot_longer.

Pivot data from wide to long
pmap_df.

Apply a function to each element of a vector or list
separate_longer_delim

Split a string into rows
separate

Separate a character column into multiple columns
select

Select or drop columns
pmap_lgl.

Apply a function to each element of a vector or list
pmap_dfc.

Apply a function to each element of a vector or list
uncount.

Uncount a data.table
slice_head.

Choose rows in a data.table
slice_max.

Choose rows in a data.table
select.

Select or drop columns
pull.

Pull out a single variable
unnest

Unnest list-columns
uncount

Uncount a data.table
unnest_longer.

Unnest a list-column of vectors into regular columns
arrange.

Arrange/reorder rows
across

Apply a function across a selection of columns
across.

Apply a function across a selection of columns
add_tally.

Add a count column to the data frame
anti_join.

Join two data.tables together
arrange

Arrange/reorder rows