Learn R Programming

⚠️There's a newer version (0.11.2) of this package.Take me there.

tidytable

tidytable is a data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.

Installation

Install the released version from CRAN with:

install.packages("tidytable")

Or install the development version from GitHub with:

# install.packages("pak")
pak::pak("markfairbanks/tidytable")

General syntax

tidytable replicates tidyverse syntax but uses data.table in the background. In general you can simply use library(tidytable) to replace your existing dplyr and tidyr code with data.table backed equivalents.

A full list of implemented functions can be found here.

library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
#> # A tidytable: 3 × 5
#>       x     y z     double_x x_plus_y
#>   <int> <int> <chr>    <dbl>    <int>
#> 1     1     4 a            2        5
#> 2     2     5 a            4        7
#> 3     3     6 b            6        9

Applying functions by group

You can use the normal tidyverse group_by()/ungroup() workflow, or you can use .by syntax to reduce typing. Using .by in a function is shorthand for df %>% group_by() %>% some_function() %>% ungroup().

  • A single column can be passed with .by = z
  • Multiple columns can be passed with .by = c(y, z)
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

All functions that can operate by group have a .by argument built in. (mutate(), filter(), summarize(), etc.)

The above syntax is equivalent to:

df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Both options are available for users, so you can use the syntax that you prefer.

tidyselect support

tidytable allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect package in the background.

Normal selection can be mixed with all tidyselect helpers: everything(), starts_with(), ends_with(), any_of(), where(), etc.

df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
#> # A tidytable: 3 × 3
#>       a    b1    b2
#>   <int> <int> <int>
#> 1     1     4     7
#> 2     2     5     8
#> 3     3     6     9

A full overview of selection options can be found here.

Using tidyselect in .by

tidyselect helpers also work when using .by:

df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Tidy evaluation compatibility

Tidy evaluation can be used to write custom functions with tidytable functions. The embracing shortcut {{ }} works, or you can use enquo() with !! if you prefer:

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a           2
#> 2     1     5 a           2
#> 3     1     6 b           2

The .data and .env pronouns also work within tidytable functions:

var <- 10

df %>%
  mutate(new_col = .data$x + .env$var)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a          11
#> 2     1     5 a          11
#> 3     1     6 b          11

A full overview of tidy evaluation can be found here.

dt() helper

The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#>   z     avg_x
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3

Speed Comparisons

For those interested in performance, speed comparisons can be found here.

Acknowledgements

tidytable is only possible because of the great contributions to R by the data.table and tidyverse teams. data.table is used as the main data frame engine in the background, while tidyverse packages like rlang, vctrs, and tidyselect are heavily relied upon to give users an experience similar to dplyr and tidyr.

Copy Link

Version

Install

install.packages('tidytable')

Monthly Downloads

3,101

Version

0.10.1

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Mark Fairbanks

Last Published

April 20th, 2023

Functions in tidytable (0.10.1)

across.

Apply a function across a selection of columns
across

Apply a function across a selection of columns
between.

Do the values from x fall between the left and right bounds?
bind_cols

Bind data.tables by row and column
bind_cols.

Bind data.tables by row and column
consecutive_id

Generate a unique id for consecutive values
context

Context functions
c_across

Combine values from multiple columns
add_count

Add a count column to the data frame
case.

data.table::fcase() with vectorized default
coalesce

Coalesce missing values
between

Do the values from x fall between the left and right bounds?
complete.

Complete a data.table with missing combinations of data
complete

Complete a data.table with missing combinations of data
consecutive_id.

Generate a unique id for consecutive values
count.

Count observations by group
case

data.table::fcase() with vectorized default
case_match

Vectorized switch()
case_match.

Vectorized switch()
case_when.

Case when
desc

Descending order
crossing

Create a data.table from all unique combinations of inputs
cur_column.

Context functions
case_when

Case when
c_across.

Combine values from multiple columns
bind_rows.

Bind data.tables by row and column
count

Count observations by group
cur_group_id.

Context functions
drop_na

Drop rows containing missing values
cur_data.

Context functions
get_dummies

Convert character and factor columns to dummy variables
expand

Expand a data.table to use all combinations of values
expand.

Expand a data.table to use all combinations of values
get_dummies.

Convert character and factor columns to dummy variables
cross_join

Cross join
coalesce.

Coalesce missing values
crossing.

Create a data.table from all unique combinations of inputs
distinct

Select distinct/unique rows
cur_group_rows.

Context functions
distinct.

Select distinct/unique rows
enframe.

Convert a vector to a data.table/tidytable
enframe

Convert a vector to a data.table/tidytable
expand_grid.

Create a data.table from all combinations of inputs
first

Extract the first, last, or nth value from a vector
desc.

Descending order
filter.

Filter rows on one or more conditions
first.

Extract the first, last, or nth value from a vector
if_else.

Fast if_else
drop_na.

Drop rows containing missing values
extract.

Extract a character column into multiple columns using regex
extract

Extract a character column into multiple columns using regex
inner_join.

Join two data.tables together
if_else

Fast if_else
inv_gc

Run invisible garbage collection
left_join

Join two data.tables together
group_cols

Selection helper for grouping columns
group_vars

Get the grouping variables
group_split.

Split data frame by groups
last.

Extract the first, last, or nth value from a vector
if_all.

Create conditions on a selection of columns
lags.

Get lagging or leading values
expand_grid

Create a data.table from all combinations of inputs
map.

Apply a function to each element of a vector or list
mutate.

Add/modify/delete columns
filter

Filter rows on one or more conditions
fread

Read/write files
group_split

Split data frame by groups
fread.

Read/write files
mutate_across.

Mutate multiple columns simultaneously
mutate

Add/modify/delete columns
is_grouped_df.

Check if the tidytable is grouped
mutate_rowwise.

Add/modify columns by row
map2_chr.

Apply a function to each element of a vector or list
map_lgl.

Apply a function to each element of a vector or list
map2_dbl.

Apply a function to each element of a vector or list
map_int.

Apply a function to each element of a vector or list
pick

Selection version of across()
na_if

Convert values to NA
recode

Recode values
nest.

Nest columns into a list-column
reexports

Objects exported from other packages
nth.

Extract the first, last, or nth value from a vector
dt

Pipeable data.table call
is_grouped_df

Check if the tidytable is grouped
pivot_wider

Pivot data from long to wide
group_by.

Grouping
fill

Fill in missing values with previous or next value
fill.

Fill in missing values with previous or next value
rowwise

Convert to a rowwise tidytable
select.

Select or drop columns
pmap.

Apply a function to each element of a vector or list
pull.

Pull out a single variable
pmap_lgl.

Apply a function to each element of a vector or list
separate_rows

Separate a collapsed column into multiple rows
map2_dfc.

Apply a function to each element of a vector or list
map2_df.

Apply a function to each element of a vector or list
group_vars.

Get the grouping variables
full_join.

Join two data.tables together
group_by

Grouping
transmute

Add new variables and drop all others
slice_head

Choose rows in a data.table
separate_wider_delim

Separate a character column into multiple columns
slice_head.

Choose rows in a data.table
tribble

Rowwise tidytable creation
unnest

Unnest list-columns
rename_with

Rename multiple columns
unnest_longer.

Unnest a list-column of vectors into regular columns
fwrite.

Read/write files
replace_na.

Replace missing values
is_tidytable

Test if the object is a tidytable
mutate_rowwise

Add/modify columns by row
map2.

Apply a function to each element of a vector or list
map_df.

Apply a function to each element of a vector or list
lag

Get lagging or leading values
map

Apply a function to each element of a vector or list
if_all

Create conditions on a selection of columns
summarize

Aggregate data using summary statistics
n.

Number of observations in each group
if_any.

Create conditions on a selection of columns
map_dbl.

Apply a function to each element of a vector or list
nest

Nest columns into a list-column
%in%

Fast %in% and %notin% operators
ifelse.

Fast ifelse
n

Number of observations in each group
map_dfc.

Apply a function to each element of a vector or list
map_dfr.

Apply a function to each element of a vector or list
reframe

Reframe a data frame
pmap_dfr.

Apply a function to each element of a vector or list
pmap_int.

Apply a function to each element of a vector or list
nesting.

Expand a data.table to use all combinations of values
nest_by.

Nest data.tables
leads.

Get lagging or leading values
left_join.

Join two data.tables together
map2_dfr.

Apply a function to each element of a vector or list
new_tidytable

Create a tidytable from a list
pivot_longer

Pivot data from wide to long
pivot_wider.

Pivot data from long to wide
tidytable

Build a data.table/tidytable
summarize_across.

Summarize multiple columns
ungroup.

Grouping
top_n.

Select top (or bottom) n rows (by value)
n_distinct.

Count the number of unique values in a vector
relocate

Relocate a column to a new position
rename.

Rename variables by name
separate.

Separate a character column into multiple columns
unite.

Unite multiple columns by pasting strings together
slice.

Choose rows in a data.table
separate_wider_regex

Separate a character column into multiple columns using regex patterns
tally.

Count observations by group
rowwise.

Convert to a rowwise tidytable
tidytable-vctrs

Internal vctrs methods
row_number.

Ranking functions
map2_int.

Apply a function to each element of a vector or list
relocate.

Relocate a column to a new position
separate

Separate a character column into multiple columns
map2_lgl.

Apply a function to each element of a vector or list
%>%

Pipe operator
slice_max.

Choose rows in a data.table
pivot_longer.

Pivot data from wide to long
pmap_chr.

Apply a function to each element of a vector or list
pull

Pull out a single variable
select

Select or drop columns
pmap_dbl.

Apply a function to each element of a vector or list
replace_na

Replace missing values
row_number

Ranking functions
semi_join.

Join two data.tables together
slice_min.

Choose rows in a data.table
right_join.

Join two data.tables together
slice_sample.

Choose rows in a data.table
slice_tail.

Choose rows in a data.table
unite

Unite multiple columns by pasting strings together
n_distinct

Count the number of unique values in a vector
rename

Rename variables by name
pmap_dfc.

Apply a function to each element of a vector or list
nest_by

Nest data.tables
nest_join

Nest join
pmap_df.

Apply a function to each element of a vector or list
na_if.

Convert values to NA
map_chr.

Apply a function to each element of a vector or list
rename_with.

Rename multiple columns
separate_longer_delim

Split a string into rows
unnest.

Unnest list-columns
summarise.

Aggregate data using summary statistics
uncount.

Uncount a data.table
summarize.

Aggregate data using summary statistics
separate_rows.

Separate a collapsed column into multiple rows
unnest_wider

Unnest a list-column of vectors into a wide data frame
top_n

Select top (or bottom) n rows (by value)
transmute.

Add new variables and drop all others
walk.

Apply a function to each element of a vector or list
uncount

Uncount a data.table
unnest_longer

Unnest a list-column of vectors into regular columns
unnest_wider.

Unnest a list-column of vectors into a wide data frame
arrange_across.

Arrange by a selection of variables
as_tidytable

Coerce an object to a data.table/tidytable
arrange

Arrange/reorder rows
add_count.

Add a count column to the data frame
arrange.

Arrange/reorder rows
add_tally.

Add a count column to the data frame
anti_join.

Join two data.tables together