Learn R Programming

⚠️There's a newer version (0.11.2) of this package.Take me there.

tidytable

tidytable is a data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.

Installation

Install the released version from CRAN with:

install.packages("tidytable")

Or install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("markfairbanks/tidytable")

General syntax

tidytable replicates tidyverse syntax but uses data.table in the background. In general you can simply use library(tidytable) to replace your existing dplyr and tidyr code with the faster tidytable equivalents.

A full list of implemented functions can be found here.

library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
#> # A tidytable: 3 × 5
#>       x     y z     double_x x_plus_y
#>   <int> <int> <chr>    <dbl>    <int>
#> 1     1     4 a            2        5
#> 2     2     5 a            4        7
#> 3     3     6 b            6        9

Applying functions by group

You can use the normal tidyverse group_by()/ungroup() workflow, or you can use .by syntax to reduce typing. Using .by in a function is shorthand for df %>% group_by() %>% fn() %>% ungroup().

  • A single column can be passed with .by = z
  • Multiple columns can be passed with .by = c(y, z)
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

All functions that can operate by group have a .by argument built in. (mutate(), filter(), summarize(), etc.)

The above syntax is equivalent to:

df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Both options are available for users, so you can use the syntax that you prefer.

tidyselect support

tidytable allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect package in the background.

Normal selection can be mixed with all tidyselect helpers: everything(), starts_with(), ends_with(), any_of(), where(), etc.

df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
#> # A tidytable: 3 × 3
#>       a    b1    b2
#>   <int> <int> <int>
#> 1     1     4     7
#> 2     2     5     8
#> 3     3     6     9

A full overview of selection options can be found here.

Using tidyselect in .by

tidyselect helpers also work when using .by:

df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Tidy evaluation compatibility

Tidy evaluation can be used to write custom functions with tidytable functions. The embracing shortcut {{ }} works, or you can use enquo() with !! if you prefer:

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a           2
#> 2     1     5 a           2
#> 3     1     6 b           2

The .data and .env pronouns also work within tidytable functions:

var <- 10

df %>%
  mutate(new_col = .data$x + .env$var)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a          11
#> 2     1     5 a          11
#> 3     1     6 b          11

A full overview of tidy evaluation can be found here.

dt() helper

The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#>   z     avg_x
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3

Speed Comparisons

For those interested in performance, speed comparisons can be found here.

verb.() syntax

For backwards compatibility tidytable exports verb.() versions of functions. This will also allow users to more easily combine dplyr and tidytable functions in one script:

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  mutate.(double_x = x * 2)
#> # A tidytable: 3 × 4
#>       x     y z     double_x
#>   <int> <int> <chr>    <dbl>
#> 1     1     4 a            2
#> 2     2     5 a            4
#> 3     3     6 b            6

Acknowledgements

tidytable is only possible because of the great contributions to R by the data.table and tidyverse teams. data.table is used as the main data frame engine in the background, while tidyverse packages like rlang, vctrs, and tidyselect are heavily relied upon to give users an experience similar to dplyr and tidyr.

Copy Link

Version

Install

install.packages('tidytable')

Monthly Downloads

3,101

Version

0.9.1

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Mark Fairbanks

Last Published

October 25th, 2022

Functions in tidytable (0.9.1)

arrange_across.

Arrange by a selection of variables
add_tally.

Add a count column to the data frame
anti_join.

Join two data.tables together
across

Apply a function across a selection of columns
add_count

Add a count column to the data frame
arrange

Arrange/reorder rows
as_tidytable

Coerce an object to a data.table/tidytable
across.

Apply a function across a selection of columns
add_count.

Add a count column to the data frame
arrange.

Arrange/reorder rows
case

data.table::fcase() with vectorized default
case_match.

Vectorized switch()
bind_cols.

Bind data.tables by row and column
bind_cols

Bind data.tables by row and column
c_across

Combine values from multiple columns
coalesce.

Coalesce missing values
case_when

Case when
consecutive_id

Generate a unique id for consecutive values
case.

data.table::fcase() with vectorized default
complete

Complete a data.table with missing combinations of data
context

Context functions
consecutive_id.

Generate a unique id for consecutive values
between

Do the values from x fall between the left and right bounds?
between.

Do the values from x fall between the left and right bounds?
cur_group_id.

Context functions
coalesce

Coalesce missing values
case_match

Vectorized switch()
complete.

Complete a data.table with missing combinations of data
case_when.

Case when
cur_column.

Context functions
cur_data.

Context functions
enframe

Convert a vector to a data.table/tidytable
crossing

Create a data.table from all unique combinations of inputs
crossing.

Create a data.table from all unique combinations of inputs
c_across.

Combine values from multiple columns
expand.

Expand a data.table to use all combinations of values
bind_rows.

Bind data.tables by row and column
distinct

Select distinct/unique rows
count

Count observations by group
desc.

Descending order
count.

Count observations by group
distinct.

Select distinct/unique rows
desc

Descending order
dt

Pipeable data.table call
enframe.

Convert a vector to a data.table/tidytable
first

Extract the first, last, or nth value from a vector
group_split.

Split data frame by groups
fread.

Read/write files
cur_group_rows.

Context functions
group_by

Grouping
extract

Extract a character column into multiple columns using regex
expand_grid

Create a data.table from all combinations of inputs
fwrite.

Read/write files
extract.

Extract a character column into multiple columns using regex
get_dummies.

Convert character and factor columns to dummy variables
first.

Extract the first, last, or nth value from a vector
filter

Filter rows on one or more conditions
expand

Expand a data.table to use all combinations of values
group_split

Split data frame by groups
fill

Fill in missing values with previous or next value
ifelse.

Fast ifelse
group_vars.

Get the grouping variables
expand_grid.

Create a data.table from all combinations of inputs
if_all

Create conditions on a selection of columns
group_vars

Get the grouping variables
filter.

Filter rows on one or more conditions
lags.

Get lagging or leading values
if_all.

Create conditions on a selection of columns
last.

Extract the first, last, or nth value from a vector
is_grouped_df.

Check if the tidytable is grouped
%in%

Fast %in% and %notin% operators
is_tidytable

Test if the object is a tidytable
lag

Get lagging or leading values
if_any.

Create conditions on a selection of columns
left_join

Join two data.tables together
map.

Apply a function to each element of a vector or list
map2_dfc.

Apply a function to each element of a vector or list
map2_df.

Apply a function to each element of a vector or list
is_grouped_df

Check if the tidytable is grouped
map2_lgl.

Apply a function to each element of a vector or list
map_chr.

Apply a function to each element of a vector or list
mutate_across.

Mutate multiple columns simultaneously
map2_dfr.

Apply a function to each element of a vector or list
map

Apply a function to each element of a vector or list
map2.

Apply a function to each element of a vector or list
map_lgl.

Apply a function to each element of a vector or list
map2_int.

Apply a function to each element of a vector or list
map_int.

Apply a function to each element of a vector or list
mutate_rowwise.

Add/modify columns by row
na_if

Convert values to NA
drop_na.

Drop rows containing missing values
mutate

Add/modify/delete columns
n_distinct

Count the number of unique values in a vector
na_if.

Convert values to NA
nest.

Nest data.tables
mutate.

Add/modify/delete columns
nest_by.

Nest data.tables
nest

Nest data.tables
get_dummies

Convert character and factor columns to dummy variables
drop_na

Drop rows containing missing values
group_by.

Grouping
inner_join.

Join two data.tables together
n

Number of observations in each group
map_df.

Apply a function to each element of a vector or list
inv_gc

Run invisible garbage collection
pivot_longer

Pivot data from wide to long
pull

Pull out a single variable
map_dbl.

Apply a function to each element of a vector or list
pmap_dfc.

Apply a function to each element of a vector or list
nest_join

Nest join
relocate

Relocate a column to a new position
nest_by

Nest data.tables
nth.

Extract the first, last, or nth value from a vector
n_distinct.

Count the number of unique values in a vector
pmap_df.

Apply a function to each element of a vector or list
pmap.

Apply a function to each element of a vector or list
pmap_dfr.

Apply a function to each element of a vector or list
pull.

Pull out a single variable
fread

Read/write files
pmap_dbl.

Apply a function to each element of a vector or list
fill.

Fill in missing values with previous or next value
%>%

Pipe operator
pmap_chr.

Apply a function to each element of a vector or list
pivot_longer.

Pivot data from wide to long
reexports

Objects exported from other packages
rename

Rename variables by name
replace_na

Replace missing values
rename_with

Rename multiple columns
relocate.

Relocate a column to a new position
select

Select or drop columns
semi_join.

Join two data.tables together
rename.

Rename variables by name
summarize.

Aggregate data using summary statistics
replace_na.

Replace missing values
nesting.

Expand a data.table to use all combinations of values
rename_with.

Rename multiple columns
if_else.

Fast if_else
full_join.

Join two data.tables together
unnest_longer

Unnest a list-column of vectors into regular columns
slice_head.

Choose rows in a data.table
if_else

Fast if_else
summarize

Aggregate data using summary statistics
tidytable

Build a data.table/tidytable
leads.

Get lagging or leading values
map2_chr.

Apply a function to each element of a vector or list
rowwise.

Convert to a rowwise tidytable
unnest_wider.

Unnest a list-column of vectors into a wide data frame
left_join.

Join two data.tables together
slice_max.

Choose rows in a data.table
uncount

Uncount a data.table
tidytable-vctrs

Internal vctrs methods
uncount.

Uncount a data.table
slice_min.

Choose rows in a data.table
slice_sample.

Choose rows in a data.table
row_number.

Ranking functions
new_tidytable

Create a tidytable from a list
row_number

Ranking functions
map2_dbl.

Apply a function to each element of a vector or list
recode

Recode values
right_join.

Join two data.tables together
separate_rows.

Separate a collapsed column into multiple rows
n.

Number of observations in each group
separate_rows

Separate a collapsed column into multiple rows
mutate_rowwise

Add/modify columns by row
map_dfr.

Apply a function to each element of a vector or list
map_dfc.

Apply a function to each element of a vector or list
top_n.

Select top (or bottom) n rows (by value)
top_n

Select top (or bottom) n rows (by value)
unite

Unite multiple columns by pasting strings together
pivot_wider.

Pivot data from long to wide
unnest.

Unnest list-columns
separate

Separate a character column into multiple columns
summarize_across.

Summarize multiple columns
slice.

Choose rows in a data.table
slice_head

Choose rows in a data.table
separate.

Separate a character column into multiple columns
pivot_wider

Pivot data from long to wide
pmap_int.

Apply a function to each element of a vector or list
tally.

Count observations by group
rowwise

Convert to a rowwise tidytable
pmap_lgl.

Apply a function to each element of a vector or list
unite.

Unite multiple columns by pasting strings together
ungroup.

Grouping
transmute

Add new variables and drop all others
unnest

Unnest list-columns
unnest_wider

Unnest a list-column of vectors into a wide data frame
transmute.

Add new variables and drop all others
summarise.

Aggregate data using summary statistics
select.

Select or drop columns
slice_tail.

Choose rows in a data.table
walk.

Apply a function to each element of a vector or list
unnest_longer.

Unnest a list-column of vectors into regular columns