Learn R Programming

tidytable

tidytable is a data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.

Installation

Install the released version from CRAN with:

install.packages("tidytable")

Or install the development version from GitHub with:

# install.packages("pak")
pak::pak("markfairbanks/tidytable")

General syntax

tidytable replicates tidyverse syntax but uses data.table in the background. In general you can simply use library(tidytable) to replace your existing dplyr and tidyr code with data.table backed equivalents.

A full list of implemented functions can be found here.

library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
#> # A tidytable: 3 × 5
#>       x     y z     double_x x_plus_y
#>   <int> <int> <chr>    <dbl>    <int>
#> 1     1     4 a            2        5
#> 2     2     5 a            4        7
#> 3     3     6 b            6        9

Applying functions by group

You can use the normal tidyverse group_by()/ungroup() workflow, or you can use .by syntax to reduce typing. Using .by in a function is shorthand for df %>% group_by() %>% some_function() %>% ungroup().

  • A single column can be passed with .by = z
  • Multiple columns can be passed with .by = c(y, z)
df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

All functions that can operate by group have a .by argument built in. (mutate(), filter(), summarize(), etc.)

The above syntax is equivalent to:

df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Both options are available for users, so you can use the syntax that you prefer.

tidyselect support

tidytable allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect package in the background.

Normal selection can be mixed with all tidyselect helpers: everything(), starts_with(), ends_with(), any_of(), where(), etc.

df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
#> # A tidytable: 3 × 3
#>       a    b1    b2
#>   <int> <int> <int>
#> 1     1     4     7
#> 2     2     5     8
#> 3     3     6     9

A full overview of selection options can be found here.

Using tidyselect in .by

tidyselect helpers also work when using .by:

df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Tidy evaluation compatibility

Tidy evaluation can be used to write custom functions with tidytable functions. The embracing shortcut {{ }} works, or you can use enquo() with !! if you prefer:

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a           2
#> 2     1     5 a           2
#> 3     1     6 b           2

The .data and .env pronouns also work within tidytable functions:

var <- 10

df %>%
  mutate(new_col = .data$x + .env$var)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a          11
#> 2     1     5 a          11
#> 3     1     6 b          11

A full overview of tidy evaluation can be found here.

dt() helper

The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#>   z     avg_x
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3

Speed Comparisons

For those interested in performance, speed comparisons can be found here.

Acknowledgements

tidytable is only possible because of the great contributions to R by the data.table and tidyverse teams. data.table is used as the main data frame engine in the background, while tidyverse packages like rlang, vctrs, and tidyselect are heavily relied upon to give users an experience similar to dplyr and tidyr.

Copy Link

Version

Install

install.packages('tidytable')

Monthly Downloads

3,446

Version

0.11.2

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Mark Fairbanks

Last Published

December 11th, 2024

Functions in tidytable (0.11.2)

arrange

Arrange/reorder rows
between

Do the values from x fall between the left and right bounds?
coalesce

Coalesce missing values
as_tidytable

Coerce an object to a data.table/tidytable
distinct

Select distinct/unique rows
case_when

Case when
drop_na

Drop rows containing missing values
bind_cols

Bind data.tables by row and column
count

Count observations by group
cross_join

Cross join
complete

Complete a data.table with missing combinations of data
dt

Pipeable data.table call
enframe

Convert a vector to a data.table/tidytable
extract

Extract a character column into multiple columns using regex
%in%

Fast %in% and %notin% operators
fread

Read/write files
fill

Fill in missing values with previous or next value
inv_gc

Run invisible garbage collection
group_split

Split data frame by groups
consecutive_id

Generate a unique id for consecutive values
context

Context functions
get_dummies

Convert character and factor columns to dummy variables
expand

Expand a data.table to use all combinations of values
expand_grid

Create a data.table from all combinations of inputs
group_vars

Get the grouping variables
lag

Get lagging or leading values
if_all

Create conditions on a selection of columns
if_else

Fast if_else
left_join

Join two data.tables together
group_by

Grouping
recode

Recode values
group_cols

Selection helper for grouping columns
mutate_rowwise

Add/modify columns by row
n

Number of observations in each group
map

Apply a function to each element of a vector or list
pivot_longer

Pivot data from wide to long
case

data.table::fcase() with vectorized default
mutate

Add/modify/delete columns
n_distinct

Count the number of unique values in a vector
pull

Pull out a single variable
na_if

Convert values to NA
reexports

Objects exported from other packages
crossing

Create a data.table from all unique combinations of inputs
row_number

Ranking functions
separate_longer_delim

Split a string into rows
desc

Descending order
separate_rows

Separate a collapsed column into multiple rows
pivot_wider

Pivot data from long to wide
select

Select or drop columns
reframe

Reframe a data frame
slice_head

Choose rows in a data.table
separate

Separate a character column into multiple columns
tribble

Rowwise tidytable creation
relocate

Relocate a column to a new position
uncount

Uncount a data.table
unite

Unite multiple columns by pasting strings together
filter

Filter rows on one or more conditions
top_n

Select top (or bottom) n rows (by value)
transmute

Add new variables and drop all others
unnest_longer

Unnest a list-column of vectors into regular columns
nest_join

Nest join
summarize

Aggregate data using summary statistics
is_grouped_df

Check if the tidytable is grouped
first

Extract the first, last, or nth value from a vector
unnest

Unnest list-columns
separate_wider_regex

Separate a character column into multiple columns using regex patterns
replace_na

Replace missing values
separate_wider_delim

Separate a character column into multiple columns
rowwise

Convert to a rowwise tidytable
new_tidytable

Create a tidytable from a list
is_tidytable

Test if the object is a tidytable
nest

Nest columns into a list-column
unnest_wider

Unnest a list-column of vectors into a wide data frame
nest_by

Nest data.tables
pick

Selection version of across()
rename

Rename variables by name
tidytable-vctrs

Internal vctrs methods
rename_with

Rename multiple columns
%>%

Pipe operator
tidytable

Build a data.table/tidytable
c_across

Combine values from multiple columns
case_match

Vectorized switch()
across

Apply a function across a selection of columns
add_count

Add a count column to the data frame