
⚠️ There is a newer version (0.11.2) of this package.

tidytable

Why tidytable?

  • tidyverse-like syntax with data.table speed
  • rlang compatibility
  • Includes functions that dtplyr is missing, such as many tidyr functions

Note: tidytable functions do not use data.table's modify-by-reference. Instead, they follow the copy-on-modify semantics of the tidyverse and base R.
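The difference matters when tidytable is mixed with plain data.table code. The sketch below contrasts mutate.() with data.table's := operator. It assumes both packages are installed, and the object names are only illustrative.

```r
library(data.table)
library(tidytable)

dt_in <- data.table(x = c(1, 2, 3))

# tidytable: mutate.() returns a modified copy and leaves dt_in alone
out <- mutate.(dt_in, y = x * 2)
names_after_mutate <- names(dt_in)   # still just "x"

# data.table: := adds the column to dt_in itself (modify-by-reference)
dt_in[, y := x * 2]
names_after_assign <- names(dt_in)   # now "x" and "y"
```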

Installation

Install the released version from CRAN with:

install.packages("tidytable")

Or install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("markfairbanks/tidytable")

General syntax

tidytable uses verb.() syntax to replicate tidyverse functions:

library(tidytable)

test_df <- data.table(x = c(1,2,3), y = c(4,5,6), z = c("a","a","b"))

test_df %>%
  select.(x, y, z) %>%
  filter.(x < 4, y > 1) %>%
  arrange.(x, y) %>%
  mutate.(double_x = x * 2,
          double_y = y * 2)
#> # tidytable [3 × 5]
#>       x     y z     double_x double_y
#>   <dbl> <dbl> <chr>    <dbl>    <dbl>
#> 1     1     4 a            2        8
#> 2     2     5 a            4       10
#> 3     3     6 b            6       12
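Each verb takes the data as its first argument, so the pipe is optional. A small sketch reusing the same test_df is below. The names a_rows and y_values are illustrative.

```r
library(tidytable)

test_df <- data.table(x = c(1, 2, 3), y = c(4, 5, 6), z = c("a", "a", "b"))

# Without the pipe, the data is simply the first argument of each verb
a_rows <- filter.(test_df, z == "a")

# pull.() extracts one column as a plain vector
y_values <- pull.(a_rows, y)   # c(4, 5)
```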

A full list of functions can be found here.

Using “group by”

Grouping is done inside any function that supports it (such as summarize.() and mutate.()) through the .by argument:

  • A single column can be passed with .by = z
  • Multiple columns can be passed with .by = c(y, z)
  • tidyselect can also be used, including using predicates:
    • Single predicate: .by = where(is.character)
    • Multiple predicates: .by = c(where(is.character), where(is.factor))
    • A combination of predicates and column names: .by = c(where(is.character), y)

test_df %>%
  summarize.(avg_x = mean(x),
             count = n.(),
             .by = z)
#> # tidytable [2 × 3]
#>   z     avg_x count
#>   <chr> <dbl> <int>
#> 1 a       1.5     2
#> 2 b       3       1
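The other .by forms listed above can be tried on the same test_df. This sketch assumes the tidyselect-in-.by behavior described in this README, and the result names are illustrative. Because mutate.() keeps every row, it adds a per-group value instead of collapsing the groups.

```r
library(tidytable)

test_df <- data.table(x = c(1, 2, 3), y = c(4, 5, 6), z = c("a", "a", "b"))

# Two grouping columns: every (y, z) pair is unique here, so 3 groups
by_two <- test_df %>%
  summarize.(avg_x = mean(x), .by = c(y, z))

# A tidyselect predicate: groups by all character columns (only z here)
by_pred <- test_df %>%
  summarize.(avg_x = mean(x), .by = where(is.character))

# .by inside mutate.(): the group mean is added to each of the 3 rows
with_mean <- test_df %>%
  mutate.(z_mean = mean(x), .by = z)
```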

tidyselect support

tidytable allows you to select/drop columns just like you would in the tidyverse.

Normal selection can be mixed with:

  • Predicates: where(is.numeric), where(is.character), etc.
  • Select helpers: everything(), starts_with(), ends_with(), contains(), any_of(), etc.

test_df <- data.table(a = c(1,2,3),
                      b = c(4,5,6),
                      c = c("a","a","b"),
                      d = c("a","b","c"))

test_df %>%
  select.(where(is.numeric), d)
#> # tidytable [3 × 3]
#>       a     b d    
#>   <dbl> <dbl> <chr>
#> 1     1     4 a    
#> 2     2     5 b    
#> 3     3     6 c

To drop columns, prefix them with a - sign:

test_df %>%
  select.(-where(is.numeric), -d)
#> # tidytable [3 × 1]
#>   c    
#>   <chr>
#> 1 a    
#> 2 a    
#> 3 b
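The select helpers work the same way. Here is a minimal sketch on the same four-column test_df. The name "not_a_column" is a deliberately missing column, and it shows that any_of() skips names that do not exist instead of raising an error.

```r
library(tidytable)

test_df <- data.table(a = c(1, 2, 3), b = c(4, 5, 6),
                      c = c("a", "a", "b"), d = c("a", "b", "c"))

# starts_with() matches by name prefix; any_of() silently skips
# names that are not present ("not_a_column" here)
helpers <- test_df %>%
  select.(starts_with("a"), any_of(c("d", "not_a_column")))

names(helpers)   # "a" "d"
```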

These same ideas can be used whenever selecting columns in tidytable functions - for example when using count.(), drop_na.(), mutate_across.(), pivot_longer.(), etc.
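As a rough illustration of selection inside other verbs, the sketch below uses drop_na.(), mutate_across.() and count.() on a small table with missing values. The data and the helper double_it() are hypothetical. The exact name of the count column is not relied on here.

```r
library(tidytable)

df <- data.table(a = c(1, NA, 3), b = c(4, 5, NA), c = c("x", "y", "y"))

# drop_na.(): only the selected column (a) is checked for NAs
no_na_a <- df %>% drop_na.(a)                    # keeps rows 1 and 3

# mutate_across.(): apply one function to every numeric column
double_it <- function(v) v * 2                   # hypothetical helper
doubled <- df %>% mutate_across.(where(is.numeric), double_it)

# count.(): one row per distinct value of the character column(s)
counts <- df %>% count.(where(is.character))
```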

A full overview of selection options can be found here.

rlang compatibility

rlang can be used to write custom functions that call tidytable functions:

Custom function with mutate.()
df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

# Using enquo() with !!
add_one <- function(data, add_col) {
  
  add_col <- rlang::enquo(add_col)
  
  data %>%
    mutate.(new_col = !!add_col + 1)
}

# Equivalent version using the {{ }} shortcut (this redefines add_one())
add_one <- function(data, add_col) {
  data %>%
    mutate.(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
#> # tidytable [3 × 4]
#>       x     y z     new_col
#>   <dbl> <dbl> <chr>   <dbl>
#> 1     1     1 a           2
#> 2     1     1 a           2
#> 3     1     1 b           2

Custom function with summarize.()
df <- data.table(x = 1:10, y = c(rep("a", 6), rep("b", 4)), z = c(rep("a", 6), rep("b", 4)))

find_mean <- function(data, grouping_cols, col) {
  data %>%
    summarize.(avg = mean({{ col }}),
               .by = {{ grouping_cols }})
}

df %>%
  find_mean(grouping_cols = c(y, z), col = x)
#> # tidytable [2 × 3]
#>   y     z       avg
#>   <chr> <chr> <dbl>
#> 1 a     a       3.5
#> 2 b     b       8.5
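When the column name arrives as a string, for example from a config file, rlang::sym() can turn it into a symbol that is then unquoted with !!. This is a different rlang tool from the enquo()/{{ }} pattern above. The helper add_one_chr() is hypothetical, a sketch that assumes tidytable accepts !! as shown in the enquo() example.

```r
library(tidytable)

df <- data.table(x = c(1, 1, 1), y = c(1, 1, 1), z = c("a", "a", "b"))

# Hypothetical helper: takes the column name as a character string
add_one_chr <- function(data, col_name) {
  col <- rlang::sym(col_name)   # "x" becomes the symbol x
  data %>%
    mutate.(new_col = !!col + 1)
}

res <- add_one_chr(df, "x")     # new_col is 2 in every row
```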

Auto-conversion

All tidytable functions automatically convert data.frame and tibble inputs to a data.table:

library(dplyr)
library(data.table)

test_df <- tibble(x = c(1,2,3), y = c(4,5,6), z = c("a","a","b"))

test_df %>%
  mutate.(double_x = x * 2) %>%
  is.data.table()
#> [1] TRUE
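The conversion also applies to a plain base data.frame, and because of copy-on-modify the input object keeps its original class. The sketch below also uses as_tidytable() and is_tidytable() from the function list to convert once up front. The variable names are illustrative.

```r
library(tidytable)

df <- data.frame(x = c(1, 2, 3), z = c("a", "a", "b"))

out <- mutate.(df, double_x = x * 2)
data.table::is.data.table(out)   # TRUE: the result is a data.table
data.table::is.data.table(df)    # FALSE: df itself is not converted

# Convert explicitly, once, before a longer pipeline
tt <- as_tidytable(df)
is_tidytable(tt)                 # TRUE
```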

dt() helper

The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:

df <- data.table(x = c(1,2,3), y = c(4,5,6), z = c("a", "a", "b"))

df %>%
  dt(, list(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, list(avg_x = mean(x)), by = z)
#> # tidytable [2 × 2]
#>   z     avg_x
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3
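Because dt() sits in the same pipe, tidytable verbs and data.table calls can alternate freely. In this sketch a tidytable filter feeds a data.table := step, which is then grouped with summarize.(). The column name z_upper is illustrative. The assertion sorts the totals so it does not depend on group order.

```r
library(tidytable)

df <- data.table(x = c(1, 2, 3), y = c(4, 5, 6), z = c("a", "a", "b"))

res <- df %>%
  filter.(x > 1) %>%                          # tidytable verb: keeps x = 2, 3
  dt(, z_upper := toupper(z)) %>%             # data.table syntax inside dt()
  summarize.(total = sum(y), .by = z_upper)   # back to a tidytable verb
```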

Speed Comparisons

For those interested in performance, speed comparisons can be found here.


Version: 0.5.4

License: MIT + file LICENSE


Maintainer: Mark Fairbanks

Last Published: December 11th, 2024

Functions in tidytable (0.5.4)

  • dt: Pipeable data.table call
  • nest_by.: Nest data.tables
  • n.: Number of observations in each group
  • rename.: Rename variables by name
  • rename_all.: Deprecated rename helpers
  • is_tidytable: Test if the object is a tidytable
  • filter.: Filter rows on one or more conditions
  • pull.: Pull out a single variable
  • fill.: Fill in missing values with previous or next value
  • inv_gc: Run invisible garbage collection
  • pivot_wider.: Pivot data from long to wide
  • row_number.: Return row number
  • dt_arrange: Deprecated dt_verb() functions
  • case.: Case when
  • expand.: Expand a data.table to use all combinations of values
  • expand_grid.: Create a data.table from all combinations of inputs
  • mutate_if.: Deprecated mutate helpers
  • summarize_across.: Summarize multiple columns
  • mutate_across.: Mutate multiple columns simultaneously
  • summarize.: Aggregate data using summary statistics
  • select.: Select or drop columns
  • %notin%: notin operator
  • mutate.: Mutate
  • map.: Apply a function to each element of a vector or list
  • get_dummies.: Convert character and factor columns to dummy variables
  • reexports: Objects exported from other packages
  • group_split.: Split data frame by groups
  • ifelse.: Fast ifelse
  • lags.: Get lagging or leading values
  • left_join.: Join two data.tables together
  • rename_with.: Rename multiple columns
  • relocate.: Relocate a column to a new position
  • replace_na.: Replace missing values
  • pivot_longer.: Pivot data from wide to long
  • top_n.: Select top (or bottom) n rows (by value)
  • tidytable: Build a data.table/tidytable
  • %>%: Pipe operator
  • unite.: Unite multiple columns by pasting strings together
  • unnest.: Unnest a nested data.table
  • transmute.: Add new variables and drop all others
  • slice.: Choose rows by position
  • uncount.: Uncount a data.table
  • starts_with.: Select helpers
  • separate.: Separate a character column into multiple columns
  • separate_rows.: Separate a collapsed column into multiple rows
  • crossing.: Create a data.table from all unique combinations of inputs
  • complete.: Complete a data.table with missing combinations of data
  • distinct.: Select distinct/unique rows
  • bind_cols.: Bind data.tables by row and column
  • count.: Count observations by group
  • drop_na.: Drop rows containing missing values
  • arrange.: Arrange/reorder rows
  • as_tidytable: Coerce an object to a data.table/tidytable
  • desc.: Deprecated