tidyfst (version 0.8.8)

group_by_dt: Group by variable(s) and implement operations

Description

Uses setkey and setkeyv from data.table to provide grouping functionality similar to group_by in dplyr. This approach is both convenient and computationally efficient.

Usage

group_by_dt(data, ..., cols = NULL, inplace = FALSE)

group_exe_dt(data, ...)

Arguments

data

A data frame

...

For group_by_dt: the variables to group by, that is, the columns to sort by. Do not quote the column names. For group_exe_dt: any data manipulation arguments that could be applied to a data.frame.

cols

A character vector of column names to group by.

inplace

Should the grouping be implemented by reference (that is, by modifying the original data.frame)? Defaults to FALSE.
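
The arguments above map onto data.table's own key-setting functions. The following is a minimal sketch in plain data.table, not tidyfst's actual implementation: it assumes that unquoted names in `...` correspond to setkey, that `cols` corresponds to setkeyv, and that `inplace = FALSE` corresponds to keying a copy. The object names (dt, by_name, by_chr) are illustrative only.

```r
library(data.table)

dt <- as.data.table(iris)

# Unquoted column names (the `...` style): setkey() on a copy,
# so the original `dt` is left untouched (the inplace = FALSE idea).
by_name <- setkey(copy(dt), Species)

# Character vector of names (the `cols` style): setkeyv() takes quoted names.
by_chr <- setkeyv(copy(dt), c("Species"))

key(dt)       # NULL: the original has no key
key(by_name)  # "Species"
key(by_chr)   # "Species"

# By reference (the inplace = TRUE idea): the original itself is modified.
setkey(dt, Species)
key(dt)       # "Species"
```

Keying a copy costs extra memory but keeps the input unchanged; keying by reference avoids the copy, which matters for large tables.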

Value

A data.table

Details

group_by_dt and group_exe_dt are a pair of functions designed to be used together. They rely on the key-setting feature of data.table, which provides high performance for grouped operations, especially when you operate on the same groups repeatedly.
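
To illustrate the key-based mechanism described here, the sketch below sets a key once and then reuses it as the grouping variable for several operations. It uses only data.table and is an assumption about the general idiom, not a copy of tidyfst's internals; first_rows and sums are hypothetical names.

```r
library(data.table)

# Set the key once; the table is sorted by Species and remembers the key.
dt <- setkey(as.data.table(iris), Species)

# Reuse the stored key as the grouping variable.
# .SD is the subset of rows belonging to the current group.
first_rows <- dt[, head(.SD, 1), by = key(dt)]

# Another grouped operation on the same key:
# sum Sepal.Length over the first three rows of each group.
sums <- dt[, .(sum = sum(head(Sepal.Length, 3))), by = key(dt)]

first_rows
sums
```

Because the key is stored on the table, later grouped calls can read it with key(dt) instead of naming the grouping columns again.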

Examples

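# NOT RUN {
library(tidyfst)
library(data.table)  # key() is a data.table function

# group by Species in the iris data set
a <- as.data.table(iris)
key(a)  # inspect the key before grouping
group_by_dt(a, Species, inplace = FALSE)
key(a)  # inspect the key of the original afterwards

# use the inplace operation to group by reference
a <- as.data.table(iris)
key(a)
group_by_dt(a, Species, inplace = TRUE)
key(a)

# aggregation after grouping using group_exe_dt:
# take the first row of each group
a <- as.data.table(iris)
a %>%
  group_by_dt(Species) %>%
  group_exe_dt(head(1))

# take the first three rows of each group, then sum Sepal.Length
a %>%
  group_by_dt(Species) %>%
  group_exe_dt(
    head(3) %>%
      summarise_dt(sum = sum(Sepal.Length))
  )
# }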