Free Access Week-  Data Engineering + BI
Data engineering and BI courses are free!
Free AI Access Week from June 2-8

tidyfst (version 1.8.1)

group_dt: Data manipulation within groups

Description

Carry out data manipulation within specified groups.

Usage

group_dt(.data, by = NULL, ...)

rowwise_dt(.data, ...)

Value

data.table

Arguments

.data

A data.frame

by

Variables to group by,unquoted name of grouping variable of list of unquoted names of grouping variables.

...

Any data manipulation arguments that could be implemented on a data.frame.

Details

If you want to use summarise_dt and mutate_dt in group_dt, it is better to use the "by" parameter in those functions, that would be much faster because you don't have to use .SD (which takes extra time to copy).

References

https://stackoverflow.com/questions/36802385/use-by-each-row-for-data-table

Examples

Run this code
iris %>% group_dt(by = Species,slice_dt(1:2))
iris %>% group_dt(Species,filter_dt(Sepal.Length == max(Sepal.Length)))
iris %>% group_dt(Species,summarise_dt(new = max(Sepal.Length)))

# you can pipe in the `group_dt`
iris %>% group_dt(Species,
                  mutate_dt(max= max(Sepal.Length)) %>%
                    summarise_dt(sum=sum(Sepal.Length)))

# for users familiar with data.table, you can work on .SD directly
# following codes get the first and last row from each group
iris %>%
  group_dt(
    by = Species,
    rbind(.SD[1],.SD[.N])
  )

#' # for summarise_dt, you can use "by" to calculate within the group
mtcars %>%
  summarise_dt(
   disp = mean(disp),
   hp = mean(hp),
   by = cyl
)

  # but you could also, of course, use group_dt
 mtcars %>%
   group_dt(by =.(vs,am),
     summarise_dt(avg = mean(mpg)))

  # and list of variables could also be used
 mtcars %>%
   group_dt(by =list(vs,am),
            summarise_dt(avg = mean(mpg)))

# examples for `rowwise_dt`
df <- data.table(x = 1:2, y = 3:4, z = 4:5)

df %>% mutate_dt(m = mean(c(x, y, z)))

df %>% rowwise_dt(
  mutate_dt(m = mean(c(x, y, z)))
)

Run the code above in your browser using DataLab