group_by: Group data

Description

Groups data by the specified columns; subsequent operations then work within those groups.

Usage

group_by(.self, ..., auto_partition = NULL)
group_by_(.self, ..., .dots, .cols = NULL, auto_partition = NULL)

Arguments

.self

Data frame

...

Names of the columns to group by

auto_partition

Re-partition across cluster after operation

.dots

Workaround for non-standard evaluation

.cols

Columns to group by (used internally)

Value

Data frame

Details

Many data analysis problems require working with particular combinations of data. For example, the average sales for each day of the week could be found with group_by(day) followed by summarise(sales = mean(sales)). This would result in a data frame with 7 rows (one for each group), with the average sales stored in the sales column.

Multiple grouping variables may be specified, separated by commas. The above example could be extended to group by month as well as weekday, e.g. group_by(month, day). The resulting data frame would then have 12 blocks of 7 (84 rows), with an average for each weekday in each month, computed the same way as above.
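
The row counts described above can be checked with a small sketch. It uses base R's aggregate() rather than this package's group_by and summarise, so it only illustrates the grouping idea. The sales_data data frame and its month, day and sales columns are hypothetical and invented for the example.

```r
# Hypothetical data: one random sales figure per day of a non-leap year
set.seed(1)
dates <- seq(as.Date("2023-01-01"), as.Date("2023-12-31"), by = "day")
lt <- as.POSIXlt(dates)
sales_data <- data.frame(
  month = lt$mon + 1,   # 1..12
  day   = lt$wday,      # 0..6, where 0 is Sunday
  sales = runif(length(dates), min = 50, max = 150)
)

# Grouping by weekday alone: one row of mean sales per weekday
by_day <- aggregate(sales ~ day, data = sales_data, FUN = mean)
nrow(by_day)        # 7

# Grouping by month and weekday: one row per (month, weekday) combination
by_month_day <- aggregate(sales ~ month + day, data = sales_data, FUN = mean)
nrow(by_month_day)  # 84
```

Every month contains each weekday at least four times, so all 12 x 7 combinations occur and the second result has 84 rows.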

Examples

library(multiplyr)
dat <- Multiplyr(x = 1:100, G = rep(c("A", "B", "C", "D"), each = 25))
# Count the number of rows in each group of G
dat %>% group_by(G) %>% summarise(N = length(x))
# Shut down when finished
dat %>% shutdown()
