multiplyr (version 0.1.1)

group_by: Group data

Description

Groups data by the specified columns; subsequent operations then work within those groups.

Usage

group_by(.self, ..., auto_partition = NULL)
group_by_(.self, ..., .dots, .cols = NULL, auto_partition = NULL)

Arguments

.self
Data frame
...
Additional parameters
auto_partition
Re-partition across cluster after operation
.dots
Workaround for non-standard evaluation
.cols
Columns to group by (used internally)

Value

Data frame

Details

Many data analysis problems require working with particular combinations of data. For example, the average sales for each day of the week could be found with group_by(day) followed by summarise(sales = mean(sales)). This would result in a data frame with 7 rows (1 for each group), with the average sales stored in the sales column.
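As a rough illustration of the grouping idea described above, the sketch below reproduces the day-of-week average in base R rather than in multiplyr. The data frame sales_data and its values are invented for the example, and aggregate() stands in for the group_by and summarise steps; it is not how multiplyr computes the result.

```r
# Base R analogue of group_by(day) followed by summarise(sales = mean(sales)).
# sales_data and its contents are hypothetical, made up for illustration.
set.seed(1)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
sales_data <- data.frame(
  day   = rep(days, times = 4),             # four weeks of daily records
  sales = runif(28, min = 100, max = 200),  # arbitrary sales figures
  stringsAsFactors = FALSE
)

# One mean per distinct value of day
avg_by_day <- aggregate(sales ~ day, data = sales_data, FUN = mean)
nrow(avg_by_day)  # 7, one row per group
```

Each row of avg_by_day corresponds to one group, which matches the 7-row result described for the grouped summary.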

Multiple grouping variables may be specified, separated by commas. The above example could be extended to group by month as well as weekday, e.g. group_by(month, day). The resulting data frame would then have 12 blocks of 7 (84 rows), with an average for each weekday in each month provided in the same way as above.
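The row count for two grouping variables can be checked with a small base R sketch. As before, this is an analogue rather than multiplyr code: the data frame records and its sales values are hypothetical, and aggregate() is used in place of group_by(month, day) and summarise.

```r
# Base R analogue of grouping by month and day together.
# records and its contents are hypothetical, made up for illustration.
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
records <- expand.grid(
  day   = days,
  month = month.abb,   # built-in vector of 12 month abbreviations
  week  = 1:4,         # four observations per month/day combination
  stringsAsFactors = FALSE
)
records$sales <- seq_len(nrow(records))  # arbitrary sales figures

# One mean per distinct (month, day) combination
avg_by_month_day <- aggregate(sales ~ month + day, data = records, FUN = mean)
nrow(avg_by_month_day)  # 84, i.e. 12 months x 7 days
```

The number of output rows equals the number of distinct combinations present in the data, so a month with no records for a given weekday would contribute fewer than 7 rows.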

See Also

Other row manipulations: arrange, distinct, filter, slice

Examples

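# Create a Multiplyr data frame: 100 rows in four groups of 25
dat <- Multiplyr(x = 1:100, G = rep(c("A", "B", "C", "D"), each = 25))
# Count the rows within each group
dat %>% group_by(G) %>% summarise(N = length(x))
# Shut down when finished
dat %>% shutdown()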