dplyr (version 0.1)

group_by: Group a tbl by one or more variables.

Description

Most data operations are most useful when done on groups defined by variables in the dataset. The group_by function takes an existing tbl and converts it into a grouped tbl where operations are performed "by group".
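As a minimal sketch of what "by group" means, assuming dplyr is installed and using the built-in mtcars data set: the same summarise() call returns a single row for an ungrouped data frame but one row per group once the data has passed through group_by(). The column name mean_mpg is arbitrary, and the class name grouped_df is what recent dplyr releases use for grouped data frames; treat it as an assumption for this version.

```r
library(dplyr)

# Ungrouped: summarise() collapses the whole data frame to a single row
overall <- summarise(mtcars, mean_mpg = mean(mpg))

# Grouped: group_by() returns a grouped tbl (assumed class "grouped_df")
by_cyl <- group_by(mtcars, cyl)
inherits(by_cyl, "grouped_df")

# The same summarise() call now produces one row per distinct value of cyl
per_cyl <- summarise(by_cyl, mean_mpg = mean(mpg))
nrow(overall)  # 1
nrow(per_cyl)  # 3 (cyl takes the values 4, 6 and 8)
```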

Usage

group_by(x, ..., add = TRUE)

Arguments

x
a tbl
...
variables to group by. All tbls accept variable names; some will also accept functions of variables. Duplicated groups will be silently dropped.
add
By default, when add = TRUE, group_by will add the new groups to the existing ones. To instead replace the existing groups with a new set, use add = FALSE.
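To illustrate passing several variables through ..., the sketch below (assuming a data frame input and the built-in mtcars data) groups by vs and am. Each observed combination of the two variables becomes one group, so counting rows with n() gives one row per combination. The column name count is arbitrary.

```r
library(dplyr)

# Group by two variables: each observed (vs, am) combination is one group
by_vs_am <- group_by(mtcars, vs, am)
counts <- summarise(by_vs_am, count = n())

nrow(counts)       # 4: all four combinations of vs and am occur in mtcars
sum(counts$count)  # 32: every row of mtcars falls into exactly one group
```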

Tbl types

group_by is an S3 generic with methods for the three built-in tbls. See the help for the corresponding classes and their manip methods for more details.

See Also

ungroup for the inverse operation, groups for accessors that don't do special evaluation.
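A minimal sketch of ungroup() as the inverse of group_by(), assuming a data frame input and the built-in mtcars data: after ungrouping, groups() reports no grouping variables, and summarise() once again works on the whole table. Depending on the dplyr version, groups() on an ungrouped tbl may return NULL or an empty list, so the sketch only looks at its length.

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)
plain <- ungroup(by_cyl)

# Grouped: groups() lists the single grouping variable; ungrouped: none remain
length(groups(by_cyl))  # 1
length(groups(plain))   # 0

# Without groups, summarise() collapses the table to a single row again
nrow(summarise(plain, mean_mpg = mean(mpg)))  # 1
```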

Examples

library(dplyr)
by_cyl <- group_by(mtcars, cyl)
summarise(by_cyl, mean(disp), mean(hp))
filter(by_cyl, disp == max(disp))

# summarise peels off a single layer of grouping
by_vs_am <- group_by(mtcars, vs, am)
by_vs <- summarise(by_vs_am, n = n())
groups(by_vs)
summarise(by_vs, n = sum(n))
# use ungroup() to remove the remaining grouping if it is not wanted

# You can group by expressions: this is just short-hand for
# a mutate followed by a simple group_by
group_by(mtcars, vsam = vs + am)

# By default, group_by increases grouping. Use add = FALSE to set groups
groups(group_by(by_cyl, vs, am))
groups(group_by(by_cyl, vs, am, add = FALSE))

# Duplicate groups are silently dropped
groups(group_by(by_cyl, cyl, cyl))
