Data manipulation functions.
These five functions form the backbone of dplyr. They are all S3 generic functions with methods for each individual data type. All functions work exactly the same way: the first argument is the tbl, and the subsequent arguments are interpreted in the context of that tbl.
Each function takes:
- a tbl
- variables interpreted in the context of that tbl.
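To illustrate what "interpreted in the context of that tbl" means in practice, here is a minimal sketch comparing a verb call with a base R equivalent. It assumes dplyr is installed and uses the built-in mtcars data; the object names by_verb and by_base are made up for this example.

```r
library(dplyr)

# Column names are looked up inside the tbl, so `cyl` needs no `mtcars$` prefix.
by_verb <- filter(mtcars, cyl == 8)

# Base R equivalent, for comparison: the data frame must be named twice.
by_base <- mtcars[mtcars$cyl == 8, ]
```

Both objects should hold the same rows; the verb form simply avoids repeating the name of the data.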
The five key data manipulation functions are:
- filter: return only a subset of the rows. If multiple conditions are supplied they are combined with &.
- select: return only a subset of the columns. If multiple columns are supplied they are all used.
- arrange: reorder the rows. Multiple inputs are ordered from left to right.
- mutate: add new columns. Multiple inputs create multiple columns.
- summarise: reduce each group to a single row. Multiple inputs create multiple output summaries.
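The handling of multiple inputs described in the list above can be sketched as follows. This assumes dplyr is installed; the column names wt_kg, mean_disp and max_hp are chosen for illustration only.

```r
library(dplyr)

# Two conditions in filter() behave like one condition joined with &.
a <- filter(mtcars, cyl == 8, hp > 200)
b <- filter(mtcars, cyl == 8 & hp > 200)

# Multiple inputs to mutate() create multiple columns.
# disp is in cubic inches and wt is in units of 1000 lb.
m <- mutate(mtcars, displ_l = disp / 61.0237, wt_kg = wt * 453.592)

# Multiple inputs to summarise() create multiple summaries in a single row.
s <- summarise(mtcars, mean_disp = mean(disp), max_hp = max(hp))
```

Here a and b should contain the same rows, m gains two columns, and s has one row with two columns.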
These are all made significantly more useful when applied by group, as with group_by.
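As a rough sketch of group-wise use, the example below groups mtcars by cyl with group_by() and then applies two of the verbs. It assumes dplyr is installed; the names by_cyl, per_group, centred and mpg_vs_group are hypothetical.

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)

# summarise() collapses each group to one row; n() counts the rows per group.
per_group <- summarise(by_cyl, mean_disp = mean(disp), n = n())

# mutate() works within each group: each car's mpg relative to its group mean.
centred <- mutate(by_cyl, mpg_vs_group = mpg - mean(mpg))
```

The summary has one row per value of cyl, while the mutated table keeps all 32 rows and adds one column.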
dplyr comes with three built-in tbls. Read the help for the manip methods of that class to get more details:
Generally, manipulation functions will return an output object of the same type as their input. The exceptions are:
- summarise will return an ungrouped source
- remote sources (like databases) will typically return a local source from at least
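The return-type behaviour for local data can be checked with a short sketch like the one below. It assumes a reasonably recent dplyr that exports is_grouped_df(), and it covers only data frames; remote sources are not shown.

```r
library(dplyr)

# A data frame in gives a data frame out.
out <- filter(mtcars, cyl == 8)
is.data.frame(out)

# A grouped input stays grouped after filter() ...
g <- filter(group_by(mtcars, cyl), mpg > 20)
is_grouped_df(g)

# ... but summarise() over a single grouping variable returns an ungrouped result.
s <- summarise(group_by(mtcars, cyl), mean_disp = mean(disp))
is_grouped_df(s)
```

With more than one grouping variable, summarise() may keep some grouping, so the ungrouped result shown here applies to the single-variable case.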
filter(mtcars, cyl == 8)
select(mtcars, mpg, cyl, hp:vs)
arrange(mtcars, cyl, disp)
# convert displacement from cubic inches to litres
mutate(mtcars, displ_l = disp / 61.0237)
summarise(mtcars, mean(disp))
# one summary row per value of cyl
summarise(group_by(mtcars, cyl), mean(disp))