filter(.data, ...)
summarise(.data, ...)
summarize(.data, ...)
mutate(.data, ...)
arrange(.data, ...)
select(.data, ...)
The five key data manipulation functions are filter, summarise, mutate, arrange and select.
These are all made significantly more useful when applied by group, as with group_by.
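As a minimal sketch of what "applied by group" means, the example below centres disp within each cylinder group of mtcars. The column names disp_centred and mean_centred are illustrative assumptions, not part of the dataset.

```r
library(dplyr)

# Group the rows by number of cylinders, then centre disp within each group.
# `disp_centred` is an illustrative column name, not part of mtcars.
by_cyl <- group_by(mtcars, cyl)
centred <- mutate(by_cyl, disp_centred = disp - mean(disp))

# Each group's centred values should now average to approximately zero.
group_means <- summarise(centred, mean_centred = mean(disp_centred))
print(group_means)
```

Because mean(disp) is evaluated once per group, the same mutate call on the ungrouped data would instead subtract the overall mean.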
dplyr comes with three built-in tbls. Read the help for the manip methods of that class to get more details:
src_sqlite
src_postgres
src_mysql
Generally, manipulation functions will return an output object of the same type as their input. The exceptions are:
summarise will return an ungrouped source
remote sources (like databases) will typically return a local source from at least summarise and mutate
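The first exception can be checked directly. This sketch assumes a single grouping variable. With several grouping variables, summarise typically removes only the last grouping level rather than all of them.

```r
library(dplyr)

# Grouping mtcars by cyl gives a grouped object.
by_cyl <- group_by(mtcars, cyl)
was_grouped <- inherits(by_cyl, "grouped_df")

# Summarising over the only grouping variable returns an ungrouped result.
# `mean_disp` is an illustrative column name.
per_cyl <- summarise(by_cyl, mean_disp = mean(disp))
still_grouped <- inherits(per_cyl, "grouped_df")

print(c(before = was_grouped, after = still_grouped))
```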
dplyr methods do not preserve row names. If you have been using row names to store important information, please make them explicit variables.
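One way to make row names explicit, sketched with base R only, is to copy them into an ordinary column before calling any manipulation function. The column name model is an assumption chosen for this example.

```r
library(dplyr)

# Copy the row names into an ordinary column before manipulating the data.
# `model` is an illustrative column name.
cars <- mtcars
cars$model <- rownames(cars)
rownames(cars) <- NULL

# The car names now travel with the rows as a regular variable.
eight_cyl <- filter(cars, cyl == 8)
head(select(eight_cyl, model, mpg, cyl))
```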
Note that for local data frames, the ordering is done in C++ code which does not have access to the locale-specific ordering usually done in R. This means that strings are ordered as if in the C locale.
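To see what C-locale ordering looks like, the sketch below uses base R's radix sort, which always sorts character vectors in the C locale, as a reference. The data frame words is made up for illustration. Whether arrange gives the same order can depend on the dplyr version installed, so the code prints both results for comparison.

```r
library(dplyr)

# Illustrative data: mixed-case strings.
words <- data.frame(word = c("banana", "Apple", "apple", "Banana"),
                    stringsAsFactors = FALSE)

# Base R's radix sort always uses the C locale, so it shows the ordering
# described above: all upper-case letters sort before lower-case ones.
c_order <- sort(words$word, method = "radix")
print(c_order)

# arrange() on a local data frame; compare its result with c_order.
arranged <- arrange(words, word)
print(arranged$word)
```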
As well as using existing functions like : and c, there are a number of special functions that only work inside select:
starts_with(x, ignore.case = FALSE): selects all variables whose name starts with x
ends_with(x, ignore.case = FALSE): selects all variables whose name ends in x
contains(x, ignore.case = FALSE): selects all variables whose name contains x
matches(x, ignore.case = FALSE): selects all variables whose name matches the regular expression x
num_range("x", 1:5, width = 2): selects all variables (numerically) from x01 to x05
To drop variables, use -. You can rename variables with named arguments.
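The width argument of num_range is easiest to see with zero-padded column names. The following is a small sketch on a made-up data frame. The column names x01 to x05 and other are assumptions for illustration only.

```r
library(dplyr)

# A small data frame with zero-padded column names x01..x05 plus one extra
# column. The names and values are made up for illustration.
df <- as.data.frame(matrix(1:12, nrow = 2))
names(df) <- c(sprintf("x%02d", 1:5), "other")

# width = 2 pads the numbers to two digits, so 1:3 matches x01, x02, x03.
first_three <- select(df, num_range("x", 1:3, width = 2))
print(names(first_three))

# Drop a column with -, and rename one with a named argument.
no_other <- select(df, -other)
renamed <- select(df, first = x01)
print(names(no_other))
print(names(renamed))
```

Note that renaming through select keeps only the variables mentioned in the call, so renamed has a single column.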
filter(mtcars, cyl == 8)
select(mtcars, mpg, cyl, hp:vs)
arrange(mtcars, cyl, disp)
mutate(mtcars, displ_l = disp / 61.0237)
summarise(mtcars, mean(disp))
summarise(group_by(mtcars, cyl), mean(disp))
# More detailed select examples ------------------------------
iris <- tbl_df(iris) # so it prints a little nicer
select(iris, starts_with("Petal"))
select(iris, ends_with("Width"))
select(iris, contains("etal"))
select(iris, matches(".t."))
select(iris, Petal.Length, Petal.Width)
df <- as.data.frame(matrix(runif(100), nrow = 10))
df <- tbl_df(df[c(3, 4, 7, 1, 9, 8, 5, 2, 6, 10)])
select(df, V4:V6)
select(df, num_range("V", 4:6))
# Drop variables
select(iris, -starts_with("Petal"))
select(iris, -ends_with("Width"))
select(iris, -contains("etal"))
select(iris, -matches(".t."))
select(iris, -Petal.Length, -Petal.Width)
# Rename variables
select(iris, petal_length = Petal.Length)
select(iris, petal = starts_with("Petal"))