mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name. Variables can be removed by setting their value to NULL.
mutate(.data, ...)

# S3 method for data.frame
mutate(
  .data,
  ...,
  .keep = c("all", "used", "unused", "none"),
  .before = NULL,
  .after = NULL
)

transmute(.data, ...)
An object of the same type as .data. The output has the following properties:
For mutate():
  Columns from .data will be preserved according to the .keep argument.
  Existing columns that are modified by ... will always be returned in their original location.
  New columns created through ... will be placed according to the .before and .after arguments.
For transmute():
  Columns created or modified through ... will be returned in the order specified by ....
  Unmodified grouping columns will be placed at the front.
The number of rows is not affected.
Columns given the value NULL will be removed.
Groups will be recomputed if a grouping variable is mutated.
Data frame attributes are preserved.
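The column-placement rules above can be checked on a small table. This is a minimal sketch, assuming dplyr is attached; the tibble `df` and its columns `g`, `x`, and `y` are made up for illustration.

library(dplyr)

# Hypothetical example data, not part of the original documentation
df <- tibble(g = c("a", "a", "b"), x = 1:3, y = c(10, 20, 30))

# A modified column (`x`) keeps its original position;
# a new column (`z`) is appended on the right by default.
out_mutate <- df %>% mutate(x = x * 2, z = x + y)
names(out_mutate)
# "g" "x" "y" "z"

# transmute() returns the columns named in `...`, in that order.
out_transmute <- df %>% transmute(z = x + y, x)
names(out_transmute)
# "z" "x"

# Unmodified grouping columns are placed at the front.
out_grouped <- df %>% group_by(g) %>% transmute(z = x + y)
names(out_grouped)
# "g" "z"

# Setting a column to NULL removes it; the number of rows is unchanged.
out_null <- df %>% mutate(y = NULL)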
.data: A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
...: <data-masking> Name-value pairs. The name gives the name of the column in the output.
The value can be:
  A vector of length 1, which will be recycled to the correct length.
  A vector the same length as the current group (or the whole data frame if ungrouped).
  NULL, to remove the column.
  A data frame or tibble, to create multiple columns in the output.
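The kinds of value listed above can be combined in a single call. The following is a small sketch, assuming dplyr is attached; the tibble `df` and the column names `one`, `twice`, `lo`, and `hi` are hypothetical. Note that the data frame is passed unnamed so that its columns are unpacked into the output.

library(dplyr)

# Hypothetical example data
df <- tibble(x = 1:3)

out <- df %>%
  mutate(
    one = 1,                        # length-1 vector, recycled to every row
    twice = x * 2,                  # vector as long as the data
    tibble(lo = x - 1, hi = x + 1)  # unnamed data frame: adds `lo` and `hi`
  )
names(out)
# "x" "one" "twice" "lo" "hi"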
.keep: Control which columns from .data are retained in the output. Grouping columns and columns created by ... are always kept.
  "all" retains all columns from .data. This is the default.
  "used" retains only the columns used in ... to create new columns. This is useful for checking your work, as it displays inputs and outputs side-by-side.
  "unused" retains only the columns not used in ... to create new columns. This is useful if you generate new columns, but no longer need the columns used to generate them.
  "none" doesn't retain any extra columns from .data. Only the grouping variables and columns created by ... are kept.
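To illustrate that grouping columns survive even with `.keep = "none"`, here is a brief sketch, assuming dplyr is attached. The tibble `df` and its columns are invented for the example; the result should contain only the grouping column `a` and the new column `z`.

library(dplyr)

# Hypothetical example data
df <- tibble(x = 1:2, y = 3:4, a = c("p", "q"))

# `x` and `y` are dropped; the grouping column `a` is kept alongside `z`.
kept <- df %>%
  group_by(a) %>%
  mutate(z = x + y, .keep = "none")
names(kept)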
.before, .after: <tidy-select> Optionally, control where new columns should appear (the default is to add to the right hand side). See relocate() for more details.
Because mutating expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped mutate:
starwars %>%
  select(name, mass, species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
With the grouped equivalent:
starwars %>%
  select(name, mass, species) %>%
  group_by(species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
The former normalises mass by the global average whereas the latter normalises by the averages within species levels.
These functions are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
  mutate(): dplyr:::methods_rd("mutate").
  transmute(): dplyr:::methods_rd("transmute").
Other single table verbs: arrange(), filter(), rename(), select(), slice(), summarise()
library(dplyr)

# Newly created variables are available immediately
starwars %>%
  select(name, mass) %>%
  mutate(
    mass2 = mass * 2,
    mass2_squared = mass2 * mass2
  )

# As well as adding new variables, you can use mutate() to
# remove variables and modify existing variables.
starwars %>%
  select(name, height, mass, homeworld) %>%
  mutate(
    mass = NULL,
    height = height * 0.0328084 # convert from centimetres to feet
  )

# Use across() with mutate() to apply a transformation
# to multiple columns in a tibble.
starwars %>%
  select(name, homeworld, species) %>%
  mutate(across(!name, as.factor))
# see more in ?across

# Window functions are useful for grouped mutates:
starwars %>%
  select(name, mass, homeworld) %>%
  group_by(homeworld) %>%
  mutate(rank = min_rank(desc(mass)))
# see `vignette("window-functions")` for more details

# By default, new columns are placed on the far right.
# Experimental: you can override with `.before` or `.after`
df <- tibble(x = 1, y = 2)
df %>% mutate(z = x + y)
df %>% mutate(z = x + y, .before = 1)
df %>% mutate(z = x + y, .after = x)
# By default, mutate() keeps all columns from the input data.
# Experimental: You can override with `.keep`
df <- tibble(x = 1, y = 2, a = "a", b = "b")
df %>% mutate(z = x + y, .keep = "all") # the default
df %>% mutate(z = x + y, .keep = "used")
df %>% mutate(z = x + y, .keep = "unused")
df %>% mutate(z = x + y, .keep = "none") # same as transmute()
# Grouping ----------------------------------------
# The mutate operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
# The following normalises `mass` by the global average:
starwars %>%
  select(name, mass, species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
# Whereas this normalises `mass` by the averages within species
# levels:
starwars %>%
  select(name, mass, species) %>%
  group_by(species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
# Indirection ----------------------------------------
# Refer to column names stored as strings with the `.data` pronoun:
vars <- c("mass", "height")
mutate(starwars, prod = .data[[vars[[1]]]] * .data[[vars[[2]]]])
# Learn more in ?dplyr_data_masking