
across() can be used inside fmutate and fsummarise to apply one or more functions to a selection of columns. It is overall very similar to dplyr::across, but does not support some rlang features, has some additional features (arguments), and is optimized to work with collapse's .FAST_FUN, yielding much faster computations.
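As a minimal sketch of the basic pattern, the call below computes grouped means of two columns. It assumes the built-in mtcars data (not used elsewhere on this page) and an installed collapse package; the object name res_basic is only illustrative.

```r
library(collapse)

# Group mtcars by number of cylinders, then average two columns.
# fmean is one of collapse's fast functions, so across() can run it
# in a vectorized way over all groups.
res_basic <- fsummarise(fgroup_by(mtcars, cyl),
                        across(c(mpg, hp), fmean))
res_basic
```

With a single function, the result columns keep the original column names.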
across(.cols = NULL, .fns, ..., .names = NULL,
       .apply = "auto", .transpose = "auto")

# acr(...) can be used to abbreviate across(...)
.cols: select columns using column names and expressions (e.g. a:b or c(a, b, c:f)), column indices, logical vectors, or functions yielding a logical value, e.g. is.numeric. NULL applies functions to all columns except for grouping columns.
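Two of the selection styles described for the column argument can be compared side by side. This is a sketch on the built-in iris data (an assumption, not part of this page's examples); by_range and by_type are illustrative names.

```r
library(collapse)

g <- fgroup_by(iris, Species)

# Select by a range of column names ...
by_range <- fsummarise(g, across(Sepal.Length:Petal.Width, fmean))

# ... or by a predicate function. Species is a factor (and the grouping
# column), so is.numeric picks the same four measurement columns.
by_type <- fsummarise(g, across(is.numeric, fmean))

all.equal(by_range, by_type)
```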
.fns: a function, character vector of functions, or list of functions. Vectors / lists can be named to yield alternative names in the result (see .names). This argument is evaluated inside substitute(), and the content (not the names of vectors/lists) is checked against .FAST_FUN and .OPERATOR_FUN. Matching functions receive vectorized execution; other functions are applied to the data in a standard way.
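The sketch below illustrates both points about the functions argument: list names feed into the output names, and a function only gets vectorized execution if its name is typed directly. It assumes the built-in mtcars data; multi, fast and slow are illustrative names.

```r
library(collapse)

g <- fgroup_by(mtcars, cyl)

# Named list: the names (mu, sd) are used to build the output column names
multi <- fsummarise(g, across(c(mpg, hp), list(mu = fmean, sd = fsd)))
names(multi)

# fmean is typed directly, so it is matched against .FAST_FUN ...
fast <- fsummarise(g, across(c(mpg, hp), fmean))

# ... whereas an anonymous wrapper is not matched and is applied group by group
slow <- fsummarise(g, across(c(mpg, hp), function(x) fmean(x)))

# Same values, different execution path
all.equal(as.numeric(fast$mpg), as.numeric(slow$mpg))
```

On small data the difference is not noticeable; the distinction matters mainly with many groups.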
...: further arguments to .fns. Arguments are evaluated in the data environment and split by groups as well (for non-vectorized functions, if of the same length as the data).
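To make the argument splitting concrete, the sketch below passes a weight column to a base R function and to a fast function. It assumes the built-in mtcars data, with wt used as an arbitrary weight purely for illustration; base_wm and fast_wm are illustrative names.

```r
library(collapse)

g <- fgroup_by(mtcars, cyl)

# weighted.mean is a base R function: w = wt is evaluated in the data
# and split by groups together with each selected column
base_wm <- fsummarise(g, across(c(mpg, hp), weighted.mean, w = wt))

# The same computation using the fast function fmean and its w argument
fast_wm <- fsummarise(g, across(c(mpg, hp), fmean, w = wt))

all.equal(as.numeric(base_wm$mpg), as.numeric(fast_wm$mpg))
```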
.names: controls the naming of computed columns. NULL generates names of the form coli_funj if multiple functions are used. .names = TRUE enables this for a single function, .names = FALSE disables it for multiple functions (sensible for functions such as .OPERATOR_FUN that rename columns (if .apply = FALSE)). Setting .names = "flip" generates names of the form funj_coli. It is also possible to supply a function with two arguments for column and function names, e.g. function(c, f) paste0(f, "_", c). Finally, you can supply a custom vector of names which must match length(.cols) * length(.fns).
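Two of the naming options can be sketched as follows, again assuming the built-in mtcars data. The "." separator in the custom naming function is an arbitrary choice for illustration; flipped and custom are illustrative names.

```r
library(collapse)

g <- fgroup_by(mtcars, cyl)

# "flip" puts the function name first: funj_coli
flipped <- fsummarise(g, across(c(mpg, hp), list(mu = fmean, sd = fsd),
                                .names = "flip"))
names(flipped)

# A two-argument function receives column names (c) and function names (f)
custom <- fsummarise(g, across(c(mpg, hp), list(mu = fmean, sd = fsd),
                               .names = function(c, f) paste0(f, ".", c)))
names(custom)
```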
.apply: controls whether functions are applied column-by-column (TRUE) or to multiple columns at once (FALSE). The default, "auto", does the latter for vectorized functions, which have an efficient data frame method. It can also be sensible to use .apply = FALSE for non-vectorized functions, especially multivariate functions like lm or pwcor, or functions renaming the data. See Examples.
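A small sketch of applying a multivariate function to several columns at once, following the same pattern as the grouped correlation example on this page but on the built-in mtcars data (an assumption made to keep the output small); cors is an illustrative name.

```r
library(collapse)

# With .apply = FALSE the function receives all selected columns of a group
# together, so a multivariate function such as pwcor can work on them jointly.
# qDF() turns each correlation matrix into a data frame with a "Variable" column.
cors <- fsummarise(fgroup_by(mtcars, cyl),
                   across(c(mpg, hp, wt),
                          function(x) qDF(pwcor(x), "Variable"),
                          .apply = FALSE))
cors
```

Each group contributes one row per variable, so the result has more rows than groups.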
.transpose: with multiple .fns, .transpose controls whether the result is ordered first by column, then by function (TRUE), or vice-versa (FALSE). "auto" does the former if all functions yield results of the same dimensions (dimensions may differ if .apply = FALSE). See Examples.
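The effect of the ordering argument is easiest to see by comparing column names. The sketch below assumes the built-in mtcars data; by_col and by_fun are illustrative names.

```r
library(collapse)

g <- fgroup_by(mtcars, cyl)

# Ordered first by column, then by function
by_col <- fsummarise(g, across(c(mpg, hp), list(mu = fmean, sd = fsd),
                               .transpose = TRUE))

# Ordered first by function, then by column
by_fun <- fsummarise(g, across(c(mpg, hp), list(mu = fmean, sd = fsd),
                               .transpose = FALSE))

names(by_col)
names(by_fun)
```

Both results contain the same columns; only their order differs.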
See also: fsummarise, fmutate, Fast Data Manipulation, Collapse Overview
# Basic (Weighted) Summaries
fsummarise(wlddev, across(PCGDP:GINI, fmean, w = POP))
library(magrittr) # Note: Used because |> is not available on older R versions
wlddev %>% fgroup_by(region, income) %>%
  fsummarise(across(PCGDP:GINI, fmean, w = POP))
# Note that for these we don't actually need across...
fselect(wlddev, PCGDP:GINI) %>% fmean(w = wlddev$POP, drop = FALSE)
wlddev %>% fgroup_by(region, income) %>%
  fselect(PCGDP:GINI, POP) %>% fmean(POP, keep.w = FALSE)
collap(wlddev, PCGDP + LIFEEX + GINI ~ region + income, w = ~ POP, keep.w = FALSE)
# But if we want to use some base R function that requires argument splitting...
wlddev %>% na_omit(cols = "POP") %>% fgroup_by(region, income) %>%
  fsummarise(across(PCGDP:GINI, weighted.mean, w = POP, na.rm = TRUE))
# Or if we want to apply different functions...
wlddev %>% fgroup_by(region, income) %>%
  fsummarise(across(PCGDP:GINI, list(mu = fmean, sd = fsd), w = POP),
             POP_sum = fsum(POP), OECD = fmean(OECD))
# Note that the above still detects fmean as a fast function. The names of the list
# are irrelevant, but the function name must be typed or passed as a character vector.
# Otherwise functions will be executed by groups, e.g. function(x) fmean(x) won't vectorize
# Same, naming in a different way
wlddev %>% fgroup_by(region, income) %>%
  fsummarise(across(PCGDP:GINI, list(mu = fmean, sd = fsd), w = POP, .names = "flip"),
             sum_POP = fsum(POP), OECD = fmean(OECD))
# Or we want to do more advanced things..
# Such as nesting data frames..
qTBL(wlddev) %>% fgroup_by(region, income) %>%
  fsummarise(across(c(PCGDP, LIFEEX, ODA),
                    function(x) list(Nest = list(x)),
                    .apply = FALSE))
# Or linear models..
qTBL(wlddev) %>% fgroup_by(region, income) %>%
  fsummarise(across(c(PCGDP, LIFEEX, ODA),
                    function(x) list(Mods = list(lm(PCGDP ~., x))),
                    .apply = FALSE))
# Or computing grouped correlation matrices
qTBL(wlddev) %>% fgroup_by(region, income) %>%
  fsummarise(across(c(PCGDP, LIFEEX, ODA),
                    function(x) qDF(pwcor(x), "Variable"), .apply = FALSE))
# Here calculating 1- and 10-year lags and growth rates of these variables
qTBL(wlddev) %>% fgroup_by(country) %>%
  fmutate(across(c(PCGDP, LIFEEX, ODA), list(L, G),
                 n = c(1, 10), t = year, .names = FALSE))
# Same but variables in different order
qTBL(wlddev) %>% fgroup_by(country) %>%
  fmutate(across(c(PCGDP, LIFEEX, ODA), list(L, G), n = c(1, 10),
                 t = year, .names = FALSE, .transpose = FALSE))