Aggregates a dataframe into summaries of all numeric variables, grouped by the specified categorical variables, and returns the result along with the tidyverse code used to generate it.
aggregateData(
.data,
vars,
summaries,
summary_vars,
varnames = NULL,
quantiles = c(0.25, 0.75),
custom_funs = NULL
)
aggregated dataframe containing the summaries with tidyverse code attached
a dataframe or survey design object to aggregate
a character vector of categorical variables in .data to group by
summaries to generate for the groups defined by vars. See details.
names of variables in the dataset to calculate summaries of
name templates for created variables (see details).
if requesting quantiles, specify the desired quantiles here
a list of custom functions (see details).
The aggregateData function accepts any R function that returns a single value (such as mean, var, sd, sum, IQR). The default name of new variables will be {var}_{fun}, where {var} is the variable name and {fun} is the summary function used. You may pass new names via the varnames argument, which should be either a vector the same length as summary_vars, or a named list (where the names are the summary functions). In either case, use {var} to represent the variable name, e.g., {var}_mean or min_{var}.
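To illustrate how the name templates expand, the following base R sketch substitutes the placeholders by hand. The helper expand_name is hypothetical and is not part of the package; it only mimics the documented {var}_{fun} pattern and makes no claim about how aggregateData does this internally.

```r
# Hypothetical helper (not part of the package): fills in the {var} and {fun}
# placeholders of a name template using plain string substitution.
expand_name <- function(template, var, fun) {
    out <- gsub("{var}", var, template, fixed = TRUE)
    gsub("{fun}", fun, out, fixed = TRUE)
}

# Default template, and a custom template that only uses {var}
default_name <- expand_name("{var}_{fun}", "Sepal.Length", "mean")
custom_name <- expand_name("min_{var}", "Sepal.Length", "min")
print(c(default_name, custom_name))
```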
You can also include the summary missing, which counts the number of missing values in the variable. Its default name is {var}_missing.
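As a rough picture of what the missing summary reports, the base R sketch below counts NA values per group by hand. The data frame df and its columns are made up for illustration, and this is not the package's own implementation.

```r
# Made-up data: one grouping column and one numeric column with gaps
df <- data.frame(
    group = c("a", "a", "b", "b", "b"),
    height = c(1.2, NA, 3.4, NA, NA)
)

# Count the missing values of height within each group
height_missing <- tapply(df$height, df$group, function(x) sum(is.na(x)))
print(height_missing)
```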
For the quantile summary, there is the additional argument quantiles. A new variable will be created for each specified quantile p. To name these variables, use {p} in varnames (the default is {var}_q{p}).
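The one-column-per-quantile behaviour can be sketched in base R as follows. The column names assume that {p} is rendered as a percentage (25, 75); how the package actually formats {p} is not stated here, so treat that as an assumption.

```r
quantiles <- c(0.25, 0.75)

# Split the variable by group, then compute every requested quantile per group
by_species <- split(iris$Sepal.Length, iris$Species)
q <- t(sapply(by_species, quantile, probs = quantiles, names = FALSE))

# One column per quantile; the naming scheme here is an assumption
colnames(q) <- paste0("Sepal.Length_q", quantiles * 100)
print(q)
```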
Custom functions can be passed via the custom_funs argument. This should be a list, and each element should have a name and either an expr or fun element. Expressions should operate on a variable x. The function should be a function of x and return a single value.
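# expression form: operates on a variable named x
cust_funs <- list(name = '{var}_width', expr = diff(range(x, na.rm = TRUE)))
# function form: takes x and returns a single value
cust_funs <- list(name = '{var}_stderr',
    fun = function(x) {
        s <- sd(x)
        n <- length(x)
        s / sqrt(n)
    }
)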
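Before passing a custom function, it can help to confirm that it returns a single value per group. The sketch below checks the standard error function from above with base R's tapply; the name stderr_fun is chosen here for illustration only.

```r
# Same standard error function as above, stored under an illustrative name
stderr_fun <- function(x) {
    s <- sd(x)
    n <- length(x)
    s / sqrt(n)
}

# A valid summary function returns exactly one value
stopifnot(length(stderr_fun(iris$Sepal.Length)) == 1)

# Apply it within each group to preview the values a grouped summary would hold
sepal_stderr <- tapply(iris$Sepal.Length, iris$Species, stderr_fun)
print(sepal_stderr)
```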
Tom Elliott, Owen Jin
code
countMissing
aggregated <-
aggregateData(iris,
vars = c("Species"),
summaries = c("mean", "sd", "iqr")
)
cat(code(aggregated))
head(aggregated)
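For comparison, a similar grouped summary can be written by hand with dplyr. This is a sketch assuming dplyr 1.0 or later is installed; it is not the code that code(aggregated) returns, and the object name manual is illustrative.

```r
library(dplyr)

# Group by Species, then apply each summary to every numeric column,
# naming the results with a {variable}_{function} pattern
manual <- iris %>%
    group_by(Species) %>%
    summarise(
        across(
            where(is.numeric),
            list(mean = mean, sd = sd, iqr = IQR),
            .names = "{.col}_{.fn}"
        )
    )
print(dim(manual))
```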