
iNZightTools (version 1.13.0)

aggregateData: Aggregate data by categorical variables

Description

Aggregates a dataframe into summaries of its numeric variables, grouped by the specified categorical variables, and returns the result along with the tidyverse code used to generate it.

Usage

aggregateData(
  .data,
  vars,
  summaries,
  summary_vars,
  varnames = NULL,
  quantiles = c(0.25, 0.75),
  custom_funs = NULL
)

Value

An aggregated dataframe containing the summaries, with the tidyverse code used to generate it attached (retrieve it with code).

Arguments

.data

a dataframe or survey design object to aggregate

vars

a character vector of categorical variables in .data to group by

summaries

a character vector of summaries to compute for each group defined by vars (see details).

summary_vars

a character vector of the variables in .data to summarise

varnames

name templates for created variables (see details).

quantiles

the quantiles (as probabilities) to compute when the quantile summary is requested

custom_funs

a list of custom functions (see details).

Calculating variable summaries

The aggregateData function accepts any R function that returns a single value (such as mean, var, sd, sum, or IQR). By default, new variables are named {var}_{fun}, where {var} is the variable name and {fun} is the summary function used. You can supply other names via the varnames argument, which should be either a vector the same length as summary_vars or a named list whose names are the summary functions. In either case, use {var} to represent the variable name, for example {var}_mean or min_{var}.
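To make the templating rule concrete, here is a minimal base-R sketch of how a name template expands for one variable and one summary function. The helper expand_template() is hypothetical: it only illustrates the {var}/{fun} substitution described above and is not how iNZightTools implements naming internally.

```r
# Hypothetical helper (not part of iNZightTools): expand a name template
# such as "{var}_{fun}" for a single variable and summary function.
expand_template <- function(template, var, fun) {
  # fixed = TRUE so the braces are matched literally, not as regex
  out <- gsub("{var}", var, template, fixed = TRUE)
  gsub("{fun}", fun, out, fixed = TRUE)
}

# Default naming: {var}_{fun}
expand_template("{var}_{fun}", "Sepal.Length", "mean")
# "Sepal.Length_mean"

# A named list of templates, one per summary function
templates <- list(mean = "{var}_mean", min = "min_{var}")
vapply(names(templates),
       function(f) expand_template(templates[[f]], "Petal.Width", f),
       character(1))
# c(mean = "Petal.Width_mean", min = "min_Petal.Width")
```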

You can also request the summary missing, which counts the number of missing values in each variable. Its default name is {var}_missing.
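As an illustration of what a per-group missing count measures, the base-R sketch below counts NA values of one variable within each group. It uses the airquality dataset (which has missing Ozone readings) and tapply() rather than iNZightTools, so column names and output layout differ from aggregateData's result.

```r
# Base-R sketch (independent of iNZightTools): number of NA values of
# Ozone within each Month of the airquality dataset.
missing_by_month <- tapply(
  airquality$Ozone,
  airquality$Month,
  function(x) sum(is.na(x))
)
missing_by_month

# Total number of missing Ozone values across all months
sum(missing_by_month)
# 37
```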

The quantile summary takes the additional argument quantiles: a new variable is created for each specified quantile p. To name these variables, use {p} in varnames (the default is {var}_q{p}).
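For comparison, the following base-R sketch computes per-group quantiles of one variable with aggregate() and stats::quantile(), producing one value per requested probability. It is not iNZightTools code and does not reproduce its {var}_q{p} column naming; it only shows the kind of result the quantile summary produces with the default quantiles = c(0.25, 0.75).

```r
# Base-R sketch (not iNZightTools code): per-group quantiles of one
# variable, one value per requested probability.
probs <- c(0.25, 0.75)
q <- aggregate(Sepal.Length ~ Species, data = iris,
               FUN = quantile, probs = probs)

# aggregate() simplifies the results into a matrix column:
# one row per species, one column per probability.
q$Sepal.Length
```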

Custom functions can be passed via the custom_funs argument. This should be a list in which each element is itself a list with a name element and either an expr or a fun element. An expr should be an expression in terms of a variable x; a fun should be a function of x. Either must return a single value.

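# Each element of custom_funs is a list holding a name template and
# either an expr (in terms of x) or a fun (a function of x)
cust_funs <- list(
  # expr is quoted so it is not evaluated before x exists
  list(name = '{var}_width', expr = quote(diff(range(x, na.rm = TRUE)))),
  list(name = '{var}_stderr',
    fun = function(x) {
      s <- sd(x, na.rm = TRUE)
      n <- sum(!is.na(x))
      s / sqrt(n)
    }
  )
)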
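To see what each kind of specification computes, the sketch below evaluates an expr-style and a fun-style spec on a plain numeric vector. The helper apply_custom() is hypothetical, and it assumes expr is supplied as a quoted expression; how iNZightTools itself evaluates these specs is not described on this page.

```r
# Hypothetical helper (not part of iNZightTools): evaluate one custom
# summary spec on a numeric vector. A spec carries either a quoted
# `expr` written in terms of `x`, or a function `fun` of `x`.
apply_custom <- function(spec, values) {
  if (!is.null(spec$expr)) {
    # Bind x to the data and evaluate the quoted expression
    eval(spec$expr, envir = list(x = values))
  } else {
    spec$fun(values)
  }
}

specs <- list(
  list(name = "{var}_width", expr = quote(diff(range(x, na.rm = TRUE)))),
  list(name = "{var}_stderr",
       fun = function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x))))
)

v <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)
res <- sapply(specs, apply_custom, values = v)
res
```

Both specs ignore the NA, so the width is computed from the range 2 to 9.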

Author

Tom Elliott, Owen Jin

See Also

code

countMissing

Examples

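library(iNZightTools)

# Group iris by Species and compute mean, standard deviation and IQR summaries
aggregated <-
    aggregateData(iris,
        vars = c("Species"),
        summaries = c("mean", "sd", "iqr")
    )
# Print the tidyverse code that produced the result
cat(code(aggregated))
head(aggregated)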
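To sanity-check the numbers in an aggregated result, base R's aggregate() can compute the same kinds of per-group summaries. This minimal sketch assumes the summaries of interest are the four numeric iris columns; its column names and layout differ from aggregateData's output, so it is a cross-check rather than a replacement.

```r
# Base-R cross-check (not iNZightTools code): per-Species mean, standard
# deviation and IQR of every numeric iris column.
means <- aggregate(. ~ Species, data = iris, FUN = mean)
sds   <- aggregate(. ~ Species, data = iris, FUN = sd)
iqrs  <- aggregate(. ~ Species, data = iris, FUN = IQR)

means
sds
iqrs
```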
