df_stats: Calculate basic statistics on a quantitative variable

Description

Creates a data frame of statistics calculated on one variable, possibly for each group formed by combinations of additional variables. The resulting data frame has one column for each of the statistics requested as well as columns for any grouping variables.

Usage

df_stats(formula, data, ..., drop = TRUE, fargs = list(),
  long_names = TRUE, nice_names = FALSE)

Arguments

formula

A formula indicating which variables are to be used. See details.

data

A data frame or list containing the variables.

...

Functions used to compute the statistics. If this is empty, favstats() is used. Functions used must accept a vector of values and return either a (possibly named) single value, a (possibly named) vector of values, or a data frame with one row.

drop

A logical indicating whether combinations of the grouping variables that do not occur in data should be dropped from the result.

fargs

A list of additional arguments passed to the functions in ... (the default, list(), passes none).

long_names

A logical indicating whether default names should include the name of the variable being summarized as well as the name of the summarizing function. This applies only in the default case, when names are derived neither from the names of the returned object nor from an argument name.

nice_names

A logical indicating whether make.names() should be used to force names of the returned data frame to be syntactically valid.
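
As a rough illustration of drop and fargs, the sketch below assumes the mosaic package is installed and uses the built-in mtcars data. The object names observed, all_combos, and trimmed are only illustrative. No 8-cylinder car in mtcars has 4 gears, so that combination should appear only when drop = FALSE.

```r
library(mosaic)

# drop = TRUE (the default): only the cyl/gear combinations present in mtcars
observed <- df_stats(hp ~ cyl + gear, data = mtcars, mean)

# drop = FALSE: combinations that never occur are expected to be kept as well
all_combos <- df_stats(hp ~ cyl + gear, data = mtcars, mean, drop = FALSE)

# fargs: extra arguments handed to the summary function, here mean(x, trim = 0.1)
trimmed <- df_stats(hp ~ cyl, data = mtcars, mean, fargs = list(trim = 0.1))

nrow(observed)
nrow(all_combos)
trimmed
```

Because fargs is passed to every function in ..., it is safest to use it only with arguments that all of the supplied functions accept.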

Value

A data frame.

Details

Use a one-sided formula to compute summary statistics for the expression on the right-hand side over the entire data. Use a two-sided formula to compute summary statistics for the left-hand side expression for each combination of levels of the expressions occurring on the right-hand side. This is most useful when the left-hand side is quantitative and each expression on the right-hand side has relatively few unique values. A function like ntiles() is often useful to create a few groups of roughly equal size determined by ranges of a quantitative variable. See the examples.
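
The difference between the two formula shapes can be sketched as follows, assuming the mosaic package (for df_stats() and ntiles()) and the built-in mtcars data. The column name wt_group and the object names are hypothetical, chosen only for this illustration.

```r
library(mosaic)

# Cut weight into three groups of roughly equal size
# (wt_group is a hypothetical column name used only here)
cars <- transform(mtcars, wt_group = ntiles(wt, 3))

# One-sided formula: statistics for hp over the whole data set (one row)
overall <- df_stats(~ hp, data = cars, mean, median)

# Two-sided formula: the same statistics within each weight group
by_weight <- df_stats(hp ~ wt_group, data = cars, mean, median)

overall
by_weight
```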

Note that unlike dplyr::summarise(), df_stats() ignores any grouping defined in data if data is a grouped tibble.
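
A small sketch of this difference, assuming the mosaic and dplyr packages are installed. The object names are illustrative, and the row counts in the comments are what the paragraph above implies, not verified output.

```r
library(mosaic)
library(dplyr)

# A tibble grouped by number of cylinders
grouped <- group_by(mtcars, cyl)

# dplyr::summarise() respects the grouping: one row per value of cyl
via_dplyr <- summarise(grouped, mean_hp = mean(hp))

# df_stats() is documented to ignore the grouping: a single overall row
via_df_stats <- df_stats(~ hp, data = grouped, mean)

# To summarise by group with df_stats(), put the variable in the formula
by_formula <- df_stats(hp ~ cyl, data = grouped, mean)
```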

Names of columns in the resulting data frame are determined as follows. For named arguments in ..., the argument name is used. For unnamed arguments, if the statistic function returns a result with names, those names are used. Otherwise, a name is computed from the expression in ... and the name of the variable being summarized. For functions that produce multiple outputs without names, consecutive integers are appended to the names. See the examples.
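
The naming rules above can be tried out with a sketch like the one below, which assumes the mosaic package and the built-in mtcars data. The helper lo_hi() and the argument names avg and spread are hypothetical, introduced only to exercise each rule. The exact computed names depend on long_names and on the package version, so inspect names() instead of relying on a particular spelling.

```r
library(mosaic)

# Rule 1: named arguments in ... give the column names (here avg and spread)
named_args <- df_stats(hp ~ cyl, data = mtcars, avg = mean, spread = sd)

# Rule 2: an unnamed function that returns a named vector supplies its own names
# (lo_hi is a hypothetical helper defined only for this illustration)
lo_hi <- function(x) c(lo = min(x), hi = max(x))
named_result <- df_stats(hp ~ cyl, data = mtcars, lo_hi)

# Rules 3 and 4: range() returns two unnamed values, so a name is computed
# and consecutive integers are appended to it
computed <- df_stats(hp ~ cyl, data = mtcars, range)

names(named_args)
names(named_result)
names(computed)
```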

Examples

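# assumed setup: mosaic provides df_stats() and favstats(), and attaches
# ggformula (gf_violin(), gf_point()) along with the %>% pipe
library(mosaic)

# one-sided formula: default favstats() summary of hp over all of mtcars
df_stats( ~ hp, data = mtcars)
df_stats( ~ hp, data = mtcars, mean, median)
# two-sided formula: one row of statistics per value of cyl
df_stats( hp ~ cyl, data = mtcars, mean, median, range)
# magrittr style piping is also supported
mtcars %>% df_stats(hp ~ cyl)
# overlay the group means computed by df_stats() on violin plots
gf_violin(hp ~ cyl, data = mtcars, group = ~ cyl) %>%
  gf_point(mean_hp ~ cyl, data = df_stats(hp ~ cyl, data = mtcars, mean))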
