univariate_table: Create a custom univariate summary for a dataset

Description

Produces a formatted table of univariate summary statistics, with options for stratifying by one or more variables, computing custom summary and association statistics, using custom string templates for results, and more.

Usage

univariate_table(
    data,
    strata = NULL,
    associations = NULL,
    numeric_summary = c(Summary = "median (iqr)"),
    categorical_summary = c(Summary = "count (percent%)"),
    other_summary = c(Summary = "unique"),
    all_summary = NULL,
    evaluate = FALSE,
    add_n = FALSE,
    order = NULL,
    labels = NULL,
    levels = NULL,
    format = c("html", "latex", "markdown", "pandoc", "none"),
    variableName = "Variable",
    levelName = "Level",
    na_string = "(missing)",
    strata_sep = "/",
    summary_strata_sep = "_",
    fill_blanks = "",
    caption = "",
    ...
)

Arguments

data

A data.frame to summarise.

strata

A formula specifying one or more stratification variables. Variables on the left-hand side (LHS) stratify the rows, and variables on the right-hand side (RHS) stratify the columns. Defaults to NULL.
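To illustrate the LHS/RHS convention, the following base R sketch pulls the variable names from each side of a strata formula. The helper split_strata() and the variable names Group and Smoker are hypothetical and not part of this package; the sketch shows only the convention, not how univariate_table parses strata internally.

```r
# Hypothetical helper (not part of this package): returns the row (LHS)
# and column (RHS) stratification variable names of a strata formula.
split_strata <- function(strata) {
  stopifnot(inherits(strata, "formula"))
  if (length(strata) == 3) {
    # Two-sided formula, e.g. Group ~ Sex: LHS -> rows, RHS -> columns
    list(rows = all.vars(strata[[2]]), cols = all.vars(strata[[3]]))
  } else {
    # One-sided formula, e.g. ~Sex: column strata only
    list(rows = character(0), cols = all.vars(strata[[2]]))
  }
}

str(split_strata(~Sex))
str(split_strata(Group ~ Sex + Smoker))
```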

associations

A named list of functions, each evaluated with the column strata and each variable to compute association statistics. Defaults to NULL.
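As a rough sketch of what a named list of association functions might look like, the base R code below evaluates each function for every variable against a single column strata variable, using the built-in mtcars data. The (variable, strata) argument order, the "P-value" name, and the helper evaluate_associations() are assumptions made for illustration; the signature univariate_table actually expects may differ.

```r
# Hypothetical association functions. The (variable, strata) argument order
# is an assumption for this sketch, not the package's documented contract.
assoc <- list(
  `P-value` = function(y, x) {
    if (is.numeric(y)) {
      kruskal.test(y, x)$p.value         # numeric variable vs. strata
    } else {
      fisher.test(table(y, x))$p.value   # categorical variable vs. strata
    }
  }
)

# Hypothetical helper: evaluate every function in `fns` for each variable
# against one column strata variable; one row per variable.
evaluate_associations <- function(data, vars, strata, fns) {
  out <- lapply(vars, function(v) {
    vals <- vapply(fns, function(f) f(data[[v]], data[[strata]]), numeric(1))
    data.frame(Variable = v, as.list(vals), check.names = FALSE)
  })
  do.call(rbind, out)
}

mt <- transform(mtcars, cyl = factor(cyl), am = factor(am))
evaluate_associations(mt, c("mpg", "hp", "cyl"), "am", assoc)
```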

numeric_summary

A (preferably named) character vector containing string templates of how results for numeric data should be presented. See details for a list of what is available by default. Defaults to c(Summary = "median (iqr)").

categorical_summary

A (preferably named) character vector containing string templates of how results for categorical data should be presented. See details for a list of what is available by default. Defaults to c(Summary = "count (percent%)").

other_summary

A (preferably named) character vector containing string templates of how results for non-numeric and non-categorical data should be presented. See details for a list of what is available by default. Defaults to c(Summary = "unique").

all_summary

A (preferably named) character vector containing string templates for additional results that should be presented for all variables. See Details for a list of what is available by default. Defaults to NULL.

evaluate

Should the string templates be evaluated as R expressions after being filled with their values? See "absorb" for details. Defaults to FALSE.
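As a rough illustration of what evaluate = TRUE means, this base R sketch fills a template such as "max - min" with computed values and then evaluates the resulting string with eval(parse()). The fill_template() helper is hypothetical, and whether absorb performs exactly these steps is an assumption; see "absorb" for the actual behavior.

```r
# Hypothetical helper: replace each placeholder word with its formatted value.
fill_template <- function(template, values) {
  for (nm in names(values)) {
    template <- gsub(paste0("\\b", nm, "\\b"), format(values[[nm]]),
                     template, perl = TRUE)
  }
  template
}

x <- c(2, 5, 9)
filled <- fill_template("max - min", list(max = max(x), min = min(x)))
filled                       # "9 - 2"
eval(parse(text = filled))   # 7
```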

add_n

Should the sample size for each stratification level be added to the result? Defaults to FALSE.

order

Character vector of one or more variables specifying the top-to-bottom order of the result. If NULL (default), the result is sorted according to names(data).

labels

Named character vector for re-labeling variables in the result. Defaults to NULL.

levels

Named list of character vectors for re-labeling factor levels in the result. Defaults to NULL.

format

The format that the result should be rendered as. Must be one of c("html", "latex", "markdown", "pandoc", "none"). Defaults to "html".

variableName

Header for the variable column in the result. Defaults to "Variable".

levelName

Header for the factor level column in the result. Defaults to "Level".

na_string

String for NA factor levels in the result. Defaults to "(missing)".

strata_sep

Delimiter used to separate stratification levels in the result. Defaults to "/".

summary_strata_sep

Delimiter used to join summary column names with the strata groups. Defaults to "_".

fill_blanks

String to fill in blank spaces in the result. Defaults to "".

caption

Caption for the resulting table, passed to knitr::kable. Defaults to "".

...

Additional arguments to pass to "descriptives".

Value

A table of summary statistics according to the specified format. A tibble is returned if format = "none".

Details

The following statistics are available by default for each data type:

Numeric: "min", "max", "median", "iqr", "mean", "sd"

Categorical: "count", "percent"

All variables: "length", "missing", "available", "class", "unique"

These strings are typed explicitly in the *_summary arguments and serve as placeholders for where the actual values will appear. Custom functions can be supplied in a named list, where the names become the placeholders that give access to their values in the string templates. See "descriptives" and "absorb".
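The following base R sketch shows the placeholder idea: statistics are computed under names that match words in a template, and each word is then replaced by its value. fill_template() is a hypothetical helper, and treating "iqr" as stats::IQR() and "percent" as the percentage of values in one factor level are assumptions; see "descriptives" and "absorb" for how the package actually computes and fills these values.

```r
# Hypothetical helper: replace each placeholder word with its formatted value.
fill_template <- function(template, values) {
  for (nm in names(values)) {
    template <- gsub(paste0("\\b", nm, "\\b"), format(values[[nm]]),
                     template, perl = TRUE)
  }
  template
}

# Numeric statistics; "iqr" is assumed here to mean stats::IQR()
x <- c(4, 8, 15, 16, 23, 42)
num_stats <- list(median = median(x), iqr = IQR(x))
num_result <- fill_template("median (iqr)", num_stats)
num_result  # "15.5 (11.5)"

# Categorical statistics for one level ("a") of a factor
f <- factor(c("a", "b", "a", "a"))
cat_stats <- list(count = sum(f == "a"), percent = round(100 * mean(f == "a"), 1))
cat_result <- fill_template("count (percent%)", cat_stats)
cat_result  # "3 (75%)"
```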

The names of the elements in the *_summary vectors become the column headers in the result. If a template is unnamed, an arbitrary name (e.g. "VX") appears in its column header.

Examples

# NOT RUN {
library(cheese)     # provides univariate_table() and the heart_disease data
library(tidyverse)  # provides the %>% pipe

#1) Default summary
heart_disease %>%
    univariate_table()

#2) Stratified summary
heart_disease %>%
    univariate_table(
        strata = ~Sex,
        add_n = TRUE
    )

#See vignette("cheese") for more examples
    
# }
