tablet.data.frame: Generate a Tablet for Data Frame

Description

Generates a 'tablet': a summary table of formatted statistics for factors (is.factor()) and numerics (is.numeric()) in x, with and without grouping variables (if present, see group_by). Column names represent the finest level of grouping, distinguished by attribute 'nest' (the values of any higher-level groups), along with the 'all' column for ungrouped statistics. Column attribute 'n' gives the corresponding observation count. Input should not have column names beginning with '_tablet'.

Usage

# S3 method for data.frame
tablet(
 x,
 ...,
 na.rm = FALSE,
 all = 'All',
 fun = list(
  sum ~ signif(digits = 3,     sum(x,  na.rm = TRUE)),
  pct ~ signif(digits = 3,     sum / n * 100        ),
  ave ~ signif(digits = 3,    mean(x,  na.rm = TRUE)),
  std ~ signif(digits = 3,      sd(x,  na.rm = TRUE)),
  med ~ signif(digits = 3,  median(x,  na.rm = TRUE)),
  min ~ signif(digits = 3,     min(x,  na.rm = TRUE)),
  max ~ signif(digits = 3,     max(x,  na.rm = TRUE))
 ),
 fac = list(
  ` ` ~ sum + ' (' + pct + '%' + ')'
 ),
 num = list(
  `Mean (SD)` ~ ave + ' (' + std + ')',
  `Median (range)` ~ med + ' (' + min + ', ' + max + ')'
 ),
 lab = list(
  lab ~ name + '\n(N = ' + n + ')'
 ),
 na.rm_fac = na.rm,
 na.rm_num = na.rm,
 exclude_fac = NULL,
 exclude_name = NULL
)

Arguments

x

data.frame (possibly grouped)

...

substitute formulas for elements of fun, fac, num, lab

na.rm

whether to remove NA in general

all

a column name for ungrouped statistics; can have length zero to suppress ungrouped column

fun

default aggregate functions expressed as formulas

fac

a list of formulas to generate widgets for factors

num

a list of formulas to generate widgets for numerics

lab

a list of formulas to generate label attributes for columns (see details)

na.rm_fac

whether to drop NA 'factor' observations; passed to gather as na.rm, interacts with exclude_fac

na.rm_num

whether to drop NA numeric observations; passed to gather as na.rm

exclude_fac

which factor levels to exclude; see factor (exclude)

exclude_name

whether to drop NA values of column name (for completeness); passed to gather

Value

'tablet', with columns for each combination of groups, and:

_tablet_name

observation identifier

_tablet_level

factor level (or special value 'numeric' for numerics)

_tablet_stat

the LHS of formulas in 'fac' and 'num'

All (or value of 'all' argument)

ungrouped results

_tablet_sort

sorting column

Details

Arguments 'fun', 'fac', 'num', and 'lab' are lists of two-sided formulas that are evaluated in an environment where '+' expresses concatenation (for character elements). The LHS values should be unique across all four lists. 'fun' is a list of aggregate statistics that have access to N (number of original records), n (number of group members), and x (the numeric observations, or 1 for each factor value). Aggregate statistics generated by 'fun' are available for use in 'fac' and 'num', which create visualizations thereof ('widgets'). Column-specific attributes are available to elements of 'lab', including the special attribute name (the current column name). For 'lab' only, if evaluation of the RHS succeeds, the result becomes the label attribute of the corresponding output column. 'lab' is used here principally to support annotation of *output* columns; if *input* columns have attributes 'label' or 'title' (highest priority), those will already have been substituted for default column names at the appropriate positions in the output.
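
The flow from aggregate statistic to widget can be illustrated in base R alone. The following is a minimal sketch, not the package's internal code: the environment concat_env and the way the formula is evaluated are assumptions made for illustration, while the statistic and widget formulas are copied from the defaults shown under Usage.

```r
# Hand-rolled illustration only; not how tablet is implemented internally.
x <- c(1.5, 2.25, NA, 4)

# Aggregate statistics, written as in the default 'fun' formulas
stats <- list(
  ave = signif(digits = 3, mean(x, na.rm = TRUE)),
  std = signif(digits = 3,   sd(x, na.rm = TRUE))
)

# Hypothetical environment in which binary '+' means concatenation
concat_env <- new.env()
assign('+', function(e1, e2) paste0(e1, e2), envir = concat_env)

# A widget formula copied from the default 'num' list
widget <- `Mean (SD)` ~ ave + ' (' + std + ')'

# Evaluate the RHS (third element of a two-sided formula),
# looking up statistics in 'stats' and '+' in 'concat_env'
result <- eval(widget[[3]], stats, concat_env)
result
```

Here the LHS (`Mean (SD)`) serves only as the name of the widget, and the RHS combines statistics that were computed beforehand, which is why widget formulas can refer to ave and std without computing them.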

Missingness of observations (and to a lesser extent, levels of grouping variables) merits special consideration. Be aware that na.rm_fac and na.rm_num take their defaults from na.rm. Furthermore, na.rm_fac may interact with exclude_fac, which is passed to factor as exclude. The goal is to support all possible ways of expressing or ignoring missingness. That said, if aggregate functions are removing NA, the values of arguments beginning with 'na.rm' or 'exclude' may not matter.
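
The two mechanisms mentioned above can be tried in base R, independently of tablet. This sketch only shows how the exclude argument of factor() and na.rm inside an aggregate function behave; how tablet combines them internally is not shown here, and the variable names are illustrative.

```r
# Base R behavior only; variable names are illustrative.
f <- c('a', 'b', NA, 'a')

# By default, factor() excludes NA from the levels
levels_default <- levels(factor(f))

# exclude = NULL keeps NA as a level of its own
levels_keep_na <- levels(factor(f, exclude = NULL))

x <- c(1, NA, 3)

# Without na.rm, a single NA makes the mean NA
mean_with_na <- mean(x)

# If the aggregate function already removes NA, dropping NA
# observations beforehand does not change the result
mean_no_drop    <- mean(x, na.rm = TRUE)
mean_after_drop <- mean(x[!is.na(x)], na.rm = TRUE)
```

Since every default 'fun' statistic calls its function with na.rm = TRUE, the last comparison is the situation described in the closing sentence of the paragraph above.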

Examples

# NOT RUN {
library(boot)      # provides the melanoma data set
library(dplyr)
library(magrittr)
library(tablet)
melanoma %>%
  select(-time, -year) %>%
  mutate(sex = factor(sex), ulcer = factor(ulcer)) %>%
  group_by(status) %>%
  tablet
# }
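
As a further illustration, the same data can be summarized with a renamed ungrouped column and a custom numeric widget. This is a sketch that relies only on the 'all' and 'num' arguments documented above; the column name 'Overall' and the widget `Mean [min, max]` are illustrative choices, and the output has not been checked against a specific package version.

```r
library(boot)    # provides the melanoma data set
library(dplyr)
library(tablet)

x <- melanoma %>%
  select(-time, -year) %>%
  mutate(sex = factor(sex), ulcer = factor(ulcer)) %>%
  group_by(status)

# Rename the ungrouped column and replace the default numeric widgets;
# ave, min, and max come from the default 'fun' list
custom <- tablet(
  x,
  all = 'Overall',
  num = list(`Mean [min, max]` ~ ave + ' [' + min + ', ' + max + ']')
)
custom
```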