summary_factorlist: Summarise a set of factors (or continuous variables) by a dependent variable

Description

A function that takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a summary table.

Usage

summary_factorlist(.data, dependent = NULL, explanatory, cont = "mean",
  cont_cut = 5, p = FALSE, na_include = FALSE, column = FALSE,
  total_col = FALSE, orderbytotal = FALSE, fit_id = FALSE,
  na_to_missing = TRUE, add_dependent_label = FALSE,
  dependent_label_prefix = "Dependent: ", dependent_label_suffix = "")

Arguments

.data

Dataframe.

dependent

Character vector of length 1: name of dependent variable (usually a factor with 2 to 5 levels; a continuous variable is also accepted, see Details).

explanatory

Character vector of any length: name(s) of explanatory variables.

cont

Summary for continuous variables: "mean" (standard deviation) or "median" (interquartile range).

cont_cut

Numeric: threshold number of unique values used to decide whether a numeric variable is summarised as continuous or treated as a factor.

p

Logical: include a statistical test (see summary.formula).

na_include

Logical: include missing data in summary (NA).

column

Logical: Compute margins by column rather than row.

total_col

Logical: include a total column summing across factor levels.

orderbytotal

Logical: order final table by total column high to low.

fit_id

Logical: not used directly, allows merging via finalfit_merge.

na_to_missing

Logical: convert NA to 'Missing' when na_include=TRUE.

add_dependent_label

Logical: add the dependent variable's label to the top left of the table.

dependent_label_prefix

Character: text added before the dependent label.

dependent_label_suffix

Character: text added after the dependent label.
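
To make the two options for cont concrete, here is a minimal base R sketch of a "mean (standard deviation)" and a "median (interquartile range)" summary of a continuous variable within each level of a dependent variable. The toy data and the exact string formatting are assumptions for illustration only; summary_factorlist may format its cells differently.

```r
# Hypothetical toy data, not taken from colon_s
age  <- c(50, 60, 70, 40, 55, 65, 80)
died <- factor(c("No", "No", "No", "Yes", "Yes", "Yes", "Yes"))

# Split the continuous variable by the levels of the dependent variable
by_level <- split(age, died)

# cont = "mean": mean (standard deviation) per level
mean_sd <- sapply(by_level, function(x) {
  sprintf("%.1f (%.1f)", mean(x), sd(x))
})

# cont = "median": median (lower quartile to upper quartile) per level
median_iqr <- sapply(by_level, function(x) {
  q <- quantile(x, c(0.25, 0.75))
  sprintf("%.1f (%.1f to %.1f)", median(x), q[1], q[2])
})

print(mean_sd)
print(median_iqr)
```

The median-based summary is generally the safer choice for skewed variables, since the mean and standard deviation are sensitive to outliers.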

Value

Returns a factorlist dataframe.

Details

This function is mostly a wrapper for Hmisc:::summary.formula(..., method = "reverse") but produces a publication-ready table the way we like them. It usually takes a categorical dependent variable (with two to five levels) to produce a cross table of counts and proportions expressed as percentages. However, it will take a continuous dependent variable to produce mean (standard deviation) or median (interquartile range) for use with linear regression models.
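
As a rough illustration of the "cross table of counts and proportions expressed as percentages", the following base R sketch builds such a table by hand and shows the difference between row and column percentages, which is the idea behind the column argument. The toy data and the "n (%)" cell format are assumptions; this is not the package's internal implementation.

```r
# Hypothetical toy data, not taken from colon_s
sex  <- factor(c("F", "F", "F", "M", "M", "M", "M", "M"))
died <- factor(c("No", "Yes", "No", "No", "Yes", "Yes", "Yes", "No"))

# Rows: levels of the explanatory variable; columns: levels of the dependent
counts <- table(sex, died)

# Row percentages: each explanatory level sums to 100 across the dependent levels
row_pct <- 100 * prop.table(counts, margin = 1)

# Column percentages: each dependent level sums to 100 down the explanatory levels
col_pct <- 100 * prop.table(counts, margin = 2)

# Combine counts and row percentages into "n (%)" cells
cells <- matrix(
  sprintf("%d (%.1f)", as.vector(counts), as.vector(row_pct)),
  nrow = nrow(counts),
  dimnames = dimnames(counts)
)

print(cells)
print(round(col_pct, 1))
```

Which margin to use depends on the question: row percentages describe the outcome within each explanatory level, while column percentages describe the make-up of each outcome group.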

Examples


# NOT RUN {
library(finalfit)
library(dplyr)
# Load example dataset, modified version of survival::colon
data(colon_s)

# Table 1 - Patient demographics ----
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
colon_s %>%
	summary_factorlist(dependent, explanatory, p=TRUE)

# summary_factorlist() is also commonly used to summarise any number of
# variables by an outcome variable (say dead yes/no).

# Table 2 - 5 yr mortality ----
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
	summary_factorlist(dependent, explanatory)
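
# Further sketches using only the arguments documented above. They assume
# that colon_s keeps the numeric `nodes` column of survival::colon; the
# object names tab_nodes and tab_full are illustrative only.

# Continuous dependent variable, summarised as median (interquartile range)
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "nodes"
tab_nodes = colon_s %>%
	summary_factorlist(dependent, explanatory, cont = "median")
tab_nodes

# Categorical dependent variable with column percentages, a total column,
# and missing data shown as its own level
dependent = "mort_5yr"
tab_full = colon_s %>%
	summary_factorlist(dependent, explanatory,
		column = TRUE, total_col = TRUE, na_include = TRUE)
tab_full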
# }