table1: Table 1 for Health, Behavioral, and Social Scientists

Description

Produces a descriptive table, optionally stratified by a categorical variable, reporting means/frequencies and standard deviations/percentages. The table is formatted so it can be moved into an academic article or report with little editing. Can be used within the piping framework [see library(magrittr)].

Usage

table1(.data, ..., splitby = NULL, splitby_labels = NULL, test = FALSE, test_type = "default", piping = FALSE, rounding = 3, var_names = NULL, format_output = "pvalues", output_type = "text", NAkeep = FALSE, m_label = "Missing", booktabs = TRUE, caption = NULL, align = NULL)

Arguments

.data

the data.frame that is to be summarized

...

variables in the data set that are to be summarized; unquoted names separated by commas (e.g. age, gender, race) or indices. If indices are used, they must be given as a single vector (e.g. c(1:5, 8, 9:20) instead of 1:5, 8, 9:20). Currently, it CANNOT handle indices and unquoted names in the same call.

splitby

the categorical variable to stratify by in formula form (e.g., splitby = ~gender); not too surprisingly, it requires that the number of levels be > 0

splitby_labels

allows for custom labels of the splitby levels; must match the number of levels of the splitby variable

test

logical; if set to TRUE then the appropriate bivariate tests of significance are performed if splitby has more than 1 level

test_type

has two options: "default" performs the default tests of significance only; "or" also gives unadjusted odds ratios based on logistic regression (only use if splitby has 2 levels)

piping

if TRUE then the table is printed and the original data is passed on. It is very useful in piping situations where one wants the table but wants it to be part of a larger pipe.

rounding

the number of digits after the decimal; default is 3

var_names

custom variable names to be printed in the table

format_output

has three options: 1) "full" provides the table with the type of test, test statistic, and the p-value for each variable; 2) "pvalues" provides the table with the p-values; and 3) "stars" provides the table with stars indicating significance

output_type

default is "text"; the other option is "latex" which uses the kable() function in knitr

NAkeep

when set to TRUE it also shows how many missing values are in the data for each categorical variable being summarized

m_label

when NAkeep = TRUE this provides a label for the missing values in the table

booktabs

used only when output_type = "latex"; logical passed to knitr::kable indicating whether to use the booktabs table style

caption

used only when output_type = "latex"; the table caption, passed to knitr::kable

align

used only when output_type = "latex"; the column alignment, passed to knitr::kable

Value

A table with the number of observations, means/frequencies and standard deviations/percentages is returned. The object is a table1 class object with a print method. It can also be printed in LaTeX form.

Examples

## Data from MASS package ##
library(MASS)
data("birthwt")
library(dplyr)
b = mutate(.data=birthwt,
           smoke = as.factor(smoke),
           race  = as.factor(race),
           ht    = as.factor(ht),
           ui    = as.factor(ui))
levels(b$race) = c("white", "black", "other")

library(furniture)

## Unstratified table, also reporting missing values
table1(b, age, race, smoke, ptl, ht, ui, ftv, NAkeep=TRUE)

## Stratified by low birthweight, converting low to a factor inside the formula
table1(b, age, race, smoke, ptl, ht, ui, ftv,
       splitby=~factor(low),
       NAkeep=TRUE)

## Stratified with bivariate tests, custom variable names, and custom group labels
b$low = as.factor(b$low)
table1(b, age, race, smoke, ptl, ht, ui, ftv,
       splitby=~low,
       test=TRUE,
       var_names = c("Age", "Race", "Smoking Status", "Previous Premature Labors", "Hypertension",
                     "Uterine Irritability", "Physician Visits"),
       splitby_labels = c("Regular Birthweight", "Low Birthweight"))
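
## The calls below are additional sketches, not part of the original examples.
## They assume the arguments behave as described in the Arguments section and
## reuse the data frame b built above (with low already converted to a factor).

## Selecting variables by index: a single vector of column positions.
## In birthwt, columns 2, 4 and 5 are age, race and smoke.
table1(b, c(2, 4:5))

## Piping: with piping=TRUE the table is printed and the data are passed on,
## so the pipe can continue (here to dplyr::summarise)
b %>%
  table1(age, race, ptl, splitby=~smoke, piping=TRUE) %>%
  summarise(mean_age = mean(age))

## Full test output (type of test, test statistic, p-value), rounded to 2 digits
table1(b, age, race, ht,
       splitby=~low,
       test=TRUE,
       format_output="full",
       rounding=2)

## LaTeX output through knitr::kable, e.g. inside a knitr / R Markdown document;
## the caption text is made up for illustration
table1(b, age, race, smoke,
       splitby=~low,
       output_type="latex",
       caption="Descriptive statistics by birthweight group")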
