bivariate_compare: Create publication-style table across one categorical variable

Description

Descriptive statistics for categorical variables as well as normally and non-normally distributed continuous variables, split across levels of a categorical variable. Depending on the variable type, an appropriate statistical test is used to assess differences across levels of the comparison variable.

Usage

bivariate_compare(df, compare, normal_vars = NULL,
  non_normal_vars = NULL, cat_vars = NULL, display_round = 2,
  p = TRUE, p_round = 4, include_na = FALSE, col_n = TRUE,
  cont_n = FALSE, all_cont_mean = FALSE, all_cont_median = FALSE,
  iqr = TRUE, fisher = FALSE, workspace = NULL, var_order = NULL,
  var_label_df = NULL)

Value

A data.frame with columns label, overall, a column for each level of compare, and p.value. For normal_vars, mean (SD) is displayed, for non_normal_vars median (IQR) is displayed, and for cat_vars n (percent) is displayed. For p values on continuous variables, a superscript 'a' denotes that the Kruskal-Wallis test was used.
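As a rough illustration of the three cell formats named above, the following base R sketch builds "mean (SD)", "median (IQR)", and "n (percent)" strings by hand. The helper names (fmt_num, mean_sd, median_iqr, n_percent) are hypothetical and are not part of the package. The exact formatting produced by bivariate_compare(), and the choice of non-missing values as the percentage denominator, are assumptions.

```r
# Hypothetical helpers, not part of the package: a sketch of the cell formats.
fmt_num <- function(x, digits = 2) formatC(x, format = "f", digits = digits)

# "mean (SD)", the format described for normal_vars
mean_sd <- function(x, digits = 2) {
  paste0(fmt_num(mean(x, na.rm = TRUE), digits),
         " (", fmt_num(sd(x, na.rm = TRUE), digits), ")")
}

# "median (IQR)", the format described for non_normal_vars
median_iqr <- function(x, digits = 2) {
  paste0(fmt_num(median(x, na.rm = TRUE), digits),
         " (", fmt_num(IQR(x, na.rm = TRUE), digits), ")")
}

# "n (percent)", the format described for cat_vars.
# Assumption: the denominator is the number of non-missing values.
n_percent <- function(x, level, digits = 2) {
  n <- sum(x == level, na.rm = TRUE)
  paste0(n, " (", fmt_num(100 * n / sum(!is.na(x)), digits), ")")
}

# One summary string per level of the comparison variable
tapply(iris$Sepal.Length, iris$Species, mean_sd)
tapply(iris$Sepal.Length, iris$Species, median_iqr)
```

The display_round argument plays the role of digits here.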

Arguments

df: A data.frame or tibble.
compare: Discrete variable. Separate statistics will be produced for each level, with statistical tests across levels. Must be quoted.
normal_vars: Character vector of normally distributed continuous variables that will be included in the descriptive table.
non_normal_vars: Character vector of non-normally distributed continuous variables that will be included in the descriptive table.
cat_vars: Character vector of categorical variables that will be included in the descriptive table.
display_round: Number of decimal places to which displayed values should be rounded.
p: Logical. Should p-values be calculated and displayed? Default TRUE.
p_round: Number of decimal places p-values should be rounded to.
include_na: Logical. Should NA values be included in the table and accompanying statistical tests? Default FALSE.
col_n: Logical. Should the total number of observations be displayed for each column? Default TRUE.
cont_n: Logical. Display sample n for continuous variables in the table. Default FALSE.
all_cont_mean: Logical. Display mean (sd) for all continuous variables. Default FALSE results in mean (sd) for normally distributed variables and median (IQR) for non-normally distributed variables. Must be FALSE if all_cont_median == TRUE.
all_cont_median: Logical. Display median (IQR) for all continuous variables. Default FALSE results in mean (sd) for normally distributed variables and median (IQR) for non-normally distributed variables. Must be FALSE if all_cont_mean == TRUE.
iqr: Logical. If the median is displayed for a continuous variable, should the interquartile range be displayed with it (TRUE), or should the values of the 25th and 75th percentiles be displayed instead (FALSE)? Default TRUE.
fisher: Logical. Should Fisher's exact test be used for categorical variables? Default FALSE. Ignored if p == FALSE.
workspace: Numeric value giving the workspace size to be used for Fisher's exact test. If NULL (the default), a value of 2e5 is used. Ignored if fisher == FALSE.
var_order: Character vector listing the variable names in the order results should be displayed. If NULL, the default, continuous variables are displayed first, followed by categorical variables.
var_label_df: A data.frame or tibble with columns "variable" and "label" that contains display labels for each variable specified in normal_vars, non_normal_vars, and cat_vars.
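The sketch below shows one way the var_order and var_label_df arguments might be prepared. The labels and the recoding of am are made up for illustration. The call itself is guarded because the package that provides bivariate_compare() has to be attached first, and whether cat_vars requires factor columns is an assumption, not something stated on this page.

```r
# Recode a 0/1 column as a labelled factor (0 = automatic, 1 = manual in mtcars)
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# Display labels: one row per variable, in columns "variable" and "label"
var_label_df <- data.frame(
  variable = c("mpg", "hp", "am"),
  label = c("Miles per gallon", "Horsepower", "Transmission"),
  stringsAsFactors = FALSE
)

# Order in which the variables should appear in the output table
var_order <- c("am", "mpg", "hp")

# Runs only if bivariate_compare() is available in the current session
if (exists("bivariate_compare")) {
  bivariate_compare(cars, compare = "cyl",
                    normal_vars = "hp", non_normal_vars = "mpg", cat_vars = "am",
                    fisher = TRUE, var_order = var_order,
                    var_label_df = var_label_df)
}
```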

Details

Differences across levels of compare are assessed using aov() for normally distributed continuous variables, kruskal.test() for non-normally distributed continuous variables, and chisq.test() for categorical variables by default, with fisher.test() available as a user option.
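For orientation, the four tests named above can be run directly in base R as follows. This is only a sketch of the tests themselves. How bivariate_compare() calls them internally (formula interface, handling of NA values, and so on) is not documented here and is assumed.

```r
# Normally distributed continuous variable: one-way ANOVA across levels
fit <- aov(Sepal.Length ~ Species, data = iris)
p_aov <- summary(fit)[[1]][["Pr(>F)"]][1]

# Non-normally distributed continuous variable: Kruskal-Wallis rank sum test
p_kw <- kruskal.test(mpg ~ cyl, data = mtcars)$p.value

# Categorical variable: chi-squared test on the cross-tabulation.
# Small expected counts in this table trigger a warning, suppressed here.
tab <- table(mtcars$am, mtcars$cyl)
p_chi <- suppressWarnings(chisq.test(tab)$p.value)

# Alternative for categorical variables: Fisher's exact test,
# with the workspace size set explicitly to the 2e5 default
p_fisher <- fisher.test(tab, workspace = 2e5)$p.value

# Round the p-values, analogous to p_round = 4
round(c(aov = p_aov, kruskal = p_kw, chisq = p_chi, fisher = p_fisher), 4)
```

Fisher's exact test is often preferred when expected cell counts are small, which is the situation in which chisq.test() issues its approximation warning.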

Examples

bivariate_compare(iris, compare = "Species", normal_vars = c("Sepal.Length", "Sepal.Width"))

bivariate_compare(mtcars, compare = "cyl", non_normal_vars = "mpg")
