Descriptive statistics for categorical variables as well as normally and non-normally distributed continuous variables, split across levels of a categorical variable. Depending on the variable type, an appropriate statistical test is used to assess differences across levels of the comparison variable.
bivariate_compare(df, compare, normal_vars = NULL,
non_normal_vars = NULL, cat_vars = NULL, display_round = 2,
p = TRUE, p_round = 4, include_na = FALSE, col_n = TRUE,
cont_n = FALSE, all_cont_mean = FALSE, all_cont_median = FALSE,
iqr = TRUE, fisher = FALSE, workspace = NULL, var_order = NULL,
var_label_df = NULL)
A data.frame with columns label, overall, a column for each level of compare, and p.value. For normal_vars, mean (SD) is displayed; for non_normal_vars, median (IQR) is displayed; and for cat_vars, n (percent) is displayed. For p values on continuous variables, a superscript 'a' denotes that the Kruskal-Wallis test was used.
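As a rough illustration of the three cell formats described above, the following base R sketch builds mean (SD), median (IQR), and n (percent) strings by group. It is not the function's own implementation: the helpers fmt_mean_sd and fmt_median_iqr are hypothetical, the choice of column percentages is an assumption, and rounding to 2 decimals simply mirrors the display_round default.

```r
# Hypothetical helpers (not part of the package): format one group's values
# in the "statistic (spread)" style described for the returned table.
fmt_mean_sd <- function(x, digits = 2) {
  paste0(formatC(mean(x), format = "f", digits = digits), " (",
         formatC(sd(x), format = "f", digits = digits), ")")
}
fmt_median_iqr <- function(x, digits = 2) {
  paste0(formatC(median(x), format = "f", digits = digits), " (",
         formatC(IQR(x), format = "f", digits = digits), ")")
}

# Continuous variables: one formatted string per level of the comparison variable.
mean_sd_by_species <- tapply(iris$Sepal.Length, iris$Species, fmt_mean_sd)
median_iqr_by_cyl  <- tapply(mtcars$mpg, mtcars$cyl, fmt_median_iqr)

# Categorical variable: n (percent) within each level of the comparison variable.
# Column percentages are an assumption made for this sketch.
counts <- table(am = mtcars$am, cyl = mtcars$cyl)
pct    <- prop.table(counts, margin = 2) * 100
n_pct  <- matrix(sprintf("%d (%.2f)", as.integer(counts), as.numeric(pct)),
                 nrow = nrow(counts), dimnames = dimnames(counts))

print(mean_sd_by_species)
print(median_iqr_by_cyl)
print(n_pct)
```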
df: A data.frame or tibble.
compare: Discrete variable. Separate statistics will be produced for each level, with statistical tests across levels. Must be quoted.
normal_vars: Character vector of normally distributed continuous variables that will be included in the descriptive table.
non_normal_vars: Character vector of non-normally distributed continuous variables that will be included in the descriptive table.
cat_vars: Character vector of categorical variables that will be included in the descriptive table.
display_round: Number of decimal places displayed values should be rounded to.
p: Logical. Should p-values be calculated and displayed? Default TRUE.
p_round: Number of decimal places p-values should be rounded to.
include_na: Logical. Should NA values be included in the table and accompanying statistical tests? Default FALSE.
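How the function treats NA internally is not shown on this page. As a loose base R analogy only, the sketch below contrasts a cross-tabulation that drops missing values with one that keeps NA as its own level; the toy vectors group and smoker are made up for illustration.

```r
# Toy data (invented for this sketch) with missing values in a categorical variable.
group  <- factor(c("a", "a", "b", "b", "b"))
smoker <- c("yes", NA, "no", "yes", NA)

# Default behaviour of table(): observations with NA are dropped.
tab_drop <- table(smoker, group)

# useNA = "ifany" keeps NA as its own row, so those observations are counted
# and would take part in any test computed from the table.
tab_keep <- table(smoker, group, useNA = "ifany")

print(tab_drop)
print(tab_keep)
```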
col_n: Logical. Should the total number of observations be displayed for each column? Default TRUE.
cont_n: Logical. Display sample n for continuous variables in the table. Default FALSE.
all_cont_mean: Logical. Display mean (sd) for all continuous variables. The default, FALSE, results in mean (sd) for normally distributed variables and median (IQR) for non-normally distributed variables. Must be FALSE if all_cont_median == TRUE.
all_cont_median: Logical. Display median (IQR) for all continuous variables. The default, FALSE, results in mean (sd) for normally distributed variables and median (IQR) for non-normally distributed variables. Must be FALSE if all_cont_mean == TRUE.
iqr: Logical. If the median is displayed for a continuous variable, should the interquartile range be displayed alongside it (TRUE), or should the values for the 25th and 75th percentiles be displayed (FALSE)? Default TRUE.
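The two display options differ only in how the spread is reported. A minimal base R sketch of that distinction follows; the exact cell layout the function produces for either setting is an assumption here.

```r
x <- mtcars$mpg

# Interquartile range as a single number (75th percentile minus 25th percentile).
iqr_value <- IQR(x)

# The 25th and 75th percentiles themselves.
quartiles <- unname(quantile(x, probs = c(0.25, 0.75)))

# Assumed cell layouts, for illustration only.
cell_iqr       <- sprintf("%.2f (%.2f)", median(x), iqr_value)
cell_quartiles <- sprintf("%.2f (%.2f, %.2f)", median(x), quartiles[1], quartiles[2])

print(cell_iqr)
print(cell_quartiles)
```

Reporting the two percentiles keeps information about skew that the single IQR value discards.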
fisher: Logical. Should Fisher's exact test be used for categorical variables? Default FALSE. Ignored if p == FALSE.
workspace: Numeric value indicating the workspace to be used for Fisher's exact test. If NULL (the default), a value of 2e5 is used. Ignored if fisher == FALSE.
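For context, workspace is also the name of an argument of base R's fisher.test(), whose own default is 200000. The sketch below calls fisher.test() directly on an example table; that the function forwards its workspace value in this way is an assumption, not something this page states.

```r
# A 3 x 3 contingency table from a built-in data set.
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Chi-squared test; small expected counts trigger an approximation warning,
# which is suppressed here because only the p-value is of interest.
chisq_p <- suppressWarnings(chisq.test(tab)$p.value)

# Fisher's exact test with an explicit workspace size.
fisher_p <- fisher.test(tab, workspace = 2e5)$p.value

print(c(chisq = chisq_p, fisher = fisher_p))
```

For larger tables, fisher.test() can stop with an error suggesting a bigger workspace, which is the usual reason to raise this value.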
var_order: Character vector listing the variable names in the order results should be displayed. If NULL, the default, continuous variables are displayed first, followed by categorical variables.
var_label_df: A data.frame or tibble with columns "variable" and "label" that contains display labels for each variable specified in normal_vars, non_normal_vars, and cat_vars.
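A label table and display order of the shape described above might be set up as follows. The label text and the choice of variables are illustrative, and the call is guarded so the snippet still runs when the package providing bivariate_compare() is not loaded.

```r
# Display labels: one row per variable, with the column names the argument expects.
var_labels <- data.frame(
  variable = c("mpg", "hp", "am"),
  label    = c("Miles per gallon", "Horsepower", "Transmission (0 = automatic, 1 = manual)"),
  stringsAsFactors = FALSE
)

# Show the categorical variable first, overriding the default ordering.
display_order <- c("am", "mpg", "hp")

# Only call the function if it is available in the current session.
if (exists("bivariate_compare")) {
  result <- bivariate_compare(
    mtcars,
    compare         = "cyl",
    non_normal_vars = c("mpg", "hp"),
    cat_vars        = "am",
    var_order       = display_order,
    var_label_df    = var_labels
  )
  print(result)
}
```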
Statistical differences between normally distributed continuous variables are assessed using aov(), differences in non-normally distributed variables are assessed using kruskal.test(), and differences in categorical variables are assessed using chisq.test() by default, with a user option for fisher.test() instead.
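The tests named above are all in base R's stats package, so a p-value for each variable type can be reproduced by hand. This is a sketch of the underlying calls on built-in data, not the function's internal code.

```r
# Normally distributed continuous variable: one-way ANOVA across groups.
anova_fit <- aov(Sepal.Length ~ Species, data = iris)
anova_p   <- summary(anova_fit)[[1]][["Pr(>F)"]][1]

# Non-normally distributed continuous variable: Kruskal-Wallis rank sum test.
kruskal_p <- kruskal.test(mpg ~ cyl, data = mtcars)$p.value

# Categorical variable: chi-squared test on the cross-tabulation.
# The approximation warning for small expected counts is suppressed.
tab     <- table(am = mtcars$am, cyl = mtcars$cyl)
chisq_p <- suppressWarnings(chisq.test(tab)$p.value)

print(c(anova = anova_p, kruskal = kruskal_p, chisq = chisq_p))
```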
bivariate_compare(iris, compare = "Species", normal_vars = c("Sepal.Length", "Sepal.Width"))
bivariate_compare(mtcars, compare = "cyl", non_normal_vars = "mpg")