compare_variables: Compare variables between groups

Description

The analyze function compare_vars() creates a layout element to summarize and compare one or more variables, using the S3 generic function s_summary() to calculate a list of summary statistics. A list of all available statistics for numeric variables can be viewed by running get_stats("analyze_vars_numeric", add_pval = TRUE) and for non-numeric variables by running get_stats("analyze_vars_counts", add_pval = TRUE). Use the .stats parameter to specify the statistics to include in your output summary table.

Prior to using this function in your table layout you must use rtables::split_cols_by() to create a column split on the variable to be used in comparisons, and specify a reference group via the ref_group parameter. Comparisons can be performed for each group (column) against the specified reference group by including the p-value statistic.

Usage

compare_vars(
  lyt,
  vars,
  var_labels = vars,
  na_str = default_na_str(),
  nested = TRUE,
  ...,
  na_rm = TRUE,
  show_labels = "default",
  table_names = vars,
  section_div = NA_character_,
  .stats = c("n", "mean_sd", "count_fraction", "pval"),
  .stat_names = NULL,
  .formats = NULL,
  .labels = NULL,
  .indent_mods = NULL
)
s_compare(x, ...)
# S3 method for numeric
s_compare(x, ...)
# S3 method for factor
s_compare(x, ...)
# S3 method for character
s_compare(x, ...)
# S3 method for logical
s_compare(x, ...)

Value

compare_vars() returns a layout object suitable for passing to further layouting functions, or to rtables::build_table(). Adding this function to an rtable layout will add formatted rows containing the statistics from s_compare() to the table layout.

s_compare() returns output of s_summary() and comparisons versus the reference group in the form of p-values.

Arguments

lyt

(PreDataTableLayouts)
layout that analyses will be added to.

vars

(character)
variable names for the primary analysis variable to be iterated over.

var_labels

(character)
variable labels.

na_str

(string)
string used to replace all NA or empty values in the output.

nested

(flag)
whether this layout instruction should be applied within the existing layout structure _if possible (TRUE, the default) or as a new top-level element (FALSE). Ignored if it would nest a split. underneath analyses, which is not allowed.

...

additional arguments passed to s_compare(), including:

denom: (string) choice of denominator. Options are c("n", "N_col", "N_row"). For factor variables, can only be "n" (number of values in this row and column intersection).
.N_row: (numeric(1)) Row-wise N (row group count) for the group of observations being analyzed (i.e. with no column-based subsetting).
.N_col: (numeric(1)) Column-wise N (column count) for the full column being tabulated within.
verbose: (flag) Whether additional warnings and messages should be printed. Mainly used to print out information about factor casting. Defaults to TRUE. Used for character/factor variables only.

na_rm

(flag)
whether NA values should be removed from x prior to analysis.

show_labels

(string)
label visibility: one of "default", "visible" and "hidden".

table_names

(character)
this can be customized in the case that the same vars are analyzed multiple times, to avoid warnings from rtables.

section_div

(string)
string which should be repeated as a section divider after each group defined by this split instruction, or NA_character_ (the default) for no section divider.

.stats

(character)
statistics to select for the table.

Options for numeric variables are: 'n', 'sum', 'mean', 'sd', 'se', 'mean_sd', 'mean_se', 'mean_ci', 'mean_sei', 'mean_sdi', 'mean_pval', 'median', 'mad', 'median_ci', 'quantiles', 'iqr', 'range', 'min', 'max', 'median_range', 'cv', 'geom_mean', 'geom_sd', 'geom_mean_sd', 'geom_mean_ci', 'geom_cv', 'median_ci_3d', 'mean_ci_3d', 'geom_mean_ci_3d', 'pval'

Options for non-numeric variables are: 'n', 'count', 'count_fraction', 'count_fraction_fixed_dp', 'fraction', 'n_blq', 'pval_counts'

.stat_names

(character)
names of the statistics that are passed directly to name single statistics (.stats). This option is visible when producing rtables::as_result_df() with make_ard = TRUE.

.formats

(named character or list)
formats for the statistics. See Details in analyze_vars for more information on the "auto" setting.

.labels

(named character)
labels for the statistics (without indent).

.indent_mods

(named integer)
indent modifiers for the labels. Each element of the vector should be a name-value pair with name corresponding to a statistic specified in .stats and value the indentation for that statistic's row label.

x

(numeric)
vector of numbers we want to analyze.

Functions

compare_vars(): Layout-creating function which can take statistics function arguments and additional format arguments. This function is a wrapper for rtables::analyze().
s_compare(): S3 generic function to produce a comparison summary.
s_compare(numeric): Method for numeric class. This uses the standard t-test to calculate the p-value.
s_compare(factor): Method for factor class. This uses the chi-squared test to calculate the p-value.
s_compare(character): Method for character class. This makes an automatic conversion to factor (with a warning) and then forwards to the method for factors.
s_compare(logical): Method for logical class. A chi-squared test is used. If missing values are not removed, then they are counted as FALSE.

Examples

Run this code

# `compare_vars()` in `rtables` pipelines

## Default output within a `rtables` pipeline.
lyt <- basic_table() %>%
  split_cols_by("ARMCD", ref_group = "ARM B") %>%
  compare_vars(c("AGE", "SEX"))
build_table(lyt, tern_ex_adsl)

## Select and format statistics output.
lyt <- basic_table() %>%
  split_cols_by("ARMCD", ref_group = "ARM C") %>%
  compare_vars(
    vars = "AGE",
    .stats = c("mean_sd", "pval"),
    .formats = c(mean_sd = "xx.x, xx.x"),
    .labels = c(mean_sd = "Mean, SD")
  )
build_table(lyt, df = tern_ex_adsl)

# `s_compare.numeric`

## Usual case where both this and the reference group vector have more than 1 value.
s_compare(rnorm(10, 5, 1), .ref_group = rnorm(5, -5, 1), .in_ref_col = FALSE)

## If one group has not more than 1 value, then p-value is not calculated.
s_compare(rnorm(10, 5, 1), .ref_group = 1, .in_ref_col = FALSE)

## Empty numeric does not fail, it returns NA-filled items and no p-value.
s_compare(numeric(), .ref_group = numeric(), .in_ref_col = FALSE)

# `s_compare.factor`

## Basic usage:
x <- factor(c("a", "a", "b", "c", "a"))
y <- factor(c("a", "b", "c"))
s_compare(x = x, .ref_group = y, .in_ref_col = FALSE)

## Management of NA values.
x <- explicit_na(factor(c("a", "a", "b", "c", "a", NA, NA)))
y <- explicit_na(factor(c("a", "b", "c", NA)))
s_compare(x = x, .ref_group = y, .in_ref_col = FALSE, na_rm = TRUE)
s_compare(x = x, .ref_group = y, .in_ref_col = FALSE, na_rm = FALSE)

# `s_compare.character`

## Basic usage:
x <- c("a", "a", "b", "c", "a")
y <- c("a", "b", "c")
s_compare(x, .ref_group = y, .in_ref_col = FALSE, .var = "x", verbose = FALSE)

## Note that missing values handling can make a large difference:
x <- c("a", "a", "b", "c", "a", NA)
y <- c("a", "b", "c", rep(NA, 20))
s_compare(x,
  .ref_group = y, .in_ref_col = FALSE,
  .var = "x", verbose = FALSE
)
s_compare(x,
  .ref_group = y, .in_ref_col = FALSE, .var = "x",
  na.rm = FALSE, verbose = FALSE
)

# `s_compare.logical`

## Basic usage:
x <- c(TRUE, FALSE, TRUE, TRUE)
y <- c(FALSE, FALSE, TRUE)
s_compare(x, .ref_group = y, .in_ref_col = FALSE)

## Management of NA values.
x <- c(NA, TRUE, FALSE)
y <- c(NA, NA, NA, NA, FALSE)
s_compare(x, .ref_group = y, .in_ref_col = FALSE, na_rm = TRUE)
s_compare(x, .ref_group = y, .in_ref_col = FALSE, na_rm = FALSE)