- data
A data.frame.
- select
Columns to include. If regex = FALSE, use tidyselect
syntax or a character vector of column names (default:
tidyselect::everything()). If regex = TRUE, provide a regular
expression pattern (character string).
- by
Optional grouping column. Accepts an unquoted column name or
a single character column name. The column does not need to be
numeric.
- exclude
Columns to exclude. Supports tidyselect syntax and
character vectors of column names.
- regex
Logical. If FALSE (the default), select is interpreted with
tidyselect syntax (or as a character vector of column names).
If TRUE, select is treated as a regular expression.
- test
Character. Statistical test to use when comparing groups.
One of "welch" (default), "student", or "nonparametric".
"welch": Welch t-test (2 groups) or Welch one-way ANOVA
(3+ groups). Does not assume equal variances.
"student": Student t-test (2 groups) or classic one-way
ANOVA (3+ groups). Assumes equal variances.
"nonparametric": Wilcoxon rank-sum / Mann-Whitney U
(2 groups) or Kruskal-Wallis H (3+ groups).
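Applies only when by is supplied and at least one of the three
test-based displays is active: the p-value (shown by default when
by is supplied; see p_value), statistic = TRUE, or a non-"none"
effect_size. Ignored when by is not used, or when all three are
turned off (p_value = FALSE, statistic = FALSE,
effect_size = "none").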
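The mapping from test and group count to a concrete procedure can be sketched in base R. This is only an illustration under stated assumptions: run_group_test() is a hypothetical helper, not a function of this package, and the package's internals may differ (for example in missing-value handling or in how ties are treated).

```r
# Hypothetical helper (not part of the documented package): picks a base-R
# test from the `test` option and the number of groups.
run_group_test <- function(y, g, test = c("welch", "student", "nonparametric")) {
  test <- match.arg(test)
  g <- droplevels(factor(g))
  k <- nlevels(g)
  if (k < 2) {
    stop("`g` must define at least two groups.")
  }
  if (test == "nonparametric") {
    if (k == 2) {
      # exact = FALSE avoids the ties warning and uses the normal approximation
      stats::wilcox.test(y ~ g, exact = FALSE)
    } else {
      stats::kruskal.test(y ~ g)
    }
  } else {
    equal_var <- test == "student"  # Student assumes equal variances, Welch does not
    if (k == 2) {
      stats::t.test(y ~ g, var.equal = equal_var)
    } else {
      stats::oneway.test(y ~ g, var.equal = equal_var)
    }
  }
}

two_groups   <- run_group_test(mtcars$mpg, mtcars$am,  "welch")
three_groups <- run_group_test(mtcars$mpg, mtcars$cyl, "student")
```

Each call returns a standard htest object, so the statistic, degrees of freedom and p-value are available as two_groups$statistic, two_groups$parameter and two_groups$p.value.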
- p_value
Logical or NULL. If TRUE and by is used, adds a
p-value column from the test specified by test. When NULL (the
default), the p-value is shown automatically whenever by is
supplied, and hidden otherwise. Pass p_value = FALSE to suppress
the column explicitly. Ignored when by is not used.
- statistic
Logical. If TRUE and by is used, the test
statistic is shown in an additional column (e.g.,
t(df) = ..., F(df1, df2) = ..., W = ..., or H(df) = ...).
p_value and statistic are independent; either or both
can be enabled. Defaults to FALSE. Ignored when by is not
used.
- show_n
Logical. If TRUE, includes an unweighted n
column in the printed ASCII table and in every rendered output
(tinytable, gt, flextable, word, excel, clipboard).
Set to FALSE to drop the n column structurally from those
outputs (no empty placeholder, no spanner). The n column is
always present in the raw output = "data.frame" /
"long" for downstream programmatic access. Defaults to TRUE.
- effect_size
Effect-size measure to include in the rendered
outputs. One of:
"none" (default): no effect-size column.
"auto": auto-select the canonical measure for the chosen
test and group count: Hedges' g (parametric, 2 groups),
eta-squared (parametric, 3+ groups), rank-biserial r
(nonparametric, 2 groups), epsilon-squared (nonparametric, 3+
groups). This is the historical behaviour of effect_size = TRUE.
"hedges_g": Hedges' g (bias-corrected standardised mean
difference, 2 groups, parametric). CI via the Hedges & Olkin
normal approximation.
"eta_sq": Eta-squared (\(\eta^2\), parametric ANOVA-style
SS_between / SS_total). CI via inversion of the noncentral
F distribution.
"r_rb": Rank-biserial r from the Wilcoxon / Mann-Whitney
statistic (2 groups, nonparametric). CI via Fisher
z-transform.
"epsilon_sq": Epsilon-squared (\(\varepsilon^2\)) from the
Kruskal-Wallis statistic (3+ groups, nonparametric). CI via
percentile bootstrap (2,000 replicates).
For backward compatibility, effect_size = TRUE is silently
coerced to "auto" and effect_size = FALSE to "none".
Explicit choices are validated against the active test and the
number of groups; an incompatible request (e.g. "eta_sq" with
two groups, or "hedges_g" with test = "nonparametric")
triggers an actionable error. Ignored when by is not used.
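For orientation, the two parametric measures can be computed by hand in base R. The sketch below is an assumption-laden illustration, not the package's code: hedges_g() and eta_sq() are hypothetical helpers, Hedges' g uses the common approximate correction J = 1 - 3 / (4 * df - 1) with a pooled standard deviation (the package may use a different correction or SD), and missing values are not handled. Eta-squared follows the SS_between / SS_total definition given above.

```r
# Hypothetical helpers (not part of the documented package).

# Hedges' g: pooled-SD standardised mean difference times the approximate
# small-sample correction J = 1 - 3 / (4 * df - 1).
hedges_g <- function(y, g) {
  g <- droplevels(factor(g))
  stopifnot(nlevels(g) == 2)
  parts <- split(y, g)
  n1 <- length(parts[[1]])
  n2 <- length(parts[[2]])
  df <- n1 + n2 - 2
  pooled_sd <- sqrt(((n1 - 1) * var(parts[[1]]) + (n2 - 1) * var(parts[[2]])) / df)
  d <- (mean(parts[[1]]) - mean(parts[[2]])) / pooled_sd
  d * (1 - 3 / (4 * df - 1))
}

# Eta-squared: between-group sum of squares over total sum of squares.
eta_sq <- function(y, g) {
  g <- droplevels(factor(g))
  group_means <- ave(y, g)  # each observation's own group mean
  ss_between <- sum((group_means - mean(y))^2)
  ss_total <- sum((y - mean(y))^2)
  ss_between / ss_total
}

g_am   <- hedges_g(mtcars$mpg, mtcars$am)   # two groups
eta_cy <- eta_sq(mtcars$mpg, mtcars$cyl)    # three groups
```

For a one-way layout, eta-squared computed this way equals the R-squared of lm(y ~ factor(g)), which gives a quick cross-check.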
- effect_size_ci
Logical. If TRUE, appends the confidence
interval of the effect size in brackets (e.g.,
g = 0.45 [0.22, 0.68]). Implies a non-"none" effect size; if
effect_size = "none" is left unchanged, this argument is
ignored with a warning, and the function falls back to
effect_size = "auto". Defaults to FALSE.
- ci
Logical. If TRUE, includes the mean confidence
interval columns (<level>% CI LL / <level>% CI UL) and their
spanner in the printed ASCII table and in every rendered output
(tinytable, gt, flextable, word, excel, clipboard).
Set to FALSE to drop both columns and the CI spanner
structurally from those outputs (no empty placeholders, no
border lines under an empty header). The CI bounds are always
present as ci_lower / ci_upper in the raw
output = "data.frame" / "long" for downstream programmatic
access. Defaults to TRUE. The CI level is taken from ci_level.
- labels
An optional named character vector of variable labels.
Names must match column names in data. When NULL (the default),
labels are auto-detected from variable attributes (e.g., haven
labels); if none are found, the column name is used.
- ci_level
Confidence level for the mean confidence interval
(default: 0.95). Must be strictly between 0 and 1.
- digits
Number of decimal places for descriptive values and test
statistics (default: 2).
- effect_size_digits
Number of decimal places for effect-size values
in formatted displays (default: 2).
- p_digits
Integer >= 1. Number of decimal places used to
render p-values in the p column (default: 3, the APA
Publication Manual standard). Both the displayed precision and
the small-p threshold derive from this argument: p_digits = 3
prints .045 and <.001; p_digits = 4 prints .0451 and
<.0001; p_digits = 2 prints .05 and <.01. Useful for
genomics / GWAS contexts with very small p-values, or for
journals using a coarser convention. Leading zeros are always
stripped, following APA convention.
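The rule described above (precision and small-p threshold both derived from p_digits, leading zero stripped) can be sketched as follows. format_p() is a hypothetical helper written for illustration, not the package's formatter; it ignores decimal_mark and does not handle missing values.

```r
# Hypothetical helper (not part of the documented package): APA-style
# p-value formatting driven by a single `p_digits` argument.
format_p <- function(p, p_digits = 3) {
  threshold <- 10^(-p_digits)
  strip_zero <- function(x) sub("^0", "", x)  # ".045" rather than "0.045"
  ifelse(
    p < threshold,
    paste0("<", strip_zero(formatC(threshold, format = "f", digits = p_digits))),
    strip_zero(formatC(p, format = "f", digits = p_digits))
  )
}

format_p(c(0.0451, 0.00002), p_digits = 3)
format_p(c(0.0451, 0.00002), p_digits = 4)
format_p(c(0.0451, 0.00002), p_digits = 2)
```

With p = 0.0451 and p = 0.00002, the three calls give .045 and <.001, .0451 and <.0001, and .05 and <.01 respectively, matching the examples in the argument description.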
- decimal_mark
Character used as decimal separator.
Either "." (default) or ",".
- align
Horizontal alignment of numeric columns in the printed
ASCII table and in the tinytable, gt, flextable, word,
and clipboard outputs. The first column (Variable) and
Group (when present) are always left-aligned. One of:
"decimal" (default): align numeric columns on the decimal
mark, the standard scientific-publication convention used by
SPSS, SAS, LaTeX siunitx, gt::cols_align_decimal() and
tinytable::style_tt(align = "d"). For engines without a
native decimal-alignment primitive (flextable, word,
clipboard, ASCII print), values are pre-padded with leading
and trailing spaces so the dots line up vertically; the body
of the flextable/word output additionally uses a monospace
font to make character widths uniform.
"center": center-align all numeric columns.
"right": right-align all numeric columns.
"auto": legacy per-column rule (center for the descriptive
columns, right for n and p).
The excel output uses the engine's default alignment in any
case: cell-string padding does not align decimals under
proportional fonts. Same default and semantics as
table_continuous_lm().
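The pre-padding used for engines without native decimal alignment can be illustrated with a small base-R sketch. pad_decimal() is a hypothetical helper, not the package's implementation; it assumes already-formatted character values and only lines up correctly under a monospace font.

```r
# Hypothetical helper (not part of the documented package): pad formatted
# numbers with spaces so that the decimal marks line up vertically.
pad_decimal <- function(x, mark = ".") {
  pos <- as.integer(regexpr(mark, x, fixed = TRUE))
  no_mark <- pos == -1L
  pos[no_mark] <- nchar(x[no_mark]) + 1L  # integers: mark sits after the last digit
  left <- pos - 1L                        # characters before the mark
  right <- nchar(x) - left                # mark plus the characters after it
  paste0(
    strrep(" ", max(left) - left),
    x,
    strrep(" ", max(right) - right)
  )
}

padded <- pad_decimal(c("1.5", "12.25", "100"))
writeLines(paste0("[", padded, "]"))
```

All padded strings have the same width, so any further left, right or center alignment applied by the output engine leaves the decimal marks in one column.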
- output
Output format. One of:
"default": a printed ASCII table, returned invisibly.
"data.frame" / "long": a plain data.frame with one row
per (variable x group) (or one row per variable when by
is not used). The two names are synonyms; pick whichever reads
better in your pipeline ("long" matches
table_continuous_lm()'s naming).
"tinytable" (requires tinytable)
"gt" (requires gt)
"flextable" (requires flextable)
"excel" (requires openxlsx2)
"clipboard" (requires clipr)
"word" (requires flextable and officer)
- excel_path
File path for output = "excel".
- excel_sheet
Sheet name for output = "excel"
(default: "Descriptives").
- clipboard_delim
Delimiter for output = "clipboard"
(default: "\t").
- word_path
File path for output = "word".
- verbose
Logical. If TRUE, prints messages about excluded
non-numeric columns (default: FALSE).