table_categorical: Categorical summary table

Description

Builds a publication-ready frequency or cross-tabulation table for one or many categorical variables selected with tidyselect syntax.

With by, produces grouped cross-tabulation summaries (using cross_tab() internally) with chi-squared p-values and optional association measures. Without by, produces one-way frequency-style summaries.

Multiple output formats are available via output: a printed ASCII table ("default"), a wide or long numeric data.frame ("data.frame", "long"), or publication-ready tables ("tinytable", "gt", "flextable", "excel", "clipboard", "word").

Usage

table_categorical(
  data,
  select,
  by = NULL,
  labels = NULL,
  levels_keep = NULL,
  include_total = TRUE,
  drop_na = TRUE,
  weights = NULL,
  rescale = FALSE,
  correct = FALSE,
  simulate_p = FALSE,
  simulate_B = 2000,
  percent_digits = 1,
  p_digits = 3,
  v_digits = 2,
  assoc_measure = "auto",
  assoc_ci = FALSE,
  decimal_mark = ".",
  align = c("decimal", "auto", "center", "right"),
  output = c("default", "data.frame", "long", "tinytable", "gt", "flextable", "excel",
    "clipboard", "word"),
  indent_text = "  ",
  indent_text_excel_clipboard = strrep(" ", 6),
  add_multilevel_header = TRUE,
  blank_na_wide = FALSE,
  excel_path = NULL,
  excel_sheet = "Categorical",
  clipboard_delim = "\t",
  word_path = NULL
)

Value

Depends on output:

"default": prints a styled ASCII table and returns the underlying data.frame invisibly (S3 class "spicy_categorical_table").
"data.frame": a wide data.frame with one row per variable-level combination. When by is used, the columns are Variable, Level, and one pair of n / % columns per group level (plus Total when include_total = TRUE), followed by Chi2, df, p, and the association measure column. When by = NULL, the columns are Variable, Level, n, %.
"long": a long data.frame with columns variable, level, group, n, percent (and chi2, df, p, association measure columns when by is used).
"tinytable": a tinytable object.
"gt": a gt_tbl object.
"flextable": a flextable object.
"excel" / "word": writes to disk and returns the file path invisibly.
"clipboard": copies the table and returns the display data.frame invisibly.

Arguments

data

A data frame.

select

Columns to include as row variables. Supports tidyselect syntax and character vectors of column names.

by

Optional grouping variable whose levels form the column groups of the table. Accepts an unquoted column name or a single character column name.

labels

Optional display labels for the variables. Two forms are accepted (matching table_continuous() and table_continuous_lm()):

A named character vector whose names match column names in data (e.g. c(bmi = "Body mass index")); only listed columns are relabelled, others fall back to attribute-based labels or the column name. Recommended form.
A positional character vector of the same length as select, in the same order. Backward-compatible with the spicy < 0.11.0 API.

When NULL (the default), column names are used as-is. If a variable label attribute is present (e.g. from haven), it is not picked up here; pass labels = c(...) explicitly. (The continuous companions auto-detect attribute labels; the categorical function is conservative because the indented row labels expect predictable text.)

levels_keep

Optional character vector of levels to keep/order for row modalities. If NULL, all observed levels are kept.
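
The keep-and-reorder behaviour described above can be pictured with base R factors. This is a minimal sketch on made-up data, not the package's internal code; in particular, how table_categorical() treats rows whose level is not listed (dropped here) is an assumption.

```r
# Illustrative data (not from sochealth).
x <- c("low", "high", "medium", "low", "high")

# Hypothetical levels_keep value: keep two levels, in this display order.
keep <- c("high", "low")

# Keep only the listed levels and impose the requested order.
x_kept <- factor(x[x %in% keep], levels = keep)

# Counts now follow the order given in `keep`.
table(x_kept)
```

The order of `keep`, not the alphabetical or observed order, drives the row order in this sketch.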

include_total

Logical. If TRUE (the default), includes a Total group when available.

drop_na

Logical. If TRUE (the default), removes rows with NA in the row/group variable before each cross-tabulation. If FALSE, missing values are displayed as a dedicated "(Missing)" level.
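
To illustrate the two settings, the sketch below reproduces the idea with base R on a toy vector: dropping NA before counting versus recoding NA as an explicit "(Missing)" level. It is an analogue under stated assumptions, not the function's own implementation.

```r
# Toy vector with one missing value (illustrative data).
x <- factor(c("yes", "no", NA, "yes", "no", "yes"))

# Analogue of drop_na = TRUE: NA is excluded from the counts.
counts_dropped <- table(x, useNA = "no")

# Analogue of drop_na = FALSE: turn NA into a regular level,
# then rename that level to "(Missing)".
x_missing <- addNA(x)
levels(x_missing)[is.na(levels(x_missing))] <- "(Missing)"
counts_kept <- table(x_missing)

counts_dropped
counts_kept
```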

weights

Optional weights. Either NULL (the default), a numeric vector of length nrow(data), or a single column in data supplied as an unquoted name or a character string.

rescale

Logical. If FALSE (the default), weights are used as-is. If TRUE, rescales weights so total weighted N matches raw N. Passed to spicy::cross_tab().
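
A common way to make the total weighted N match the raw N is to multiply each weight by n / sum(w). The sketch below shows that arithmetic on hypothetical weights; whether cross_tab() uses exactly this formula (for example, how it treats missing weights) is an assumption.

```r
# Hypothetical survey weights for five respondents.
w <- c(0.5, 2.0, 1.5, 3.0, 1.0)

# Rescale so the weights sum to the number of observations.
w_rescaled <- w * length(w) / sum(w)

sum(w)           # total of the raw weights
sum(w_rescaled)  # equals length(w) after rescaling
```

Rescaling leaves weighted percentages unchanged, because every weight is multiplied by the same constant; it only changes the weighted counts and, through them, the chi-squared statistic.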

correct

Logical. If FALSE (the default), no continuity correction is applied. If TRUE, applies Yates correction in 2x2 chi-squared contexts. Passed to spicy::cross_tab().
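
The effect of the Yates correction can be seen with stats::chisq.test() on a 2x2 table. The counts below are invented for illustration, and the sketch uses base R rather than cross_tab().

```r
# Invented 2x2 counts (rows: smoking, columns: sex).
tab <- matrix(
  c(12, 5, 7, 16),
  nrow = 2,
  dimnames = list(smoking = c("No", "Yes"), sex = c("F", "M"))
)

# Without continuity correction (analogue of correct = FALSE).
uncorrected <- chisq.test(tab, correct = FALSE)

# With Yates continuity correction (analogue of correct = TRUE).
yates <- chisq.test(tab, correct = TRUE)

c(uncorrected = unname(uncorrected$statistic), yates = unname(yates$statistic))
c(uncorrected = uncorrected$p.value, yates = yates$p.value)
```

The correction shrinks the statistic, so the corrected p-value is the more conservative of the two.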

simulate_p

Logical. If FALSE (the default), uses asymptotic p-values. If TRUE, uses Monte Carlo simulation. Passed to spicy::cross_tab().

simulate_B

Integer. Number of Monte Carlo replicates when simulate_p = TRUE. Defaults to 2000.
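
For orientation, base R exposes the same idea through the simulate.p.value and B arguments of stats::chisq.test(). The sketch below uses an invented sparse table; it illustrates the technique and does not claim to mirror how cross_tab() forwards these arguments.

```r
# Invented sparse 2x3 table where the asymptotic approximation is doubtful.
tab <- matrix(c(3, 1, 2, 6, 1, 5), nrow = 2)

# Fix the seed so the simulated p-value is reproducible.
set.seed(1)

# Monte Carlo p-value based on 2000 simulated tables.
mc <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)

mc$p.value
mc$parameter  # degrees of freedom are NA for a simulated test
```

Because the p-value is simulated, it changes slightly between runs unless a seed is set.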

percent_digits

Number of digits for percentages in report outputs. Defaults to 1.

p_digits

Number of decimal places for p-values; values below .001 are shown as < .001. Defaults to 3.

v_digits

Number of digits for the association measure. Defaults to 2.

assoc_measure

Which association measure to report alongside the chi-squared p-value. Accepts four input shapes:

"none": drop the column entirely.
"auto" (the default): pick a measure per row variable based on the variable type: a 2x2 table (binary row variable vs. binary by) uses phi, a pair of ordered factors uses tau_b, every other case uses cramer_v.
a single string from c("cramer_v", "phi", "gamma", "tau_b", "tau_c", "somers_d", "lambda") — applied uniformly to every row variable.
a character vector with one entry per row variable. Both named (c(smoking = "phi", health = "tau_b"), recommended; unnamed variables fall back to "auto") and unnamed positional (c("phi", "tau_b", "auto"), paired up with select) are accepted. Named is more robust to reordering of select.

When a single measure is used for every row, the column header is that measure's name (e.g. "Cramer's V"). When multiple measures are used (typically with "auto" on a heterogeneous select), the header collapses to "Effect size" and an APA-style Note. line is appended documenting which measure was used for which variable.

phi requires a 2x2 table; if explicitly requested for a non-2x2 variable, an error is raised so the user can choose another measure or fall back to "auto".
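
The link between phi and Cramer's V, and the reason phi is restricted to 2x2 tables, can be checked with the textbook formulas. The helper cramers_v() below is hypothetical and written only for this illustration; the package's own computation may differ in details such as bias correction.

```r
# Hypothetical helper: textbook Cramer's V from an uncorrected chi-squared.
cramers_v <- function(tab) {
  chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab))
  sqrt(chi2 / (n * (k - 1)))
}

# Invented 2x2 table: phi from the cross-product formula.
tab_2x2 <- matrix(c(12, 5, 7, 16), nrow = 2)
phi <- (tab_2x2[1, 1] * tab_2x2[2, 2] - tab_2x2[1, 2] * tab_2x2[2, 1]) /
  sqrt(prod(rowSums(tab_2x2), colSums(tab_2x2)))

# For a 2x2 table, Cramer's V equals the absolute value of phi.
all.equal(cramers_v(tab_2x2), abs(phi))

# Invented 3x2 table: phi is undefined here, Cramer's V still applies.
tab_3x2 <- matrix(c(10, 20, 30, 25, 20, 15), nrow = 3)
v3 <- cramers_v(tab_3x2)
v3
```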

assoc_ci

Passed to cross_tab(). If TRUE, includes the confidence interval of the association measure. In wide raw outputs ("data.frame", "excel", "clipboard"), two extra columns CI lower / CI upper are added; in the long raw output ("long") the bounds appear as ci_lower / ci_upper. In rendered formats ("gt", "tinytable", "flextable", "word"), the CI is shown inline (e.g., .14 [.08, .19]). Defaults to FALSE.

decimal_mark

Decimal separator ("." or ","). Defaults to ".".

align

Horizontal alignment of numeric columns in the printed ASCII table and in the tinytable, gt, flextable, word, and clipboard outputs. The first column (Variable) is always left-aligned. One of:

"decimal" (default): align numeric columns on the decimal mark, the standard scientific-publication convention used by SPSS, SAS, LaTeX siunitx, and the native primitives of gt::cols_align_decimal() and tinytable::style_tt(align = "d"). For engines without a native primitive (flextable, word, clipboard, ASCII print), numeric cells are pre-padded with leading and trailing spaces so the dots line up vertically; the body of the flextable/word output additionally uses a monospace font (Consolas) to make character widths uniform.
"center": center-align all numeric columns.
"right": right-align all numeric columns.
"auto": legacy uniform right-alignment used in spicy < 0.11.0.

The excel output uses the engine's default alignment in any case: cell-string padding does not align decimals under proportional fonts, and Excel's native right-alignment combined with the per-column numfmt already produces dot-aligned columns. Same default and semantics as table_continuous() / table_continuous_lm().
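
The space-padding approach described for engines without a native decimal-alignment primitive can be sketched in a few lines of base R. The function align_decimal() is a hypothetical illustration, not the package's internal helper, and it assumes non-empty, already formatted numeric strings.

```r
# Hypothetical helper: pad formatted numbers so decimal marks line up.
align_decimal <- function(x, mark = ".") {
  parts <- strsplit(x, mark, fixed = TRUE)
  int_part <- vapply(parts, function(p) p[1], character(1))
  frac_part <- vapply(
    parts,
    function(p) if (length(p) > 1) p[2] else "",
    character(1)
  )
  # Right side keeps the mark only when a fractional part exists.
  right <- ifelse(nzchar(frac_part), paste0(mark, frac_part), "")
  # Right-justify the integer part, left-justify the fractional part.
  left <- formatC(int_part, width = max(nchar(int_part)))
  right <- formatC(right, width = max(nchar(right)), flag = "-")
  paste0(left, right)
}

out <- align_decimal(c("5.2", "12.75", "100"))
writeLines(out)
```

Every padded string has the same width, which is why this approach relies on a monospace font to line up visually.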

output

Output format. One of:

"default" (a printed ASCII table, returned invisibly)
"data.frame" (a wide numeric data.frame)
"long" (a long numeric data.frame)
"tinytable" (requires tinytable)
"gt" (requires gt)
"flextable" (requires flextable)
"excel" (requires openxlsx2)
"clipboard" (requires clipr)
"word" (requires flextable and officer)

indent_text

Prefix used for modality labels in report table building. Defaults to "  " (two spaces).

indent_text_excel_clipboard

Stronger indentation used in Excel and clipboard exports. Defaults to six non-breaking spaces.

add_multilevel_header

Logical. If TRUE (the default), merges top headers in Excel export.

blank_na_wide

Logical. If FALSE (the default), NA values are kept as-is in wide raw output. If TRUE, replaces them with empty strings.

excel_path

Path for output = "excel". Defaults to NULL.

excel_sheet

Sheet name for Excel export. Defaults to "Categorical".

clipboard_delim

Delimiter for clipboard text export. Defaults to "\t".

word_path

Path for output = "word" or optional save path when output = "flextable". Defaults to NULL.

Tests

When by is used, each selected variable is cross-tabulated against the grouping variable with cross_tab(). The omnibus chi-squared test (with optional Yates continuity correction or Monte Carlo p-value, see correct / simulate_p) is computed and reported in the p column. The chosen association measure is reported alongside, with an optional CI via assoc_ci; with assoc_measure = "auto", phi is used for 2x2 tables, Kendall's Tau-b when both variables are ordered factors, and Cramer's V in every other case. Without by, the table reports the marginal frequency distribution of each variable with no inferential statistics.
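
The "auto" selection rule can be written down as a small decision function. pick_measure() below is a hypothetical sketch of the documented rule; the precedence between the 2x2 check and the ordered-factor check, and the use of observed levels, are assumptions rather than confirmed behaviour.

```r
# Hypothetical sketch of the documented "auto" rule.
pick_measure <- function(x, by) {
  x <- droplevels(as.factor(x))
  by <- droplevels(as.factor(by))
  # 2x2 table: binary row variable against binary grouping variable.
  if (nlevels(x) == 2 && nlevels(by) == 2) {
    return("phi")
  }
  # Both variables are ordered factors.
  if (is.ordered(x) && is.ordered(by)) {
    return("tau_b")
  }
  # Every other case.
  "cramer_v"
}

binary  <- c("a", "b", "a", "b")
group2  <- c("x", "y", "y", "x")
ordinal <- factor(c("lo", "mid", "hi", "mid"),
                  levels = c("lo", "mid", "hi"), ordered = TRUE)
nominal <- c("red", "green", "blue", "red")

pick_measure(binary, group2)
pick_measure(ordinal, ordinal)
pick_measure(nominal, group2)
```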

For model-based comparisons (cluster-robust SE, weighted contrasts, fitted means) on continuous outcomes, see table_continuous_lm(). For descriptive (empirical) comparisons on continuous outcomes, see table_continuous().

Display conventions

By default (align = "decimal") numeric columns are aligned on the decimal mark, the standard scientific-publication convention used by SPSS, SAS, LaTeX siunitx, and the native primitives of gt::cols_align_decimal() / tinytable::style_tt(align = "d"). For the printed ASCII table the alignment is achieved by padding numeric cells with leading and trailing spaces so dots line up vertically. Pass align = "auto" to revert to the legacy uniform right-alignment used in spicy < 0.11.0.

p-values are formatted with p_digits decimal places (default 3, the APA standard). Leading zeros on p are always stripped (.045, not 0.045).
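
As an illustration of this convention, the hypothetical helper format_p() below rounds to a given number of decimals, strips the leading zero, and reports very small values as "< .001". It is a sketch of the stated formatting rule, not the package's formatter.

```r
# Hypothetical helper mimicking the documented p-value display.
format_p <- function(p, digits = 3) {
  # Fixed-point formatting, then drop the leading zero.
  out <- sub("^0", "", formatC(p, digits = digits, format = "f"))
  # Values below .001 are reported as an inequality.
  out[p < 0.001] <- "< .001"
  out
}

format_p(c(0.0451, 0.2, 0.00004))
# ".045" ".200" "< .001"
```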

Optional output engines require suggested packages:

tinytable for output = "tinytable"
gt for output = "gt"
flextable for output = "flextable"
flextable + officer for output = "word"
openxlsx2 for output = "excel"
clipr for output = "clipboard"

Examples

# --- Basic usage ---------------------------------------------------------

# Default: ASCII console table grouped by sex.
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex
)

# One-way frequency-style table (no `by`).
table_categorical(
  sochealth,
  select = c(smoking, physical_activity)
)

# Pretty labels keyed by column name.
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  labels = c(
    smoking           = "Current smoker",
    physical_activity = "Physical activity"
  )
)

# Survey weights with rescaling.
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = education,
  weights = "weight",
  rescale = TRUE
)

# Confidence interval for the association measure.
table_categorical(
  sochealth,
  select = smoking,
  by = education,
  assoc_ci = TRUE
)

# --- Per-variable association measure ----------------------------------

# Default (`assoc_measure = "auto"`): one measure per row variable based on
# the variable type (2x2 -> Phi, both ordered factors -> Kendall's Tau-b,
# otherwise Cramer's V). When the chosen measures differ across rows, the
# column header collapses to `"Effect size"` and an APA-style `Note.` line
# documents which measure was used for which variable.
table_categorical(
  sochealth,
  select = c(smoking, education),
  by = sex
)

# Force a uniform measure across all row variables.
table_categorical(
  sochealth,
  select = c(smoking, education),
  by = sex,
  assoc_measure = "cramer_v"
)

# Per-variable override (recommended named form).
table_categorical(
  sochealth,
  select = c(smoking, education, self_rated_health),
  by = sex,
  assoc_measure = c(
    smoking           = "phi",        # binary x binary
    education         = "cramer_v",   # multi-category nominal
    self_rated_health = "tau_b"       # ordinal x binary, Tau-b
  )
)

# --- Output formats -----------------------------------------------------

# The rendered outputs below all wrap the same call:
#   table_categorical(sochealth,
#                     select = c(smoking, physical_activity),
#                     by = sex)
# only `output` changes. Assign the result to a variable to avoid the
# plain-text fallback that some engines use when the object is
# printed directly in `?` help.

# Wide data.frame (one row per modality).
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex,
  output = "data.frame"
)

# Long data.frame (one row per (modality x group)).
table_categorical(
  sochealth,
  select = c(smoking, physical_activity),
  by = sex,
  output = "long"
)

# \donttest{
# Rendered HTML / docx objects are best viewed inside a
# Quarto / R Markdown document or a pkgdown article.
if (requireNamespace("tinytable", quietly = TRUE)) {
  tt <- table_categorical(
    sochealth, select = c(smoking, physical_activity), by = sex,
    output = "tinytable"
  )
}
if (requireNamespace("gt", quietly = TRUE)) {
  tbl <- table_categorical(
    sochealth, select = c(smoking, physical_activity), by = sex,
    output = "gt"
  )
}
if (requireNamespace("flextable", quietly = TRUE)) {
  ft <- table_categorical(
    sochealth, select = c(smoking, physical_activity), by = sex,
    output = "flextable"
  )
}

# Excel and Word: write to a temporary file.
if (requireNamespace("openxlsx2", quietly = TRUE)) {
  tmp <- tempfile(fileext = ".xlsx")
  table_categorical(
    sochealth, select = c(smoking, physical_activity), by = sex,
    output = "excel", excel_path = tmp
  )
  unlink(tmp)
}
if (
  requireNamespace("flextable", quietly = TRUE) &&
    requireNamespace("officer", quietly = TRUE)
) {
  tmp <- tempfile(fileext = ".docx")
  table_categorical(
    sochealth, select = c(smoking, physical_activity), by = sex,
    output = "word", word_path = tmp
  )
  unlink(tmp)
}
# }

if (FALSE) {
# Clipboard: writes to the system clipboard.
table_categorical(
  sochealth, select = c(smoking, physical_activity), by = sex,
  output = "clipboard"
)
}
