Learn R Programming

spicy (version 0.11.0)

cross_tab: Cross-tabulation

Description

Computes a two-way cross-tabulation with optional weights, grouping (including combinations of multiple variables), percentage displays, and inferential statistics.

cross_tab() produces weighted or unweighted contingency tables with row or column percentages, optional grouping via by, and associated Chi-squared tests with an association measure and diagnostic information.

Both x and y variables are required. For one-way frequency tables, use freq() instead.

Usage

cross_tab(
  data,
  x,
  y = NULL,
  by = NULL,
  weights = NULL,
  rescale = FALSE,
  percent = c("none", "column", "row"),
  include_stats = TRUE,
  assoc_measure = c("auto", "cramer_v", "phi", "gamma", "tau_b", "tau_c", "somers_d",
    "lambda", "none"),
  assoc_ci = FALSE,
  correct = FALSE,
  simulate_p = FALSE,
  simulate_B = 2000,
  digits = NULL,
  styled = TRUE,
  show_n = TRUE,
  decimal_mark = ".",
  p_digits = 3L
)

Value

A data.frame, list of data.frames, or spicy_cross_table object. When by is used, returns a spicy_cross_table_list.

Arguments

data

A data frame. Alternatively, a vector when using the vector-based interface.

x

Row variable (unquoted).

y

Column variable (unquoted). Mandatory; for one-way tables, use freq().

by

Optional grouping variable or expression. Can be a single variable or a combination of multiple variables (e.g. interaction(vs, am)).

weights

Optional numeric weights.

rescale

Logical. If FALSE (the default), weights are used as-is. If TRUE, rescales weights so total weighted N matches raw N.

percent

One of "none" (the default), "row", "column". Unique abbreviations are accepted (e.g. "n", "r", "c").

include_stats

Logical. If TRUE (the default), computes Chi-squared and an association measure (see assoc_measure).

assoc_measure

Character. Which association measure to report. "auto" (default) selects Kendall's Tau-b when both variables are ordered factors and Cramer's V otherwise. Other choices: "cramer_v", "phi", "gamma", "tau_b", "tau_c", "somers_d", "lambda", "none".

assoc_ci

Logical. If TRUE, includes the 95 percent confidence interval of the association measure in the note. Defaults to FALSE.

correct

Logical. If FALSE (the default), no continuity correction is applied. If TRUE, applies Yates correction (only for 2x2 tables).

simulate_p

Logical. If FALSE (the default), uses asymptotic p-values. If TRUE, uses Monte Carlo simulation.

simulate_B

Integer. Number of replicates for Monte Carlo simulation. Defaults to 2000.

digits

Number of decimals for cell values. Defaults to 1 for percentages, 0 for counts.

styled

Logical. If TRUE (the default), returns a spicy_cross_table object (for formatted printing). If FALSE, returns a plain data.frame.

show_n

Logical. If TRUE (the default), adds marginal N totals when percent != "none".

decimal_mark

Character used as the decimal mark in printed numeric values (cells, chi-squared, association estimate, CI bounds, p-value). Defaults to ".". Set to "," for European formatting; matches the decimal_mark argument of the table_*() family.

p_digits

Integer number of decimals used to format the p-value (and to determine the small-p threshold below which < .001 notation is used). Defaults to 3 (the APA standard); matches the p_digits argument of the table_*() family.

Global Options

The function recognizes the following global options that modify its default behavior:

  • options(spicy.percent = "column") Sets the default percentage mode for all calls to cross_tab(). Valid values are "none", "row", and "column". Equivalent to setting percent = "column" (or another choice) in each call.

  • options(spicy.simulate_p = TRUE) Enables Monte Carlo simulation for all Chi-squared tests by default. Equivalent to setting simulate_p = TRUE in every call.

  • options(spicy.rescale = TRUE) Automatically rescales weights so that total weighted N equals the raw N. Equivalent to setting rescale = TRUE in each call.

These options are convenient for users who wish to enforce consistent behavior across multiple calls to cross_tab() and other spicy table functions. They can be disabled or reset by setting them to NULL: options(spicy.percent = NULL, spicy.simulate_p = NULL, spicy.rescale = NULL).

Example:

options(spicy.simulate_p = TRUE, spicy.rescale = TRUE)
cross_tab(sochealth, smoking, education, weights = weight)

Examples

Run this code
# Basic crosstab
cross_tab(sochealth, smoking, education)

# Column percentages
cross_tab(sochealth, smoking, education, percent = "column")

# Weighted (rescaled)
cross_tab(sochealth, smoking, education, weights = weight, rescale = TRUE)

# Grouped by sex
cross_tab(sochealth, smoking, education, by = sex)

# Grouped by combination of variables
cross_tab(sochealth, smoking, education, by = interaction(sex, age_group))

# Ordinal variables: auto-selects Kendall's Tau-b
cross_tab(sochealth, education, self_rated_health)

# 2x2 table with Yates correction
cross_tab(sochealth, smoking, physical_activity, correct = TRUE)

# APA-style p-value precision and European decimal mark
cross_tab(sochealth, smoking, education, decimal_mark = ",", p_digits = 4)

Run the code above in your browser using DataLab