quick_chisq: Quick Chi-Square Test with Automatic Visualization

Description

Performs a chi-square test of independence or Fisher's exact test (selected automatically from the expected frequencies) and produces a publication-ready visualization. McNemar's test can be requested explicitly for paired data. Designed for analyzing the association between two categorical variables.

Usage

quick_chisq(
  data,
  var1,
  var2,
  method = c("auto", "chisq", "fisher", "mcnemar"),
  correct = NULL,
  conf.level = 0.95,
  plot_type = c("bar_grouped", "bar_stacked", "heatmap"),
  show_p_value = TRUE,
  p_label = c("p.format", "p.signif"),
  palette = "qual_vivid",
  verbose = TRUE,
  ...
)

Value

An object of class quick_chisq_result containing:

plot: A ggplot object with the association visualization
test_result: The htest object returned by the test that was run (e.g., chisq.test(), fisher.test(), or mcnemar.test())
method_used: Character string of the test method used
contingency_table: The contingency table (counts)
expected_freq: Matrix of expected frequencies
pearson_residuals: Pearson residuals for each cell
effect_size: Cramer's V effect size measure
descriptive_stats: Data frame with frequencies and proportions
auto_decision: Details about automatic method selection
timestamp: POSIXct timestamp of analysis
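The base-R sketch below derives a contingency table, expected frequencies and a frequency/proportion data frame from a small made-up data set. It shows roughly what the tabular components above contain. It is an illustrative sketch, not the package's internal code. The data and the column names n and prop_in_row are hypothetical.

```r
# Hypothetical toy data (not from the package)
df <- data.frame(
  treatment = c("A", "A", "B", "B", "B", "C"),
  response  = c("Success", "Failure", "Success", "Success", "Failure", "Failure")
)

# Contingency table of counts: rows = var1, columns = var2
tab <- table(treatment = df$treatment, response = df$response)

# Expected frequencies under independence: row total * column total / n
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Long-format frequencies plus row-wise proportions (column-major order,
# which matches the row order produced by as.data.frame() on a table)
desc <- as.data.frame(tab, responseName = "n")
desc$prop_in_row <- as.vector(prop.table(tab, margin = 1))
desc
```

The expected matrix computed this way matches the $expected component of chisq.test() for the same table.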

Arguments

data: A data frame containing the variables.
var1: Column name for the first categorical variable (row variable). Supports both quoted and unquoted names via non-standard evaluation (NSE).
var2: Column name for the second categorical variable (column variable). Supports both quoted and unquoted names via non-standard evaluation (NSE).
method: Character. Test method: "auto" (default), "chisq", "fisher", or "mcnemar". With "auto", the test is chosen from the expected frequencies and the table dimensions (see Details). WARNING: "mcnemar" is ONLY for paired/matched data (e.g., before-after measurements on the same subjects) and requires a square table. It tests marginal homogeneity, NOT independence. Do NOT use McNemar's test for independent samples; use "chisq" or "fisher" instead.
correct: Logical or NULL. Apply Yates' continuity correction? If NULL (default), the correction is applied automatically to 2x2 tables whose smallest expected frequency is below 10. The correction only affects 2x2 tables.
conf.level: Numeric. Confidence level for any confidence interval reported by the selected test (e.g., the odds-ratio interval from Fisher's exact test on a 2x2 table). Default is 0.95.
plot_type: Character. Type of plot: "bar_grouped" (default), "bar_stacked", or "heatmap".
show_p_value: Logical. Display p-value on the plot? Default is TRUE.
p_label: Character. P-value label format: "p.format" (numeric p-value, default) or "p.signif" (stars).
palette: Character. Color palette name from evanverse palettes. Default is "qual_vivid". Set to NULL to use ggplot2 defaults.
verbose: Logical. Print diagnostic messages? Default is TRUE.
...: Additional arguments (currently unused, reserved for future extensions).
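The sketch below makes the paired-data warning for method = "mcnemar" concrete. It runs base R's mcnemar.test() on a small, invented before/after table for the same subjects. McNemar's statistic depends only on the discordant cells (subjects whose answer changed). That is why it answers a different question than a test of independence. The data and variable names are hypothetical.

```r
# Hypothetical paired data: the same 8 subjects measured before and after
before <- factor(c("Yes", "Yes", "No", "No", "Yes", "No", "No", "No"),
                 levels = c("No", "Yes"))
after  <- factor(c("Yes", "No", "Yes", "Yes", "Yes", "Yes", "No", "Yes"),
                 levels = c("No", "Yes"))

# Square 2x2 table: rows = before, columns = after
paired_tab <- table(before, after)

# Discordant cells: subjects who changed No -> Yes and Yes -> No
n_no_yes <- paired_tab["No", "Yes"]
n_yes_no <- paired_tab["Yes", "No"]

# McNemar statistic with continuity correction: (|b - c| - 1)^2 / (b + c)
manual_stat <- (abs(n_no_yes - n_yes_no) - 1)^2 / (n_no_yes + n_yes_no)

# Base R's implementation (continuity correction is on by default)
res <- mcnemar.test(paired_tab)
res$statistic  # same value as manual_stat (0.8)
```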

Important Notes

Categorical variables: Both variables should be categorical; columns that are not factors are coerced to factors.
Sample size: Fisher's exact test can be computationally intensive for large tables (many cells or large counts).
Missing values: Removed automatically, with a warning.
Low frequencies: Cells with expected frequency < 5 can make the chi-square approximation unreliable.
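The missing-value and coercion notes above can be sketched in base R as follows. prepare_vars() is a hypothetical helper written only for illustration. The warning text and exact behavior of quick_chisq() may differ.

```r
# Hypothetical helper: drop incomplete rows (with a warning), then coerce
# both variables to factors. Not the package's actual implementation.
prepare_vars <- function(data, var1, var2) {
  cols <- c(var1, var2)
  keep <- stats::complete.cases(data[, cols, drop = FALSE])
  n_dropped <- sum(!keep)
  if (n_dropped > 0) {
    warning(sprintf("Removed %d row(s) with missing values.", n_dropped))
  }
  out <- data[keep, cols, drop = FALSE]
  # factor() also drops levels that no longer occur after filtering
  out[[var1]] <- factor(out[[var1]])
  out[[var2]] <- factor(out[[var2]])
  out
}

df <- data.frame(group = c("a", "b", NA, "a"), score = c(1, 2, 2, NA))
clean <- suppressWarnings(prepare_vars(df, "group", "score"))
str(clean)
```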

Details

"Quick" means easy to use, not simplified or inaccurate.

This function performs complete statistical testing, including checks of the test assumptions:

Automatic Method Selection (method = "auto")

The function uses a rule-based procedure driven by the expected frequencies and the table size:

2x2 table with any expected frequency < 5: Fisher's exact test
2x2 table with 5 <= smallest expected frequency < 10: Chi-square test with Yates' correction
Larger table with any expected frequency < 5: Chi-square test with a warning
All other cases (all expected frequencies >= 5): Standard chi-square test
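The rules above can be sketched as a small decision function. select_method() is hypothetical. It encodes the documented thresholds only and is not the package source. In particular, how quick_chisq() handles ties at the boundaries or the correct argument may differ.

```r
# Hypothetical helper mirroring the documented "auto" rules
select_method <- function(tab) {
  tab <- as.matrix(tab)
  # Expected counts under independence: row total * column total / n
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  min_exp <- min(expected)
  is_2x2 <- identical(dim(tab), c(2L, 2L))

  if (is_2x2 && min_exp < 5) {
    # Sparse 2x2 table: exact test (continuity correction not applicable)
    list(method = "fisher", correct = NA)
  } else if (is_2x2 && min_exp < 10) {
    # Moderately small 2x2 table: chi-square with Yates' correction
    list(method = "chisq", correct = TRUE)
  } else {
    if (min_exp < 5) {
      warning("Some expected frequencies are < 5; ",
              "the chi-square approximation may be unreliable.")
    }
    list(method = "chisq", correct = FALSE)
  }
}

select_method(matrix(c(3, 1, 1, 3), nrow = 2))$method    # "fisher" (all expected = 2)
select_method(matrix(c(12, 8, 6, 14), nrow = 2))$correct  # TRUE (smallest expected = 9)
```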

Effect Size

Cramer's V is calculated as a measure of effect size. The conventional benchmarks below (Cohen's) strictly apply to tables with two rows or two columns:

Small effect: V = 0.1
Medium effect: V = 0.3
Large effect: V = 0.5
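Cramer's V is V = sqrt(chi^2 / (n * (min(r, c) - 1))), where n is the total count and r and c are the numbers of rows and columns. A minimal sketch follows. It assumes the uncorrected Pearson chi-square statistic. The package may use a different variant, for example a bias-corrected V. cramers_v() is a hypothetical helper.

```r
# Hypothetical helper: Cramer's V from the uncorrected chi-square statistic
cramers_v <- function(tab) {
  tab <- as.matrix(tab)
  # correct = FALSE: no Yates correction; warnings about small counts muted
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  n <- sum(tab)
  k <- min(dim(tab)) - 1   # min(rows, columns) - 1
  unname(sqrt(chi2 / (n * k)))
}

cramers_v(matrix(c(10, 0, 0, 10), nrow = 2))    # 1: perfect association
cramers_v(matrix(c(10, 10, 10, 10), nrow = 2))  # 0: no association
```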

Pearson Residuals

Pearson residuals are calculated for each cell as (observed - expected) / sqrt(expected):

|residual| > 2 indicates a significant deviation from independence
|residual| > 3 indicates a very significant deviation
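The formula above can be checked directly in base R. The sketch computes Pearson residuals by hand on a made-up 3x2 table, confirms they equal the $residuals component returned by chisq.test(), and flags cells beyond the |residual| > 2 guideline. The table is hypothetical.

```r
# Hypothetical 3x2 table of counts
tab <- matrix(c(30, 20, 10, 10, 20, 30), nrow = 3,
              dimnames = list(group = c("A", "B", "C"),
                              outcome = c("Yes", "No")))

# Expected counts under independence (here every cell is 20)
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residuals: (observed - expected) / sqrt(expected)
resid <- (tab - expected) / sqrt(expected)

# Base R reports the same residuals
builtin <- chisq.test(tab)$residuals

# Cells that deviate notably from independence
flag <- abs(resid) > 2
flag
```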

Visualization Options

bar_grouped: Grouped bar chart (default)
bar_stacked: Stacked bar chart (100% stacked)
heatmap: Heatmap of Pearson residuals

Examples


library(evanverse)

# Example 1: Basic usage with automatic method selection
set.seed(123)
data <- data.frame(
  treatment = sample(c("A", "B", "C"), 100, replace = TRUE),
  response = sample(c("Success", "Failure"), 100, replace = TRUE,
                    prob = c(0.6, 0.4))
)

result <- quick_chisq(data, var1 = treatment, var2 = response)
print(result)

# Example 2: 2x2 table
data_2x2 <- data.frame(
  gender = rep(c("Male", "Female"), each = 50),
  disease = sample(c("Yes", "No"), 100, replace = TRUE)
)

result <- quick_chisq(data_2x2, var1 = gender, var2 = disease)

# Example 3: Customize visualization
result <- quick_chisq(data,
                      var1 = treatment,
                      var2 = response,
                      plot_type = "bar_stacked",
                      palette = "qual_balanced")

# Example 4: Manual method selection
result <- quick_chisq(data,
                      var1 = treatment,
                      var2 = response,
                      method = "chisq",
                      correct = FALSE)  # Yates' correction only affects 2x2 tables

# Access components
result$plot                      # ggplot object
result$test_result               # htest object
result$contingency_table         # Contingency table
result$pearson_residuals         # Pearson residuals
summary(result)                  # Detailed summary
