evanverse (version 0.4.0)

quick_chisq: Quick Chi-Square Test with Automatic Visualization

Description

Perform chi-square test of independence or Fisher's exact test (automatically selected based on expected frequencies) with publication-ready visualization. Designed for analyzing the association between two categorical variables.

Usage

quick_chisq(
  data,
  var1,
  var2,
  method = c("auto", "chisq", "fisher", "mcnemar"),
  correct = NULL,
  conf.level = 0.95,
  plot_type = c("bar_grouped", "bar_stacked", "heatmap"),
  show_p_value = TRUE,
  p_label = c("p.format", "p.signif"),
  palette = "qual_vivid",
  verbose = TRUE,
  ...
)

Value

An object of class quick_chisq_result containing:

plot

A ggplot object with the association visualization

test_result

The htest object from chisq.test() or fisher.test()

method_used

Character string of the test method used

contingency_table

The contingency table (counts)

expected_freq

Matrix of expected frequencies

pearson_residuals

Pearson residuals for each cell

effect_size

Cramer's V effect size measure

descriptive_stats

Data frame with frequencies and proportions

auto_decision

Details about automatic method selection

timestamp

POSIXct timestamp of analysis

Arguments

data

A data frame containing the variables.

var1

Column name for the first categorical variable (row variable). Supports both quoted and unquoted names via NSE.

var2

Column name for the second categorical variable (column variable). Supports both quoted and unquoted names via NSE.

method

Character. Test method: "auto" (default), "chisq", "fisher", or "mcnemar". With "auto", the function selects the test from the expected frequencies and the table size. Warning: "mcnemar" is only for paired or matched data (e.g., before-after measurements on the same subjects). It tests marginal homogeneity, not independence. For independent samples, use "chisq" or "fisher" instead.

correct

Logical or NULL. Apply Yates' continuity correction? If NULL (default), the correction is applied automatically to 2x2 tables with any expected frequency < 10.

conf.level

Numeric. Confidence level for the interval. Default is 0.95.

plot_type

Character. Type of plot: "bar_grouped" (default), "bar_stacked", or "heatmap".

show_p_value

Logical. Display p-value on the plot? Default is TRUE.

p_label

Character. P-value label format: "p.format" (numeric p-value, default) or "p.signif" (stars).

palette

Character. Color palette name from evanverse palettes. Default is "qual_vivid". Set to NULL to use ggplot2 defaults.

verbose

Logical. Print diagnostic messages? Default is TRUE.

...

Additional arguments (currently unused, reserved for future extensions).
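
The warning on method = "mcnemar" can be illustrated with base R alone. The sketch below uses a made-up paired table (the counts and dimnames are hypothetical) and calls stats::mcnemar.test() and stats::chisq.test() directly rather than quick_chisq(), so it shows the difference between the two hypotheses, not the package's internals.

```r
# Hypothetical paired data: the same 100 subjects classified before and after
paired <- matrix(
  c(40, 5, 25, 30), nrow = 2,
  dimnames = list(before = c("Pos", "Neg"), after = c("Pos", "Neg"))
)

# McNemar's test uses only the discordant cells (Pos->Neg and Neg->Pos)
# and asks whether the marginal proportions changed.
mc <- mcnemar.test(paired)

# The chi-square test uses all four cells and asks whether the "before"
# and "after" classifications are independent, which is a different question.
ch <- chisq.test(paired)

mc$statistic
ch$statistic
```

The two statistics answer different questions, so they are not interchangeable even though both accept a 2x2 table.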

Important Notes

  • Categorical variables: Both variables should be categorical; other types are coerced to factors.

  • Sample size: Fisher's exact test may be computationally intensive for large tables.

  • Missing values: Automatically removed with a warning.

  • Low frequencies: Cells with expected frequency < 5 can make the chi-square approximation unreliable.
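
The notes above can be checked by hand before calling quick_chisq(). The following is a minimal base R sketch, assuming a toy data frame with hypothetical columns group and outcome; it mirrors the documented behaviour (factor coercion, removal of missing values, expected-frequency check) but is not the package's own code.

```r
# Hypothetical toy data; the column names are illustrative only
df <- data.frame(
  group   = c("A", "B", "A", "B", NA, "A", "B", "B"),
  outcome = c("Yes", "No", "No", "Yes", "Yes", NA, "No", "Yes"),
  stringsAsFactors = FALSE
)

# Keep only rows where both variables are observed
complete  <- df[complete.cases(df[, c("group", "outcome")]), ]
n_dropped <- nrow(df) - nrow(complete)
if (n_dropped > 0) {
  # The package is documented to warn; this sketch only prints a message
  message(n_dropped, " row(s) with missing values removed")
}

# Coerce to factors and build the contingency table
complete$group   <- factor(complete$group)
complete$outcome <- factor(complete$outcome)
tab <- table(complete$group, complete$outcome)

# Expected frequencies under independence: row total * column total / n
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
any(expected < 5)  # TRUE signals that the chi-square approximation is doubtful
```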

Details

"Quick" means easy to use, not simplified or inaccurate.

This function performs full statistical testing with proper assumption checking:

Automatic Method Selection (method = "auto")

The function selects the test from the expected frequencies and the table size:

  • All expected frequencies >= 5: Standard chi-square test

  • 2x2 table with any expected frequency < 5: Fisher's exact test

  • Larger table with expected frequency < 5: Chi-square with warning

  • 2x2 table with 5 <= expected frequency < 10: Chi-square with Yates' correction
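
The selection rules above can be sketched as a small decision function. choose_test() is a hypothetical helper, not part of evanverse, and the order in which overlapping rules are checked (2x2 rules first) is an assumption made here to keep the sketch unambiguous; the package may resolve them differently.

```r
# Hypothetical helper illustrating the documented rules; not the package's code
choose_test <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  is_2x2   <- all(dim(tab) == c(2L, 2L))
  min_exp  <- min(expected)

  # Assumed precedence: the 2x2-specific rules are checked first
  if (is_2x2 && min_exp < 5)  return("fisher")
  if (is_2x2 && min_exp < 10) return("chisq_yates")
  if (min_exp < 5)            return("chisq_with_warning")
  "chisq"
}

sparse_2x2   <- matrix(c(3, 2, 4, 6), nrow = 2)          # smallest expected count below 5
moderate_2x2 <- matrix(c(8, 7, 9, 12), nrow = 2)         # smallest expected count between 5 and 10
large_2x2    <- matrix(c(40, 35, 45, 50), nrow = 2)      # all expected counts at least 10
sparse_2x3   <- matrix(c(2, 8, 10, 9, 7, 12), nrow = 2)  # larger table with a sparse cell

choose_test(sparse_2x2)
choose_test(moderate_2x2)
choose_test(large_2x2)
choose_test(sparse_2x3)
```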

Effect Size

Cramer's V is reported as the effect size. Commonly cited benchmarks are:

  • Small effect: V = 0.1

  • Medium effect: V = 0.3

  • Large effect: V = 0.5
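
For reference, the standard (uncorrected) Cramer's V is V = sqrt(chi-square / (n * (min(rows, cols) - 1))). The sketch below computes it with base R; cramers_v() is a hypothetical helper, and whether the package applies this exact formula or a bias-corrected variant is not stated in this page.

```r
# Hypothetical helper: uncorrected Cramer's V from a contingency table
cramers_v <- function(tab) {
  # Pearson chi-square without continuity correction
  chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
  n    <- sum(tab)
  k    <- min(dim(tab)) - 1
  sqrt(chi2 / (n * k))
}

# Made-up 2x2 counts: every expected count is 25, chi-square is 4, n is 100
tab <- matrix(c(30, 20, 20, 30), nrow = 2)
v   <- cramers_v(tab)
v

# Perfect association gives V = 1
v_perfect <- cramers_v(matrix(c(50, 0, 0, 50), nrow = 2))
v_perfect
```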

Pearson Residuals

Pearson residuals are calculated for each cell as (observed - expected) / sqrt(expected):

  • |residual| > 2 indicates significant deviation from independence

  • |residual| > 3 indicates very significant deviation
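
The residual formula above can be verified against base R, since stats::chisq.test() returns the same quantity in its residuals component. The table below is made up for illustration.

```r
# Made-up 2x2 counts with hypothetical labels
tab <- matrix(
  c(30, 10, 15, 45), nrow = 2,
  dimnames = list(group = c("A", "B"), outcome = c("No", "Yes"))
)

# Expected counts under independence
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residuals by hand: (observed - expected) / sqrt(expected)
resid_manual <- (tab - expected) / sqrt(expected)

# The same residuals as returned by chisq.test()
resid_base <- chisq.test(tab, correct = FALSE)$residuals

# Cells whose absolute residual exceeds 2
flagged <- abs(resid_manual) > 2
flagged
```

The sign of each residual shows the direction of the deviation: positive means more observations than expected under independence, negative means fewer.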

Visualization Options

  • bar_grouped: Grouped bar chart (default)

  • bar_stacked: Stacked bar chart (100%)

  • heatmap: Heatmap of Pearson residuals

See Also

chisq.test, fisher.test, quick_ttest, quick_anova

Examples

library(evanverse)

# Example 1: Basic usage with automatic method selection
set.seed(123)
data <- data.frame(
  treatment = sample(c("A", "B", "C"), 100, replace = TRUE),
  response = sample(c("Success", "Failure"), 100, replace = TRUE,
                    prob = c(0.6, 0.4))
)

result <- quick_chisq(data, var1 = treatment, var2 = response)
print(result)

# Example 2: 2x2 table
data_2x2 <- data.frame(
  gender = rep(c("Male", "Female"), each = 50),
  disease = sample(c("Yes", "No"), 100, replace = TRUE)
)

result <- quick_chisq(data_2x2, var1 = gender, var2 = disease)

# Example 3: Customize visualization
result <- quick_chisq(data,
                      var1 = treatment,
                      var2 = response,
                      plot_type = "bar_grouped",
                      palette = "qual_balanced")

# Example 4: Manual method selection
result <- quick_chisq(data,
                      var1 = treatment,
                      var2 = response,
                      method = "chisq",
                      correct = FALSE)

# Access components
result$plot                      # ggplot object
result$test_result               # htest object
result$contingency_table         # Contingency table
result$pearson_residuals         # Pearson residuals
summary(result)                  # Detailed summary