contingency_table: Contingency table analyses

Description

The tables below summarize:

statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details

two-way table

Hypothesis testing

Type	Design	Test	Function used
Parametric/Non-parametric	Unpaired	Pearson's chi-squared test	`stats::chisq.test()`
Bayesian	Unpaired	Bayesian Pearson's chi-squared test	`BayesFactor::contingencyTableBF()`
Parametric/Non-parametric	Paired	McNemar's chi-squared test	`stats::mcnemar.test()`
Bayesian	Paired	No	No
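
The non-Bayesian rows above correspond to base R tests that can also be called directly. The following is a minimal sketch, assuming the built-in mtcars data and an illustrative 2x2 table of paired responses (the `paired_tab` counts are made up for this example). Switching off the continuity correction is an assumption here, not a statement about what `contingency_table()` does internally.

```r
# Unpaired design: Pearson's chi-squared test of association
tab <- table(mtcars$am, mtcars$cyl)
# small expected counts trigger an approximation warning, silenced here
res_unpaired <- suppressWarnings(stats::chisq.test(tab, correct = FALSE))
res_unpaired$parameter  # degrees of freedom: (rows - 1) * (cols - 1)

# Paired design: McNemar's test on a square before/after table
# (illustrative counts only)
paired_tab <- matrix(
  c(794, 86, 150, 570),
  nrow = 2,
  dimnames = list(
    before = c("approve", "disapprove"),
    after  = c("approve", "disapprove")
  )
)
res_paired <- stats::mcnemar.test(paired_tab, correct = FALSE)
res_paired$statistic  # (b - c)^2 / (b + c), using the off-diagonal cells
```

McNemar's test only uses the two off-diagonal cells, which is why it needs the same categories in the rows and the columns.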

Effect size estimation

Type	Design	Effect size	CI available?	Function used
Parametric/Non-parametric	Unpaired	Cramer's V	Yes	`effectsize::cramers_v()`
Bayesian	Unpaired	Cramer's V	Yes	`effectsize::cramers_v()`
Parametric/Non-parametric	Paired	Cohen's g	Yes	`effectsize::cohens_g()`
Bayesian	Paired	No	No	No
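
To make the two effect sizes concrete, here is a hand computation under stated assumptions: the uncorrected textbook formula for Cramer's V, and Cohen's g for a 2x2 paired table with illustrative counts. The `effectsize` functions listed above may apply a small-sample adjustment or use other defaults, so their output can differ from this sketch.

```r
# Cramer's V (uncorrected): sqrt(chi2 / (n * (min(rows, cols) - 1)))
tab  <- table(mtcars$am, mtcars$cyl)
chi2 <- unname(suppressWarnings(stats::chisq.test(tab, correct = FALSE))$statistic)
n    <- sum(tab)
cramers_v <- sqrt(chi2 / (n * (min(dim(tab)) - 1)))

# Cohen's g for a 2x2 paired table: the larger off-diagonal proportion minus 0.5
# (illustrative counts only)
paired_tab <- matrix(c(794, 86, 150, 570), nrow = 2)
n_up   <- paired_tab[1, 2]  # changed in one direction
n_down <- paired_tab[2, 1]  # changed in the other direction
cohens_g <- max(n_up, n_down) / (n_up + n_down) - 0.5

cramers_v
cohens_g
```

Both quantities are bounded: Cramer's V lies between 0 and 1, and Cohen's g between 0 and 0.5.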

one-way table

Hypothesis testing

Type	Test	Function used
Parametric/Non-parametric	Goodness of fit chi-squared test	`stats::chisq.test()`
Bayesian	Bayesian Goodness of fit chi-squared test	(custom)

Effect size estimation

Type	Effect size	CI available?	Function used
Parametric/Non-parametric	Pearson's C	Yes	`effectsize::pearsons_c()`
Bayesian	No	No	No
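
For the one-way case, the goodness-of-fit test and Pearson's C can be sketched in base R as follows. This assumes the built-in HairEyeColor data and uses the textbook formula C = sqrt(chi2 / (chi2 + n)); `effectsize::pearsons_c()` may use different defaults, so treat this as an illustration rather than a reproduction of the package output.

```r
# Observed counts per eye colour (margin 2 of the Hair x Eye x Sex table)
eye_counts <- margin.table(HairEyeColor, margin = 2)

# Goodness-of-fit test against user-specified expected proportions
gof  <- stats::chisq.test(eye_counts, p = c(0.2, 0.2, 0.3, 0.3))
chi2 <- unname(gof$statistic)
n    <- sum(eye_counts)

# Pearson's contingency coefficient
pearsons_c <- sqrt(chi2 / (chi2 + n))

gof$parameter  # degrees of freedom: number of levels - 1
pearsons_c
```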

Usage

contingency_table(
  data,
  x,
  y = NULL,
  paired = FALSE,
  type = "parametric",
  counts = NULL,
  ratio = NULL,
  k = 2L,
  conf.level = 0.95,
  sampling.plan = "indepMulti",
  fixed.margin = "rows",
  prior.concentration = 1,
  top.text = NULL,
  ...
)

Arguments

data

A dataframe (or a tibble) from which the specified variables are taken. Other data types (e.g., matrix, table, array, etc.) will not be accepted.

x

The variable to use as the rows in the contingency table.

y

The variable to use as the columns in the contingency table. Default is NULL. If NULL, a one-sample proportion test (a goodness-of-fit test) will be run for the x variable. Otherwise, an association test will be carried out.

paired

Logical indicating whether data came from a within-subjects or repeated measures design study (Default: FALSE). If TRUE, McNemar's test expression will be returned. If FALSE, Pearson's chi-square test will be returned.

type

A character specifying the type of statistical approach:

"parametric"
"nonparametric"
"robust"
"bayes"

You can specify just the initial letter.
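
The initial-letter shorthand behaves like ordinary partial matching in R. Below is a minimal sketch using base R's `match.arg()`; the helper name `resolve_type` is hypothetical, and this is not claimed to be how the package resolves `type` internally.

```r
# Hypothetical helper: resolve a (possibly abbreviated) type string
resolve_type <- function(type = c("parametric", "nonparametric", "robust", "bayes")) {
  # match.arg() accepts any unambiguous prefix of the allowed choices
  match.arg(type)
}

resolve_type("b")  # resolves to "bayes"
resolve_type("n")  # resolves to "nonparametric"
resolve_type()     # no input: falls back to the first choice, "parametric"
```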

counts

A string naming a variable in data containing counts, or NULL if each row represents a single observation.
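
The two data layouts that `counts` distinguishes carry the same information. A short base R sketch, assuming the built-in HairEyeColor data, shows that tabulating a counts column gives the same table as tabulating one row per observation:

```r
# Layout 1: one row per combination of levels, with a counts column (Freq)
df_counts <- as.data.frame(HairEyeColor)
tab_from_counts <- xtabs(Freq ~ Eye, data = df_counts)

# Layout 2: one row per observation, built by repeating each row Freq times
df_long <- df_counts[rep(seq_len(nrow(df_counts)), times = df_counts$Freq), ]
tab_from_rows <- table(df_long$Eye)

# Both layouts yield identical one-way tables
all(as.vector(tab_from_counts) == as.vector(tab_from_rows))
```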

ratio

A vector of proportions: the expected proportions for the proportion test (should sum to 1). Default is NULL, which means the null is equal theoretical proportions across the levels of the nominal variable. This means if there are two levels this will be ratio = c(0.5,0.5) or if there are four levels this will be ratio = c(0.25,0.25,0.25,0.25), etc.
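
As a small illustration of the default described above, the equal-proportions null can be written out explicitly. The sketch assumes the built-in HairEyeColor data; the variable names are chosen for this example only.

```r
# Number of levels of the nominal variable
n_levels <- nlevels(as.data.frame(HairEyeColor)$Eye)

# What ratio = NULL implies: equal expected proportions across levels
default_ratio <- rep(1 / n_levels, n_levels)

# A user-specified alternative must have one entry per level and sum to 1
custom_ratio <- c(0.2, 0.2, 0.3, 0.3)
stopifnot(
  length(custom_ratio) == n_levels,
  isTRUE(all.equal(sum(custom_ratio), 1))
)

default_ratio
```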

k

Number of digits after the decimal point (should be an integer) (Default: k = 2L).

conf.level

Scalar between 0 and 1 giving the confidence/credible interval level. Defaults to 0.95, which returns 95% confidence/credible intervals.

sampling.plan

Character describing the sampling plan. Possible options are "indepMulti" (independent multinomial; default), "poisson", "jointMulti" (joint multinomial), "hypergeom" (hypergeometric). For more, see ?BayesFactor::contingencyTableBF().

fixed.margin

For the independent multinomial sampling plan, which margin is fixed ("rows" or "cols"). Defaults to "rows".

prior.concentration

Specifies the prior concentration parameter, set to 1 by default. It indexes the expected deviation from the null hypothesis under the alternative, and corresponds to Gunel and Dickey's (1974) "a" parameter.

top.text

Text to display on top of the Bayes Factor message. This is mostly relevant in the context of ggstatsplot package functions.

...

Additional arguments (currently ignored).

Value

The returned tibble dataframe can contain some or all of the following columns (the exact columns will depend on the statistical test):

statistic: the numeric value of a statistic
df: the numeric value of a parameter being modeled (often degrees of freedom for the test)
df.error and df: relevant only if the statistic in question has two degrees of freedom (e.g. anova)
p.value: the two-sided p-value associated with the observed statistic
method: the name of the inferential statistical test
estimate: estimated value of the effect size
conf.low: lower bound for the effect size estimate
conf.high: upper bound for the effect size estimate
conf.level: confidence level of the interval (e.g., 0.95)
conf.method: method used to compute confidence interval
conf.distribution: statistical distribution for the effect
effectsize: the name of the effect size
n.obs: number of observations
expression: pre-formatted expression containing statistical details

For examples of dataframe outputs, see the Examples section and the package vignette.

Note that all examples are preceded by set.seed() calls for reproducibility.

Examples

# NOT RUN {
# for reproducibility
set.seed(123)
library(statsExpressions)
options(tibble.width = Inf, pillar.bold = TRUE, pillar.neg = TRUE)

# ------------------------ non-Bayesian -----------------------------

# association test
contingency_table(
  data   = mtcars,
  x      = am,
  y      = cyl,
  paired = FALSE
)

# goodness-of-fit test
contingency_table(
  data   = as.data.frame(HairEyeColor),
  x      = Eye,
  counts = Freq,
  ratio  = c(0.2, 0.2, 0.3, 0.3)
)

# ------------------------ Bayesian -----------------------------

# association test
contingency_table(
  data   = mtcars,
  x      = am,
  y      = cyl,
  paired = FALSE,
  type   = "bayes"
)

# goodness-of-fit test
contingency_table(
  data   = as.data.frame(HairEyeColor),
  x      = Eye,
  counts = Freq,
  ratio  = c(0.2, 0.2, 0.3, 0.3),
  type   = "bayes"
)
# }
