Learn R Programming

statsExpressions (version 1.3.1)

contingency_table: Contingency table analyses

Description

The table below provides summary about:

  • statistical test carried out for inferential statistics

  • type of effect size estimate and a measure of uncertainty for this estimate

  • functions used internally to compute these details

two-way table

Hypothesis testing

Type Design Test Function used
Parametric/Non-parametric Unpaired Pearson's chi-squared test stats::chisq.test()
Bayesian Unpaired Bayesian Pearson's chi-squared test BayesFactor::contingencyTableBF()
Parametric/Non-parametric Paired McNemar's chi-squared test stats::mcnemar.test()
Bayesian Paired No No

Effect size estimation

Type Design Effect size CI available? Function used
Parametric/Non-parametric Unpaired Cramer's V Yes effectsize::cramers_v()
Bayesian Unpaired Cramer's V Yes effectsize::cramers_v()
Parametric/Non-parametric Paired Cohen's g Yes effectsize::cohens_g()
Bayesian Paired No No No

one-way table

Hypothesis testing

Type Test Function used
Parametric/Non-parametric Goodness of fit chi-squared test stats::chisq.test()
Bayesian Bayesian Goodness of fit chi-squared test (custom)

Effect size estimation

Type Effect size CI available? Function used
Parametric/Non-parametric Pearson's C Yes effectsize::pearsons_c()
Bayesian No No No

Usage

contingency_table(
  data,
  x,
  y = NULL,
  paired = FALSE,
  type = "parametric",
  counts = NULL,
  ratio = NULL,
  k = 2L,
  conf.level = 0.95,
  sampling.plan = "indepMulti",
  fixed.margin = "rows",
  prior.concentration = 1,
  top.text = NULL,
  ...
)

Arguments

data

A dataframe (or a tibble) from which variables specified are to be taken. Other data types (e.g., matrix,table, array, etc.) will not be accepted.

x

The variable to use as the rows in the contingency table.

y

The variable to use as the columns in the contingency table. Default is NULL. If NULL, one-sample proportion test (a goodness of fit test) will be run for the x variable. Otherwise association test will be carried out.

paired

Logical indicating whether data came from a within-subjects or repeated measures design study (Default: FALSE). If TRUE, McNemar's test expression will be returned. If FALSE, Pearson's chi-square test will be returned.

type

A character specifying the type of statistical approach:

  • "parametric"

  • "nonparametric"

  • "robust"

  • "bayes"

You can specify just the initial letter.

counts

A string naming a variable in data containing counts, or NULL if each row represents a single observation.

ratio

A vector of proportions: the expected proportions for the proportion test (should sum to 1). Default is NULL, which means the null is equal theoretical proportions across the levels of the nominal variable. This means if there are two levels this will be ratio = c(0.5,0.5) or if there are four levels this will be ratio = c(0.25,0.25,0.25,0.25), etc.

k

Number of digits after decimal point (should be an integer) (Default: k = 2L).

conf.level

Scalar between 0 and 1. If unspecified, the defaults return 95% confidence/credible intervals (0.95).

sampling.plan

Character describing the sampling plan. Possible options are "indepMulti" (independent multinomial; default), "poisson", "jointMulti" (joint multinomial), "hypergeom" (hypergeometric). For more, see ?BayesFactor::contingencyTableBF().

fixed.margin

For the independent multinomial sampling plan, which margin is fixed ("rows" or "cols"). Defaults to "rows".

prior.concentration

Specifies the prior concentration parameter, set to 1 by default. It indexes the expected deviation from the null hypothesis under the alternative, and corresponds to Gunel and Dickey's (1974) "a" parameter.

top.text

Text to display on top of the Bayes Factor message. This is mostly relevant in the context of ggstatsplot package functions.

...

Additional arguments (currently ignored).

Value

The returned tibble dataframe can contain some or all of the following columns (the exact columns will depend on the statistical test):

  • statistic: the numeric value of a statistic

  • df: the numeric value of a parameter being modeled (often degrees of freedom for the test)

  • df.error and df: relevant only if the statistic in question has two degrees of freedom (e.g. anova)

  • p.value: the two-sided p-value associated with the observed statistic

  • method: the name of the inferential statistical test

  • estimate: estimated value of the effect size

  • conf.low: lower bound for the effect size estimate

  • conf.high: upper bound for the effect size estimate

  • conf.level: width of the confidence interval

  • conf.method: method used to compute confidence interval

  • conf.distribution: statistical distribution for the effect

  • effectsize: the name of the effect size

  • n.obs: number of observations

  • expression: pre-formatted expression containing statistical details

For examples of dataframe outputs, see examples and this vignette.

Note that all examples are preceded by set.seed() calls for reproducibility.

Examples

Run this code
# NOT RUN {
# for reproducibility
set.seed(123)
library(statsExpressions)
options(tibble.width = Inf, pillar.bold = TRUE, pillar.neg = TRUE)

# ------------------------ non-Bayesian -----------------------------

# association test
contingency_table(
  data   = mtcars,
  x      = am,
  y      = cyl,
  paired = FALSE
)

# goodness-of-fit test
contingency_table(
  data   = as.data.frame(HairEyeColor),
  x      = Eye,
  counts = Freq,
  ratio  = c(0.2, 0.2, 0.3, 0.3)
)

# ------------------------ Bayesian -----------------------------

# association test
contingency_table(
  data   = mtcars,
  x      = am,
  y      = cyl,
  paired = FALSE,
  type   = "bayes"
)

# goodness-of-fit test
contingency_table(
  data   = as.data.frame(HairEyeColor),
  x      = Eye,
  counts = Freq,
  ratio  = c(0.2, 0.2, 0.3, 0.3),
  type   = "bayes"
)
# }

Run the code above in your browser using DataLab