ContingencyTests: Tests of Independence in Two- or Three-Way Contingency Tables

Description

Testing the independence of two nominal or ordered factors.

Usage

# S3 method for formula
chisq_test(formula, data, subset = NULL, weights = NULL, ...)
# S3 method for table
chisq_test(object, ...)
# S3 method for IndependenceProblem
chisq_test(object, ...)
# S3 method for formula
cmh_test(formula, data, subset = NULL, weights = NULL, ...)
# S3 method for table
cmh_test(object, ...)
# S3 method for IndependenceProblem
cmh_test(object, ...)
# S3 method for formula
lbl_test(formula, data, subset = NULL, weights = NULL, ...)
# S3 method for table
lbl_test(object, ...)
# S3 method for IndependenceProblem
lbl_test(object, ...)

Arguments

formula

a formula of the form y ~ x | block where y and x are factors and block is an optional factor for stratification.

data

an optional data frame containing the variables in the model formula.

subset

an optional vector specifying a subset of observations to be used. Defaults to NULL.

weights

an optional formula of the form ~ w defining integer-valued case weights for each observation. Defaults to NULL, implying equal weight for all observations.

object

an object inheriting from classes "table" or "IndependenceProblem".

...

further arguments to be passed to independence_test.
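
The formula and weights arguments can be combined to analyse data that are stored as one row per cell rather than one row per observation. The following is a minimal sketch under stated assumptions: the data frame dat and its columns (response, treatment, centre, n) are hypothetical and the counts are made up for illustration. It also assumes that the coin package is installed and that, for a three-way table, the third dimension plays the role of block.

```r
library(coin)  # assumed to be installed

# Hypothetical aggregated data: one row per cell of a 2 x 2 x 2 table,
# with the cell count stored in 'n' (made-up numbers)
dat <- data.frame(
    response  = factor(rep(c("yes", "no"), times = 4)),
    treatment = factor(rep(rep(c("A", "B"), each = 2), times = 2)),
    centre    = factor(rep(c("c1", "c2"), each = 4)),
    n         = c(12L, 8L, 7L, 13L, 15L, 5L, 9L, 11L)
)

# Pearson chi-squared test, ignoring the stratification
ct_chisq <- chisq_test(response ~ treatment, data = dat, weights = ~ n)

# Generalized Cochran-Mantel-Haenszel test, stratified by 'centre'
ct_formula <- cmh_test(response ~ treatment | centre, data = dat,
                       weights = ~ n)

# The same stratified test via the table method
tab <- xtabs(n ~ response + treatment + centre, data = dat)
ct_table <- cmh_test(tab)

# Both interfaces should report the same test statistic
statistic(ct_formula)
statistic(ct_table)
```

The table method is usually the shorter route when the counts are already cross-classified; the formula method is convenient when the variables live in a data frame alongside other columns.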

Value

An object inheriting from class "IndependenceTest".

Details

chisq_test, cmh_test and lbl_test provide the Pearson chi-squared test, the generalized Cochran-Mantel-Haenszel test and the linear-by-linear association test, respectively. A general description of these methods is given by Agresti (2002).

The null hypothesis of independence, or conditional independence given block, between y and x is tested.

If y and/or x are ordered factors, the default scores, 1:nlevels(y) and 1:nlevels(x) respectively, can be altered using the scores argument (see independence_test); this argument can also be used to coerce nominal factors to class "ordered". (lbl_test coerces to class "ordered" under any circumstances.) If both y and x are ordered factors, a linear-by-linear association test is computed and the direction of the alternative hypothesis can be specified using the alternative argument. For the Pearson chi-squared test, this extension was given by Yates (1948) who also discussed the situation when either the response or the covariate is an ordered factor; see also Cochran (1954) and Armitage (1955) for the particular case when y is a binary factor and x is ordered. The Mantel-Haenszel statistic (Mantel and Haenszel, 1959) was similarly extended by Mantel (1963) and Landis, Heyman and Koch (1978).
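
To make the role of the scores concrete, the unstratified linear-by-linear statistic can be computed by hand in base R. This is a sketch of the textbook calculation, not the package's internal code: the linear statistic is the score-weighted sum of the cell counts, standardized by its conditional mean and variance given the margins. The table and scores are those of the Yates (1948) example; the variable names (u, v, t_obs and so on) are chosen here for illustration.

```r
# Yates (1948, Tab. 1), as in the Examples section
yates <- matrix(
    c(141, 67, 114, 79, 39,
      131, 66, 143, 72, 35,
       36, 14,  38, 28, 16),
    byrow = TRUE, ncol = 5,
    dimnames = list(
           "Rating" = c("A", "B", "C"),
        "Condition" = c("A", "B", "C", "D", "E")
    )
)

u <- c(-1, 0, 1)          # scores for 'Rating'
v <- c(2, 1, 0, -1, -2)   # scores for 'Condition'

n       <- sum(yates)
row_tot <- rowSums(yates)
col_tot <- colSums(yates)

# Linear statistic and its conditional mean and variance given the margins
t_obs <- sum(outer(u, v) * yates)
e_t   <- sum(u * row_tot) * sum(v * col_tot) / n
v_t   <- (sum(u^2 * row_tot) - sum(u * row_tot)^2 / n) *
         (sum(v^2 * col_tot) - sum(v * col_tot)^2 / n) / (n - 1)

z  <- (t_obs - e_t) / sqrt(v_t)   # standardized linear statistic
m2 <- z^2                         # chi-squared form, one degree of freedom

# Equivalent form: (n - 1) times the squared correlation of the
# case-level scores
cells <- as.data.frame(as.table(yates))
r <- cor(rep(u[as.integer(cells$Rating)],    times = cells$Freq),
         rep(v[as.integer(cells$Condition)], times = cells$Freq))
all.equal(m2, (n - 1) * r^2)
```

Here z is negative and m2 comes out at approximately 2.33. Changing u or v changes only the spacing assumed between the categories, which is why the choice of scores matters for ordered factors.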

The conditional null distribution of the test statistic is used to obtain $p$-values, and an asymptotic approximation of the exact distribution is used by default (distribution = "asymptotic"). Alternatively, the distribution can be approximated via Monte Carlo resampling or computed exactly for univariate two-sample problems by setting distribution to "approximate" or "exact", respectively. See asymptotic, approximate and exact for details.
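
The idea behind the Monte Carlo approximation can be illustrated in base R. The sketch below is not the package's implementation: it expands the Davis (1986) table to one row per observation, permutes one factor (which keeps both margins fixed) and compares the permuted Pearson statistics with the observed one. The helper pearson(), the seed and the number of replications are choices made here for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Davis (1986, p. 140), as in the Examples section
davis <- as.table(matrix(c(3, 6, 2, 19), nrow = 2, byrow = TRUE))

# Pearson chi-squared statistic without continuity correction
pearson <- function(tab) {
    expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
    sum((tab - expected)^2 / expected)
}

# Expand the table to one row per observation
cells <- as.data.frame(davis)
cases <- cells[rep(seq_len(nrow(cells)), times = cells$Freq), 1:2]

# Permuting one factor keeps both margins fixed
x2_obs  <- pearson(davis)
x2_perm <- replicate(10000,
                     pearson(table(cases[[1]], sample(cases[[2]]))))

# Monte Carlo p-value; the small tolerance guards against rounding error
# when a permuted statistic ties with the observed one
p_mc <- mean(x2_perm >= x2_obs - 1e-8)
p_mc
```

Because the estimate is based on random permutations, it varies from run to run unless the seed is fixed; increasing the number of replications reduces that variation.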

References

Agresti, A. (2002). Categorical Data Analysis, Second Edition. Hoboken, New Jersey: John Wiley & Sons.

Armitage, P. (1955). Tests for linear trends in proportions and frequencies. Biometrics 11(3), 375–386. doi:10.2307/3001775

Cochran, W. G. (1954). Some methods for strengthening the common $χ^{2}$ tests. Biometrics 10(4), 417–451. doi:10.2307/3001616

Davis, L. J. (1986). Exact tests for $2 \times 2$ contingency tables. The American Statistician 40(2), 139–141. doi:10.1080/00031305.1986.10475377

Landis, J. R., Heyman, E. R. and Koch, G. G. (1978). Average partial association in three-way contingency tables: a review and discussion of alternative tests. International Statistical Review 46(3), 237–254. doi:10.2307/1402373

Mantel, N. and Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 22(4), 719–748. doi:10.1093/jnci/22.4.719

Mantel, N. (1963). Chi-square tests with one degree of freedom: extensions of the Mantel-Haenszel procedure. Journal of the American Statistical Association 58(303), 690–700. doi:10.1080/01621459.1963.10500879

Yates, F. (1948). The analysis of contingency tables with groupings based on quantitative characters. Biometrika 35(1/2), 176–181. doi:10.1093/biomet/35.1-2.176

Examples

# NOT RUN {
library(coin) # provides chisq_test(), cmh_test(), lbl_test() and 'jobsatisfaction'

## Example data
## Davis (1986, p. 140)
davis <- matrix(
    c(3,  6,
      2, 19),
    nrow = 2, byrow = TRUE
)
davis <- as.table(davis)

## Asymptotic Pearson chi-squared test
chisq_test(davis)
chisq.test(davis, correct = FALSE) # same as above

## Approximative (Monte Carlo) Pearson chi-squared test
ct <- chisq_test(davis,
                 distribution = approximate(nresample = 10000))
pvalue(ct)             # standard p-value
midpvalue(ct)          # mid-p-value
pvalue_interval(ct)    # p-value interval
size(ct, alpha = 0.05) # test size at alpha = 0.05 using the p-value

## Exact Pearson chi-squared test (Davis, 1986)
## Note: disagrees with Fisher's exact test
ct <- chisq_test(davis,
                 distribution = "exact")
pvalue(ct)             # standard p-value
midpvalue(ct)          # mid-p-value
pvalue_interval(ct)    # p-value interval
size(ct, alpha = 0.05) # test size at alpha = 0.05 using the p-value
fisher.test(davis)


## Laryngeal cancer data
## Agresti (2002, p. 107, Tab. 3.13)
cancer <- matrix(
    c(21, 2,
      15, 3),
    nrow = 2, byrow = TRUE,
    dimnames = list(
        "Treatment" = c("Surgery", "Radiation"),
           "Cancer" = c("Controlled", "Not Controlled")
    )
)
cancer <- as.table(cancer)

## Exact Pearson chi-squared test (Agresti, 2002, p. 108, Tab. 3.14)
## Note: agrees with Fisher's exact test
(ct <- chisq_test(cancer,
                  distribution = "exact"))
midpvalue(ct)          # mid-p-value
pvalue_interval(ct)    # p-value interval
size(ct, alpha = 0.05) # test size at alpha = 0.05 using the p-value
fisher.test(cancer)


## Homework conditions and teacher's rating
## Yates (1948, Tab. 1)
yates <- matrix(
    c(141, 67, 114, 79, 39,
      131, 66, 143, 72, 35,
       36, 14,  38, 28, 16),
    byrow = TRUE, ncol = 5,
    dimnames = list(
           "Rating" = c("A", "B", "C"),
        "Condition" = c("A", "B", "C", "D", "E")
    )
)
yates <- as.table(yates)

## Asymptotic Pearson chi-squared test (Yates, 1948, p. 176)
chisq_test(yates)

## Asymptotic Pearson-Yates chi-squared test (Yates, 1948, pp. 180-181)
## Note: 'Rating' and 'Condition' as ordinal
(ct <- chisq_test(yates,
                  alternative = "less",
                  scores = list("Rating" = c(-1, 0, 1),
                                "Condition" = c(2, 1, 0, -1, -2))))
statistic(ct)^2 # chi^2 = 2.332

## Asymptotic Pearson-Yates chi-squared test (Yates, 1948, p. 181)
## Note: 'Rating' as ordinal
chisq_test(yates,
           scores = list("Rating" = c(-1, 0, 1))) # Q = 3.825


## Change in clinical condition and degree of infiltration
## Cochran (1954, Tab. 6)
cochran <- matrix(
    c(11,  7,
      27, 15,
      42, 16,
      53, 13,
      11,  1),
    byrow = TRUE, ncol = 2,
    dimnames = list(
              "Change" = c("Marked", "Moderate", "Slight",
                           "Stationary", "Worse"),
        "Infiltration" = c("0-7", "8-15")
    )
)
cochran <- as.table(cochran)

## Asymptotic Pearson chi-squared test (Cochran, 1954, p. 435)
chisq_test(cochran) # X^2 = 6.88

## Asymptotic Cochran-Armitage test (Cochran, 1954, p. 436)
## Note: 'Change' as ordinal
(ct <- chisq_test(cochran,
                  scores = list("Change" = c(3, 2, 1, 0, -1))))
statistic(ct)^2 # X^2 = 6.66


## Change in size of ulcer crater for two treatment groups
## Armitage (1955, Tab. 2)
armitage <- matrix(
    c( 6, 4, 10, 12,
      11, 8,  8,  5),
    byrow = TRUE, ncol = 4,
    dimnames = list(
        "Treatment" = c("A", "B"),
           "Crater" = c("Larger", "< 2/3 healed",
                        ">= 2/3 healed", "Healed")
    )
)
armitage <- as.table(armitage)

## Approximative (Monte Carlo) Pearson chi-squared test (Armitage, 1955, p. 379)
chisq_test(armitage,
           distribution = approximate(nresample = 10000)) # chi^2 = 5.91

## Approximative (Monte Carlo) Cochran-Armitage test (Armitage, 1955, p. 379)
(ct <- chisq_test(armitage,
                  distribution = approximate(nresample = 10000),
                  scores = list("Crater" = c(-1.5, -0.5, 0.5, 1.5))))
statistic(ct)^2 # chi_0^2 = 5.26


## Relationship between job satisfaction and income stratified by gender
## Agresti (2002, p. 288, Tab. 7.8)

## Asymptotic generalized Cochran-Mantel-Haenszel test (Agresti, p. 297)
(ct <- cmh_test(jobsatisfaction)) # CMH = 10.2001

## The standardized linear statistic
statistic(ct, type = "standardized")

## The standardized linear statistic for each block
statistic(ct, type = "standardized", partial = TRUE)

## Asymptotic generalized Cochran-Mantel-Haenszel test (Agresti, p. 297)
## Note: 'Job.Satisfaction' as ordinal
cmh_test(jobsatisfaction,
         scores = list("Job.Satisfaction" = c(1, 3, 4, 5))) # L^2 = 9.0342

## Asymptotic linear-by-linear association test (Agresti, p. 297)
## Note: 'Job.Satisfaction' and 'Income' as ordinal
(lt <- lbl_test(jobsatisfaction,
                scores = list("Job.Satisfaction" = c(1, 3, 4, 5),
                              "Income" = c(3, 10, 20, 35))))
statistic(lt)^2 # M^2 = 6.1563

## The standardized linear statistic
statistic(lt, type = "standardized")

## The standardized linear statistic for each block
statistic(lt, type = "standardized", partial = TRUE)
# }
