epi.2by2: Summary measures for count data presented in a 2 by 2 table

Description

Computes summary measures of risk and a chi-squared test for difference in the observed proportions from count data presented in a 2 by 2 table. Multiple strata may be represented by additional rows of count data and in this case crude and Mantel-Haenszel adjusted measures of association are calculated and chi-squared tests of homogeneity are returned.

Usage

epi.2by2(dat, method = "cohort.count", conf.level = 0.95, units = 100, 
   homogeneity = "breslow.day", outcome = "as.columns")
## S3 method for class 'epi.2by2':
print(x, ...)
## S3 method for class 'epi.2by2':
summary(object, ...)

Arguments

dat

an object of class table containing the individual cell frequencies.

method

a character string indicating the experimental design on which the tabular data has been based. Options are cohort.count, cohort.time, case.control, or cross.sectional.

conf.level

magnitude of the returned confidence intervals. Must be a single number between 0 and 1.

units

multiplier for prevalence and incidence estimates.

homogeneity

a character string indicating the type of homogeneity test to perform. Options are breslow.day or woolf.

outcome

a character string indicating how the outcome variable is represented in the contingency table. Options are as.columns (outcome as columns) or as.rows (outcome as rows).

x, object

an object of class epi.2by2.

...

Ignored.

Value

An object of class epi.2by2 comprised of:

method

character string specifying the experimental design on which the tabular data has been based.

n.strata

number of strata.

conf.level

magnitude of the returned confidence intervals.

massoc

a list comprised of the computed measures of association. See below for details.

tab

a data frame containing the contingency table data.

When method equals cohort.count the following measures of association and effect are returned:
RR.strata.wald, RR.strata.score: incidence risk ratios for each stratum (Wald and score confidence intervals, respectively).
RR.crude.wald, RR.crude.score, RR.mh: incidence risk ratio (Wald and score confidence intervals, respectively) across all strata and Mantel-Haenszel adjusted incidence risk ratio.
OR.strata.wald, OR.strata.cfield, OR.strata.score, OR.strata.mle: odds ratios for each stratum (Wald, Cornfield, score and maximum likelihood confidence intervals, respectively).
OR.crude.wald, OR.crude.cfield, OR.crude.score, OR.crude.mle, OR.mh: odds ratio (Wald, Cornfield, score and maximum likelihood confidence intervals, respectively) across all strata and Mantel-Haenszel adjusted odds ratio.
ARe.strata.wald, ARe.strata.score: attributable risks in the exposed for each stratum (Wald and score confidence intervals, respectively).
ARe.crude.wald, ARe.crude.score, AR.mh: attributable risk (Wald and score confidence intervals, respectively) across all strata and Mantel-Haenszel adjusted attributable risk.
ARp.strata.wald, ARp.strata.piri: population attributable risks for each stratum (Wald and Pirikahu confidence intervals, respectively).
AFe.strata: attributable fractions in the exposed for each stratum.
AFp.strata: attributable fractions in the population for each stratum.
chisq.strata: chi-squared test for difference in exposed and non-exposed proportions for each stratum.
chisq.crude: chi-squared test for difference in exposed and non-exposed proportions across all strata.
chisq.mh: Mantel-Haenszel chi-squared test.
RR.homog, OR.homog: tests of homogeneity of the individual strata incidence risk ratios and odds ratios.
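
The relationship between a crude and a Mantel-Haenszel adjusted odds ratio can be sketched in base R. This is an illustration only, not the code used inside epi.2by2: it applies the standard Mantel-Haenszel estimator to the birthwt data from MASS (the data set used in Example 2), and the object names (a.k, or.mh and so on) are made up for this sketch.

```r
library(MASS)  # provides the birthwt data used in Example 2

## 2 by 2 by K table: exposure (rows), outcome (columns), stratum (layers).
smoke <- factor(birthwt$smoke, levels = c(1, 0))
low   <- factor(birthwt$low,   levels = c(1, 0))
tab   <- table(smoke, low, birthwt$race)

## Cell counts per stratum, using the a, b, c, d layout from the Details section.
a.k <- tab[1, 1, ]; b.k <- tab[1, 2, ]
c.k <- tab[2, 1, ]; d.k <- tab[2, 2, ]
n.k <- a.k + b.k + c.k + d.k

## Crude odds ratio: collapse over strata, then take the cross-product ratio.
or.crude <- (sum(a.k) * sum(d.k)) / (sum(b.k) * sum(c.k))

## Mantel-Haenszel odds ratio: stratum-weighted cross-product ratio.
or.mh <- sum(a.k * d.k / n.k) / sum(b.k * c.k / n.k)

round(c(crude = or.crude, mh = or.mh, ratio = or.crude / or.mh), digits = 2)
```

The point estimate can be cross-checked against mantelhaen.test(tab)$estimate from the stats package.
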
When method equals cohort.time the following measures of association and effect are returned:
IRR.strata: incidence rate ratios for each stratum.
IRR.crude, IRR.mh: incidence rate ratio across all strata and Mantel-Haenszel adjusted incidence rate ratio.
AR.strata: attributable rates in the exposed for each stratum.
AR.crude, AR.mh: attributable rate in the exposed across all strata and Mantel-Haenszel adjusted attributable rate in the exposed.
ARp.strata: population attributable rates for each stratum.
AFp.strata: attributable fractions in the population for each stratum.
chisq.strata: chi-squared test for difference in exposed and non-exposed proportions for each stratum.
chisq.crude: chi-squared test for difference in exposed and non-exposed proportions across all strata.
chisq.mh: Mantel-Haenszel chi-squared test.
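
As a rough guide to what the incidence rate ratio and attributable rate represent, the following base R sketch computes both for the Feychting et al. (1998) counts used in Example 3. It assumes large-sample Wald-type intervals, which may not be identical to the calculations inside epi.2by2, and all object names are hypothetical.

```r
## Cases and person-years at risk (Feychting et al. 1998, see Example 3).
cases.exp   <- 136;  pt.exp   <- 22050    # blind
cases.unexp <- 1709; pt.unexp <- 127650   # visually impaired, not blind

conf.level <- 0.90
units <- 1000
z <- qnorm(1 - (1 - conf.level) / 2)

rate.exp   <- cases.exp / pt.exp
rate.unexp <- cases.unexp / pt.unexp

## Attributable rate (rate difference) with an assumed Wald-type interval.
ar    <- rate.exp - rate.unexp
ar.se <- sqrt(cases.exp / pt.exp^2 + cases.unexp / pt.unexp^2)
ar.ci <- (ar + c(-1, 1) * z * ar.se) * units

## Incidence rate ratio with an interval computed on the log scale.
irr    <- rate.exp / rate.unexp
irr.se <- sqrt(1 / cases.exp + 1 / cases.unexp)
irr.ci <- exp(log(irr) + c(-1, 1) * z * irr.se)

round(c(ar = ar * units, lower = ar.ci[1], upper = ar.ci[2]), digits = 2)
round(c(irr = irr, lower = irr.ci[1], upper = irr.ci[2]), digits = 2)
```

The negative attributable rate reflects the lower cancer rate among the blind.
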
When method equals case.control the following measures of association and effect are returned:
OR.strata.wald, OR.strata.cfield, OR.strata.score, OR.strata.mle: odds ratios for each stratum (Wald, Cornfield, score and maximum likelihood confidence intervals, respectively).
OR.crude.wald, OR.crude.cfield, OR.crude.score, OR.crude.mle, OR.mh: odds ratio (computed using Wald, Cornfield, score and maximum likelihood confidence intervals, respectively) across all strata and Mantel-Haenszel adjusted odds ratio.
ARe.strata.wald, ARe.strata.score: attributable risks in the exposed for each stratum (Wald and score confidence intervals, respectively).
ARe.crude.wald, ARe.crude.score, AR.mh: attributable prevalence in the exposed across all strata (Wald and score confidence intervals, respectively) and Mantel-Haenszel attributable prevalence in the exposed.
ARp.strata.wald, ARp.strata.piri: attributable prevalence in the population for each stratum (Wald and Pirikahu confidence intervals, respectively).
ARp.crude.wald, ARp.crude.piri: attributable prevalence in the population (Wald and Pirikahu confidence intervals, respectively).
AFest.strata: estimated attributable fractions in the exposed for each stratum.
AFpest.strata: estimated attributable fractions in the population for each stratum.
chisq.strata: chi-squared test for difference in exposed and non-exposed proportions for each stratum.
chisq.crude: chi-squared test for difference in exposed and non-exposed proportions across all strata.
chisq.mh: Mantel-Haenszel chi-squared test.
OR.homog: tests of homogeneity of the individual strata odds ratios.
When method equals cross.sectional the following measures of association and effect are returned:
PR.strata.wald, PR.strata.score: prevalence ratios for each stratum (Wald and score confidence intervals, respectively).
PR.crude.wald, PR.crude.score, PR.mh: prevalence ratio (Wald and score confidence intervals, respectively) across all strata and Mantel-Haenszel adjusted prevalence ratio.
OR.strata.wald, OR.strata.cfield, OR.strata.score, OR.strata.mle: odds ratios for each stratum (Wald, Cornfield, score and maximum likelihood confidence intervals, respectively).
OR.crude.wald, OR.crude.cfield, OR.crude.score, OR.crude.mle, OR.mh: odds ratio (computed using Wald, Cornfield, score and maximum likelihood confidence intervals, respectively) across all strata and Mantel-Haenszel adjusted odds ratio.
ARe.strata.wald, ARe.strata.score: attributable risks in the exposed for each stratum (Wald and score confidence intervals, respectively).
ARe.crude.wald, ARe.crude.score, AR.mh: attributable prevalence in the exposed across all strata (Wald and score confidence intervals, respectively) and Mantel-Haenszel attributable prevalence in the exposed.
ARp.strata.wald, ARp.strata.piri: attributable prevalence in the population for each stratum (Wald and Pirikahu confidence intervals, respectively).
AFe.strata: attributable fractions in the exposed for each stratum.
AFp.strata: attributable fractions in the population for each stratum.
chisq.strata: chi-squared test for difference in exposed and non-exposed proportions for each stratum.
chisq.crude: chi-squared test for difference in exposed and non-exposed proportions across all strata.
chisq.mh: Mantel-Haenszel chi-squared test.
PR.homog, OR.homog: tests of homogeneity of the individual strata prevalence and odds ratios.
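
The attributable fractions listed above follow directly from the group-level prevalences. The sketch below, in base R with hypothetical object names, reproduces the point estimates quoted in Example 1 (Willeberg 1977). It uses the textbook definitions and does not attempt the confidence interval calculations, so treat it as an aid to interpretation and not as the package's implementation.

```r
## Cell counts from Example 1, outcome as columns.
e.pos <- 13; e.neg <- 2163   # DCF-exposed cats: FUS+, FUS-
u.pos <- 5;  u.neg <- 3349   # non-exposed cats: FUS+, FUS-

p.exp   <- e.pos / (e.pos + e.neg)                             # prevalence, exposed
p.unexp <- u.pos / (u.pos + u.neg)                             # prevalence, non-exposed
p.all   <- (e.pos + u.pos) / (e.pos + e.neg + u.pos + u.neg)   # prevalence, population

pr  <- p.exp / p.unexp              # prevalence ratio
afe <- (pr - 1) / pr                # attributable fraction in the exposed
afp <- (p.all - p.unexp) / p.all    # attributable fraction in the population

round(c(PR = pr, AFe = afe, AFp = afp), digits = 2)
```
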

Details

Where method is cohort.count, case.control, or cross.sectional and outcome = as.columns the required 2 by 2 table format is:

             Disease +    Disease -
Expose +     a            b
Expose -     c            d

Where method is cohort.time and outcome = as.columns the required 2 by 2 table format is:

             Disease +    Time at risk
Expose +     a            b
Expose -     c            d
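
A minimal base R sketch of the two layouts, using the counts from Example 1. The dimnames and object names are chosen here for illustration only. With outcome as columns the exposure groups are the rows; transposing the matrix gives the layout expected when outcome = "as.rows".

```r
## Outcome as columns: rows are exposure groups (a, b on the first row; c, d on the second).
dat <- matrix(c(13, 2163, 5, 3349), nrow = 2, byrow = TRUE,
   dimnames = list(Exposure = c("DCF+", "DCF-"), Outcome = c("FUS+", "FUS-")))
tab.cols <- as.table(dat)

## Outcome as rows: the same counts, transposed.
tab.rows <- as.table(t(dat))

tab.cols
tab.rows
```

Either object has class table, as required by the dat argument; the outcome argument must then match the orientation chosen.
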

A summary of the methods used for each of the confidence interval calculations in this function is as follows:

Name        Type        Reference
wRR.        Wald        Wald (1943)
scRR.       Score       Miettinen and Nurminen (1985)
IRR.        -           Kirkwood and Sterne (2003, 240 - 248)
wOR.        Wald        Wald (1943)
cfOR.       Cornfield   Cornfield (1956)
scOR.       Score       Miettinen and Nurminen (1985)
mOR.        MLE         Fleiss et al. (2003)
wARisk.     Wald        Wald (1943)
scARisk.    Score       Miettinen and Nurminen (1985)
ARate.      -           Rothman (2002) p 137
AFRisk.     -           Hanley (2001)
AFRate.     -           Hanley (2001)
AFest.      -           Hanley (2001)
wPARisk.    Wald        Wald (1943)
pPARisk.    Pirikahu    Pirikahu (2014)
PARate.     -           Rothman (2002) p 137
PAFRisk.    -           Jewell (2004) p 84
PAFRate.    -           Sullivan (2009)
PAFest.     -           Jewell (2004) p 84
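
To make the Wald entries in this table concrete, the sketch below computes a Wald-type confidence interval for a risk (or prevalence) ratio on the log scale, using the counts from Example 1. It is written in base R with hypothetical object names and assumes the usual large-sample standard error for a log ratio of two proportions; the code inside epi.2by2 may be organised differently.

```r
## Cell counts from Example 1 (a, b, c, d layout as in the Details section).
e.pos <- 13; e.neg <- 2163   # exposed: outcome +, outcome -
u.pos <- 5;  u.neg <- 3349   # non-exposed: outcome +, outcome -

conf.level <- 0.95
z <- qnorm(1 - (1 - conf.level) / 2)

n.exp   <- e.pos + e.neg
n.unexp <- u.pos + u.neg

## Point estimate of the ratio of the two proportions.
rr <- (e.pos / n.exp) / (u.pos / n.unexp)

## Large-sample standard error of log(rr).
log.se <- sqrt(1 / e.pos - 1 / n.exp + 1 / u.pos - 1 / n.unexp)

## Wald interval: symmetric on the log scale, then back-transformed.
rr.ci <- exp(log(rr) + c(-1, 1) * z * log.se)

round(c(est = rr, lower = rr.ci[1], upper = rr.ci[2]), digits = 2)
```

These values can be compared with the prevalence ratio quoted in Example 1.
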

References

Altman D, Machin D, Bryant T, Gardner M (2000). Statistics with Confidence. British Medical Journal, London, pp. 69.

Cornfield, J (1956). A statistical problem arising from retrospective studies. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley California 4: 135 - 148.

Elwood JM (2007). Critical Appraisal of Epidemiological Studies and Clinical Trials. Oxford University Press, London.

Feinstein AR (2002). Principles of Medical Statistics. Chapman Hall/CRC, London, 332 - 336.

Fisher RA (1962). Confidence limits for a cross-product ratio. Australian Journal of Statistics 4: 41.

Feychting M, Osterlund B, Ahlbom A (1998). Reduced cancer incidence among the blind. Epidemiology 9: 490 - 494.

Fleiss JL, Levin B, Paik MC (2003). Statistical Methods for Rates and Proportions. John Wiley and Sons, New York.

Hanley JA (2001). A heuristic approach to the formulas for population attributable fraction. Journal of Epidemiology and Community Health 55: 508 - 514.

Lancaster H (1961). Significance tests in discrete distributions. Journal of the American Statistical Association 56: 223 - 234.

Jewell NP (2004). Statistics for Epidemiology. Chapman & Hall/CRC, London, pp. 84 - 85.

Juul S (2004). Epidemiologi og evidens. Munksgaard, Copenhagen.

Kirkwood BR, Sterne JAC (2003). Essential Medical Statistics. Blackwell Science, Malden, MA, USA.

Lawson R (2004). Small sample confidence intervals for the odds ratio. Communications in Statistics Simulation and Computation 33: 1095 - 1113.

Martin SW, Meek AH, Willeberg P (1987). Veterinary Epidemiology Principles and Methods. Iowa State University Press, Ames, Iowa, pp. 130.

McNutt L, Wu C, Xue X, Hafner JP (2003). Estimating the relative risk in cohort studies and clinical trials of common outcomes. American Journal of Epidemiology 157: 940 - 943.

Miettinen OS, Nurminen M (1985). Comparative analysis of two rates. Statistics in Medicine 4: 213 - 226.

Pirikahu S (2014). Confidence Intervals for Population Attributable Risk. Unpublished MSc thesis. Massey University, Palmerston North, New Zealand.

Robbins AS, Chao SY, Fonseca VP (2002). What's the relative risk? A method to directly estimate risk ratios in cohort studies of common outcomes. Annals of Epidemiology 12: 452 - 454.

Rothman KJ (2002). Epidemiology An Introduction. Oxford University Press, London, pp. 130 - 143.

Rothman KJ, Greenland S (1998). Modern Epidemiology. Lippincott Williams & Wilkins, Philadelphia, pp. 271.

Sullivan KM, Dean A, Soe MM (2009). OpenEpi: A Web-based Epidemiologic and Statistical Calculator for Public Health. Public Health Reports 124: 471 - 474.

Wald A (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society 54: 426 - 482.

Willeberg P (1977). Animal disease information processing: Epidemiologic analyses of the feline urologic syndrome. Acta Veterinaria Scandinavica. Suppl. 64: 1 - 48.

Woodward MS (2005). Epidemiology Study Design and Data Analysis. Chapman & Hall/CRC, New York, pp. 163 - 214.

Zhang J, Yu KF (1998). What's the relative risk? A method for correcting the odds ratio in cohort studies of common outcomes. Journal of the American Medical Association 280: 1690 - 1691.

Examples


## EXAMPLE 1:
## A cross sectional study investigating the relationship between dry cat 
## food (DCF) and feline urologic syndrome (FUS) was conducted (Willeberg 
## 1977). Counts of individuals in each group were as follows:

## DCF-exposed cats (cases, non-cases) 13, 2163
## Non DCF-exposed cats (cases, non-cases) 5, 3349

## Outcome variable (FUS) as columns:
dat <- matrix(c(13,2163,5,3349), nrow = 2, byrow = TRUE)
rownames(dat) <- c("DF+", "DF-"); colnames(dat) <- c("FUS+", "FUS-"); dat

epi.2by2(dat = as.table(dat), method = "cross.sectional", 
   conf.level = 0.95, units = 100,  homogeneity = "breslow.day", 
   outcome = "as.columns")

## Outcome variable (FUS) as rows:
dat <- matrix(c(13,5,2163,3349), nrow = 2, byrow = TRUE)
rownames(dat) <- c("FUS+", "FUS-"); colnames(dat) <- c("DF+", "DF-"); dat

epi.2by2(dat =  as.table(dat), method = "cross.sectional", 
   conf.level = 0.95, units = 100,  homogeneity = "breslow.day", 
   outcome = "as.rows")

## Prevalence ratio:
## The prevalence of FUS in DCF exposed cats is 4.01 (95% CI 1.43 to 11.23) 
## times greater than the prevalence of FUS in non-DCF exposed cats.

## Attributable fraction:
## In DCF exposed cats, 75% of FUS is attributable to DCF (95% CI 30% to 
## 91%).

## Population attributable fraction:
## Fifty-four percent of FUS cases in the cat population are attributable 
## to DCF (95% CI 4% to 78%).

## EXAMPLE 2:
## This example shows how the table function can be used to pass data to
## epi.2by2. Here we use the birthwt data from the MASS package.

library(MASS)
dat1 <- birthwt; head(dat1)

## Generate a table of cell frequencies. First set the levels of the outcome 
## and the exposure so the frequencies in the 2 by 2 table come out in the 
## conventional format:
dat1$low <- factor(dat1$low, levels = c(1,0))
dat1$smoke <- factor(dat1$smoke, levels = c(1,0))
dat1$race <- factor(dat1$race, levels = c(1,2,3))

## Generate the 2 by 2 table. Exposure (rows) = smoke. Outcome (columns) = low.
tab1 <- table(dat1$smoke, dat1$low, dnn = c("Smoke", "Low BW"))
print(tab1)

## Compute the incidence risk ratio and other measures of association:
epi.2by2(dat = tab1, method = "cohort.count", 
   conf.level = 0.95, units = 100,  homogeneity = "breslow.day",
   outcome = "as.columns")

## Odds ratio:
## The odds of having a low birth weight child for smokers is 2.02 
## (95% CI 1.08 to 3.78) times greater than the odds of having 
## a low birth weight child for non-smokers.

## Now stratify by race:
tab2 <- table(dat1$smoke, dat1$low, dat1$race, 
   dnn = c("Smoke", "Low BW", "Race"))
print(tab2)

## Compute the crude odds ratio, the Mantel-Haenszel adjusted odds ratio 
## and other measures of association:
epi.2by2(dat = tab2, method = "cohort.count", 
   conf.level = 0.95, units = 100,  homogeneity = "breslow.day", 
   outcome = "as.columns")

## After adjusting for the confounding effect of race, the odds of 
## having a low birth weight child for smokers is 3.09 (95% CI 1.49 to 6.39) 
## times that of non-smokers.

## Now turn tab2 into a data frame where the frequencies of individuals in 
## each exposure-outcome category are provided. Often your data will be 
## presented in this summary format:
dat2 <- data.frame(tab2)
print(dat2)

## Re-format dat2 (a summary count data frame) into tabular format using the 
## xtabs function:
tab3 <- xtabs(Freq ~ Smoke + Low.BW + Race, data = dat2)
print(tab3)

## tab3 can now be passed to epi.2by2:
rval <- epi.2by2(dat = tab3, method = "cohort.count", 
   conf.level = 0.95, units = 100,  homogeneity = "breslow.day", 
   outcome = "as.columns")
print(rval)

## The Mantel-Haenszel adjusted odds ratio is 3.09 (95% CI 1.49 to 6.39). The 
## ratio of the crude odds ratio to the Mantel-Haenszel adjusted odds ratio is
## 0.66.

## What are the Cornfield confidence limits, the maximum likelihood 
## confidence limits and the score confidence limits for the crude odds ratio?
rval$massoc$OR.crude.cfield
rval$massoc$OR.crude.mle
rval$massoc$OR.crude.score

## Cornfield: 2.20 (95% CI 1.07 to 3.79)
## Maximum likelihood: 2.01 (1.03 to 3.96)
## Score: 2.20 (95% CI 2.84 to 5.17)

## Plot the individual strata-level odds ratios and compare them with the 
## Mantel-Haenszel adjusted odds ratio.

library(ggplot2); library(scales)

nstrata <- 1:dim(tab3)[3]
strata.lab <- paste("Strata ", nstrata, sep = "")
y.at <- c(nstrata, max(nstrata) + 1)
y.lab <- c("M-H", strata.lab)
x.at <- c(0.25, 0.5, 1, 2, 4, 8, 16, 32)

or.l <- c(rval$massoc$OR.mh$lower, rval$massoc$OR.strata.cfield$lower)
or.u <- c(rval$massoc$OR.mh$upper, rval$massoc$OR.strata.cfield$upper)
or.p <- c(rval$massoc$OR.mh$est, rval$massoc$OR.strata.cfield$est)
dat <- data.frame(y.at, y.lab, or.p, or.l, or.u)

p <- ggplot(dat, aes(or.p, y.at))
p + geom_point() + 
   geom_errorbarh(aes(xmin = or.l, xmax = or.u, height = 0.2)) + 
   labs(x = "Odds ratio", y = "Strata") + 
   scale_x_continuous(trans = log2_trans(), breaks = x.at, 
   limits = c(0.25,32)) + scale_y_continuous(breaks = y.at, labels = y.lab) + 
   geom_vline(xintercept = 1, lwd = 1) + coord_fixed(ratio = 0.75 / 1) + 
   theme(axis.title.y = element_text(vjust = 0))

## EXAMPLE 3:
## A study was conducted by Feychting et al (1998) comparing cancer occurrence
## among the blind with occurrence among those who were not blind but had 
## severe visual impairment. From these data we calculate a cancer rate of
## 136/22050 person-years among the blind compared with 1709/127650 person-
## years among those who were visually impaired but not blind.

dat <- as.table(matrix(c(136,22050,1709,127650), nrow = 2, byrow = TRUE))
rval <- epi.2by2(dat = dat, method = "cohort.time", conf.level = 0.90, 
   units = 1000,  homogeneity = "breslow.day", outcome = "as.columns")
summary(rval)$ARe.strata

## The incidence rate of cancer was 7.22 cases per 1000 person-years less in the 
## blind, compared with those who were not blind but had severe visual impairment
## (90% CI 6.20 to 8.24 cases per 1000 person-years). Confidence intervals
## for this attributable risk estimate are from Rothman (2002, p 137).

summary(rval)$IRR
round(summary(rval)$IRR.strata, digits = 2)

## The incidence rate of cancer in the blind group was less than half that of the 
## comparison group (incidence rate ratio 0.46, 90% CI 0.40 to 0.53).
