Frechet.bounds.cat: Frechet bounds of cells in a contingency table

Description

This function derives bounds for the cell probabilities of the table Y vs. Z, starting from the marginal tables (X vs. Y) and (X vs. Z) and the joint distribution of the X variables.

Usage

Frechet.bounds.cat(tab.x, tab.xy, tab.xz, print.f="tables")

Arguments

tab.x

An R table crossing the X variables. This table must be obtained by using the function xtabs or table, e.g. tab.x <- xtabs(~x1+x2+x3, data

tab.xy

An R table of the X variables vs. the Y variable. This table must be obtained by using the function xtabs or table, e.g. tab.xy <- xtabs(~x1+x2+x3+y, data=

tab.xz

An R table of the X variables vs. the Z variable. This table must be obtained by using the function xtabs or table, e.g. tab.xz <- xtabs(~x1+x2+x3+z, data=da

print.f

A string specifying the data structure of the output. When print.f="tables" (default), all results are saved as tables in a list. If instead print.f="data.frame", all results will be saved as columns of a data.f

Value

When print.f="tables" (default) a list with the following tables:

low.u

The estimated lower bounds for the relative frequencies in the table Y vs. Z without conditioning on the X variables.

up.u

The estimated upper bounds for the relative frequencies in the table Y vs. Z without conditioning on the X variables.

CIA

The estimated relative frequencies in the table Y vs. Z under the Conditional Independence Assumption (CIA).

low.cx

The estimated lower bounds for the relative frequencies in the table Y vs. Z when conditioning on the X variables.

up.cx

The estimated upper bounds for the relative frequencies in the table Y vs. Z when conditioning on the X variables.

When print.f="data.frame" the estimated tables are saved as columns of a data.frame.

Details

This function computes the Frechet bounds for the relative frequencies in the contingency table of Y vs. Z, starting from the distributions P(Y|X), P(Z|X) and P(X). The bounds for the relative frequencies $p_{j,k}$ in the table Y vs. Z are:

$$p^{low}_{YZ}(j,k) = \sum_{i} p_X(i)\max (0; p_{Y|X}(j|i) + p_{Z|X}(k|i)-1 )$$

$$p^{up}_{YZ}(j,k) = \sum_{i} p_X(i) \min ( p_{Y|X}(j|i); p_{Z|X}(k|i))$$

The relative frequencies $p_X(i)=n_i/n$ are computed from the frequencies in tab.x; the relative frequencies $p_{Y|X}(j|i)=n_{ij}/n_{i \bullet}$ are computed from tab.xy; finally, $p_{Z|X}(k|i)=n_{ik}/n_{i \bullet}$ are derived from tab.xz.
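
As a rough illustration of the two bounds conditional on X, the following base R sketch evaluates them by hand for a single X variable with two categories. The counts and the object names (n.x, n.xy, n.xz, low.cx.manual, up.cx.manual) are hypothetical, and plain matrices are used instead of xtabs output for simplicity. It is only a direct transcription of the formulas, not the internal code of Frechet.bounds.cat().

```r
# Hypothetical counts: rows index the categories of X
n.x  <- c(x1 = 40, x2 = 60)
n.xy <- matrix(c(30, 10,
                 20, 40), nrow = 2, byrow = TRUE,
               dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2")))
n.xz <- matrix(c(10, 30,
                 45, 15), nrow = 2, byrow = TRUE,
               dimnames = list(X = c("x1", "x2"), Z = c("z1", "z2")))

p.x   <- n.x / sum(n.x)                # p_X(i)
p.y.x <- prop.table(n.xy, margin = 1)  # p_{Y|X}(j|i), each row sums to 1
p.z.x <- prop.table(n.xz, margin = 1)  # p_{Z|X}(k|i), each row sums to 1

# Accumulate the weighted bounds over the categories of X
low.cx.manual <- matrix(0, ncol(p.y.x), ncol(p.z.x),
                        dimnames = list(Y = colnames(p.y.x),
                                        Z = colnames(p.z.x)))
up.cx.manual <- low.cx.manual
for (i in seq_along(p.x)) {
  s <- outer(p.y.x[i, ], p.z.x[i, ], "+") - 1   # p(j|i) + p(k|i) - 1
  m <- outer(p.y.x[i, ], p.z.x[i, ], pmin)      # min(p(j|i), p(k|i))
  low.cx.manual <- low.cx.manual + p.x[i] * pmax(s, 0)
  up.cx.manual  <- up.cx.manual  + p.x[i] * m
}

low.cx.manual
up.cx.manual
```

Every lower bound computed this way is less than or equal to the corresponding upper bound, which makes a quick sanity check on the input tables.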

It is assumed that the marginal distribution of the X variables is the same in all the input tables: tab.x, tab.xy and tab.xz. If this is not true a warning message will appear.
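
Before calling the function, the agreement of the X margins can be inspected by hand. The sketch below is one possible check under simplifying assumptions: a single X variable, hypothetical counts, and hypothetical object names (n.x, n.xy, n.xz, diff.xy, diff.xz). How Frechet.bounds.cat() itself performs the comparison, and with which tolerance, is not described here.

```r
# Hypothetical counts: rows index the categories of X
n.x  <- c(x1 = 40, x2 = 60)
n.xy <- matrix(c(30, 10,
                 20, 40), nrow = 2, byrow = TRUE,
               dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2")))
n.xz <- matrix(c(20, 30,
                 35, 15), nrow = 2, byrow = TRUE,
               dimnames = list(X = c("x1", "x2"), Z = c("z1", "z2")))

# Marginal distribution of X implied by each input
p.x.from.x  <- n.x / sum(n.x)
p.x.from.xy <- rowSums(n.xy) / sum(n.xy)
p.x.from.xz <- rowSums(n.xz) / sum(n.xz)

# Largest absolute discrepancy with respect to the table of X alone
diff.xy <- max(abs(p.x.from.xy - p.x.from.x))
diff.xz <- max(abs(p.x.from.xz - p.x.from.x))
c(xy = diff.xy, xz = diff.xz)
```

In this toy case the X margin of the X vs. Y counts matches the table of X, while the X margin of the X vs. Z counts does not, which is the situation where harmonizing the X distribution beforehand may be advisable.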

Note that the cell bounds for the relative frequencies in the contingency table of Y vs. Z are also computed without considering the X variables:

$$\max \{0; p_{Y}(j) + p_{Z}(k)-1\} \leq p_{YZ}(j,k) \leq \min \{ p_{Y}(j); p_{Z}(k)\}$$
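
The bounds that ignore X depend only on the two marginal distributions, so they can be sketched in a few lines of base R. The marginal probabilities and the names p.y, p.z, low.u.manual and up.u.manual below are hypothetical; which input tables the function uses to estimate these margins is an assumption not covered by this sketch.

```r
# Hypothetical marginal distributions of Y and Z
p.y <- c(y1 = 0.50, y2 = 0.50)
p.z <- c(z1 = 0.55, z2 = 0.45)

# Lower bound: max(0, p_Y(j) + p_Z(k) - 1) for every pair (j, k)
low.u.manual <- pmax(outer(p.y, p.z, "+") - 1, 0)

# Upper bound: min(p_Y(j), p_Z(k)) for every pair (j, k)
up.u.manual <- outer(p.y, p.z, pmin)

low.u.manual
up.u.manual
```

Because no information on X is used, these intervals are never narrower than the ones obtained by conditioning on X.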

Finally, the contingency table of Y vs. Z estimated under the Conditional Independence Assumption (CIA) is obtained by considering:

$$p_{YZ}(j,k) = \sum_{i} p_{Y|X}(j|i) \times p_{Z|X}(k|i) \times p_{X}(i).$$
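
Summing the product of the two conditional distributions over the categories of X, weighted by the distribution of X, gives the Y vs. Z table under the CIA. The following base R sketch does this for a single X variable; the counts and the names n.x, n.xy, n.xz and cia.manual are hypothetical, and the code only transcribes the formula rather than reproducing the function's internals.

```r
# Hypothetical counts: rows index the categories of X
n.x  <- c(x1 = 40, x2 = 60)
n.xy <- matrix(c(30, 10,
                 20, 40), nrow = 2, byrow = TRUE,
               dimnames = list(X = c("x1", "x2"), Y = c("y1", "y2")))
n.xz <- matrix(c(10, 30,
                 45, 15), nrow = 2, byrow = TRUE,
               dimnames = list(X = c("x1", "x2"), Z = c("z1", "z2")))

p.x   <- n.x / sum(n.x)                # p_X(i)
p.y.x <- prop.table(n.xy, margin = 1)  # p_{Y|X}(j|i)
p.z.x <- prop.table(n.xz, margin = 1)  # p_{Z|X}(k|i)

# Sum over i of p_{Y|X}(j|i) * p_{Z|X}(k|i) * p_X(i)
cia.manual <- matrix(0, ncol(p.y.x), ncol(p.z.x),
                     dimnames = list(Y = colnames(p.y.x),
                                     Z = colnames(p.z.x)))
for (i in seq_along(p.x)) {
  cia.manual <- cia.manual + p.x[i] * outer(p.y.x[i, ], p.z.x[i, ])
}

cia.manual
sum(cia.manual)  # a proper joint distribution sums to 1
```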

References

Ballin, M., D'Orazio, M., Di Zio, M., Scanu, M. and Torelli, N. (2009). Statistical Matching of Two Surveys with a Common Subset. Working Paper, 124. Dip. Scienze Economiche e Statistiche, Univ. di Trieste, Trieste.

D'Orazio, M., Di Zio, M. and Scanu, M. (2006). Statistical Matching: Theory and Practice. Wiley, Chichester.

Examples

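library(StatMatch) # provides Frechet.bounds.cat() and harmonize.x()
library(survey)    # provides svydesign()

data(quine, package="MASS") # loads quine from MASS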
str(quine)

# split quine in two subsets
set.seed(765)
# draw 70 distinct rows so that quine.A and quine.B partition quine
lab.A <- sample(nrow(quine), 70, replace=FALSE)
quine.A <- quine[lab.A, 1:3]
quine.B <- quine[-lab.A, 2:4]

# compute the tables required by Frechet.bounds.cat()
freq.x <- xtabs(~Sex+Age, data=quine.A)
freq.xy <- xtabs(~Sex+Age+Eth, data=quine.A)
freq.xz <- xtabs(~Sex+Age+Lrn, data=quine.B)

# apply Frechet.bounds.cat()
bounds.yz <- Frechet.bounds.cat(tab.x=freq.x, tab.xy=freq.xy,
        tab.xz=freq.xz, print.f="data.frame")
bounds.yz

# harmonize distr. of Sex vs. Age before applying
# Frechet.bounds.cat()

quine.A$f <- 70/nrow(quine) # sampling fraction
quine.B$f <- (nrow(quine)-70)/nrow(quine)

# derive the table of Sex vs. Age related to the whole data set
tot.sex.age <- xtabs(~Sex+Age, data=quine)
tot.sex.age

# use harmonize.x() to harmonize the Sex vs. Age between
# quine.A and quine.B

# create svydesign objects
svy.qA <- svydesign(~1, fpc=~f, data=quine.A)
svy.qB <- svydesign(~1, fpc=~f, data=quine.B)

# apply harmonize.x using poststratification
out.hz <- harmonize.x(svy.A=svy.qA, svy.B=svy.qB, form.x=~Sex+Age,
          cal.method="poststratify")

# compute the new tables required by Frechet.bounds.cat()
freq.x <- xtabs(out.hz$weights.A~Sex+Age, data=quine.A)
freq.xy <- xtabs(out.hz$weights.A~Sex+Age+Eth, data=quine.A)
freq.xz <- xtabs(out.hz$weights.B~Sex+Age+Lrn, data=quine.B)

# apply Frechet.bounds.cat()
bounds.yz <- Frechet.bounds.cat(tab.x=freq.x, tab.xy=freq.xy,
        tab.xz=freq.xz, print.f="data.frame")
bounds.yz
