StatMatch (version 1.1.0)

Frechet.bounds.cat: Frechet bounds of cells in a contingency table

Description

This function derives the bounds for the cell probabilities of the table Y vs. Z, starting from the marginal tables (X vs. Y) and (X vs. Z) and the joint distribution of the X variables.

Usage

Frechet.bounds.cat(tab.x, tab.xy, tab.xz, print.f="tables", tol= 0.0001)

Arguments

tab.x
An R table crossing the X variables. This table must be obtained by using the function xtabs or table, e.g. tab.x <- xtabs(~x1+x2+x3, data
tab.xy
An R table of the X variables vs. the Y variable. This table must be obtained by using the function xtabs or table, e.g. tab.xy <- xtabs(~x1+x2+x3+y, data=
tab.xz
An R table of the X variables vs. the Z variable. This table must be obtained by using the function xtabs or table, e.g. tab.xz <- xtabs(~x1+x2+x3+z, data=da
print.f
A string specifying the data structure of the cells' estimates. When print.f="tables" (default), all the cells' estimates are saved as tables in a list. If instead print.f="data.frame", they are saved as columns
tol
Tolerance used when comparing the joint distributions of the X variables (default tol= 0.0001); the joint distribution of the X variables computed from tab.xy and tab.xz should be equal to t

Value

When print.f="tables" (default), a list with the following components:

  • low.u: The estimated lower bounds for the relative frequencies in the table Y vs. Z without conditioning on the X variables.
  • up.u: The estimated upper bounds for the relative frequencies in the table Y vs. Z without conditioning on the X variables.
  • CIA: The estimated relative frequencies in the table Y vs. Z under the Conditional Independence Assumption (CIA).
  • low.cx: The estimated lower bounds for the relative frequencies in the table Y vs. Z when conditioning on the X variables.
  • up.cx: The estimated upper bounds for the relative frequencies in the table Y vs. Z when conditioning on the X variables.
  • unc: The overall uncertainty associated with the input data, estimated according to the suggestion in Conti et al. (2012) (see Fbwidths.by.x for more details).

When print.f="data.frame", the output list contains just two components:

  • bounds: A data.frame whose columns report the estimated uncertainty bounds.
  • unc: The overall uncertainty associated with the input data, estimated according to the suggestion in Conti et al. (2012) (see Fbwidths.by.x for more details).

Details

This function computes the Frechet bounds for the relative frequencies in the contingency table of Y vs. Z, starting from the distributions P(Y|X), P(Z|X) and P(X). The bounds for the relative frequencies $p_{j,k}$ in the table Y vs. Z are:

$$p^{(low)}_{Y=j,Z=k} = \sum_{i} p_{X=i}\max (0; p_{Y=j|X=i} + p_{Z=k|X=i} - 1 )$$

$$p^{(up)}_{Y=j,Z=k} = \sum_{i} p_{X=i} \min ( p_{Y=j|X=i}; p_{Z=k|X=i})$$

The relative frequencies $p_{X=i}=n_i/n$ are computed from the frequencies in tab.x; the relative frequencies $p_{Y=j|X=i}=n_{ij}/n_{i \bullet}$ are computed from tab.xy; finally, $p_{Z=k|X=i}=n_{ik}/n_{i \bullet}$ are derived from tab.xz.
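The two conditional bounds above can be reproduced by hand in base R. The following is a minimal sketch on hypothetical toy counts (the matrices `n.xy` and `n.xz` and all object names are made up for illustration); it is not the package's internal code, only a direct transcription of the formulas with a single X variable.

```r
# Hypothetical counts: one X variable (levels a, b), Y and Z with two levels each.
n.xy <- matrix(c(30, 10,
                 20, 40), nrow = 2, byrow = TRUE,
               dimnames = list(x = c("a", "b"), y = c("y1", "y2")))
n.xz <- matrix(c(25, 15,
                 45, 15), nrow = 2, byrow = TRUE,
               dimnames = list(x = c("a", "b"), z = c("z1", "z2")))

p.x   <- rowSums(n.xy) / sum(n.xy)  # p_{X=i}
p.y.x <- prop.table(n.xy, 1)        # p_{Y=j|X=i}: each row sums to 1
p.z.x <- prop.table(n.xz, 1)        # p_{Z=k|X=i}: each row sums to 1

# Accumulate the weighted bounds over the levels of X.
low.cx <- up.cx <- matrix(0, ncol(p.y.x), ncol(p.z.x),
                          dimnames = list(y = colnames(p.y.x),
                                          z = colnames(p.z.x)))
for (i in seq_along(p.x)) {
  py <- p.y.x[i, ]
  pz <- p.z.x[i, ]
  low.cx <- low.cx + p.x[i] * pmax(outer(py, pz, "+") - 1, 0)  # max(0; py + pz - 1)
  up.cx  <- up.cx  + p.x[i] * outer(py, pz, pmin)              # min(py; pz)
}

low.cx
up.cx
```

With more than one X variable, the same loop would run over the cells of the cross-classification of the X variables rather than over the levels of a single variable.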

It is assumed that the marginal distribution of the X variables is the same in all the input tables: tab.x, tab.xy and tab.xz. If this is not true a warning message will appear.
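To see what this consistency requirement means in practice, the sketch below compares the X margins of the three tables against a tolerance. The helper `check.x.margins()` is hypothetical and is not part of StatMatch; it also assumes that the X variables come first in `tab.xy` and `tab.xz`, as in the `xtabs` calls shown under Arguments.

```r
# Hypothetical helper (not a StatMatch function): TRUE when the relative
# X margins of tab.xy and tab.xz agree with tab.x within tol.
check.x.margins <- function(tab.x, tab.xy, tab.xz, tol = 0.0001) {
  x.dims <- seq_along(dim(tab.x))   # assumes X variables are the first dimensions
  p.x    <- prop.table(tab.x)
  p.x.xy <- prop.table(margin.table(tab.xy, x.dims))
  p.x.xz <- prop.table(margin.table(tab.xz, x.dims))
  ok <- all(abs(p.x - p.x.xy) <= tol) && all(abs(p.x - p.x.xz) <= tol)
  if (!ok) warning("X margins differ across the input tables by more than tol")
  ok
}

# Toy data in which all three tables come from the same records.
d <- data.frame(x = c("a", "a", "b", "b", "b"),
                y = c("u", "v", "u", "v", "v"),
                z = c("s", "t", "s", "s", "t"))
tab.x  <- xtabs(~x, data = d)
tab.xy <- xtabs(~x + y, data = d)
tab.xz <- xtabs(~x + z, data = d)

check.x.margins(tab.x, tab.xy, tab.xz)
```

When the tables come from different samples, as in the Examples section, the margins generally differ, which is why harmonize.x is applied there before recomputing the tables.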

Note that the bounds for the relative frequencies in the contingency table of Y vs. Z are also computed without considering the X variables:

$$\max \{ 0; p_{Y=j} + p_{Z=k} - 1 \} \leq p_{Y=j,Z=k} \leq \min \{ p_{Y=j}; p_{Z=k} \}$$
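The unconditional bounds need only the two marginal distributions. A minimal base R sketch, with hypothetical marginals `p.y` and `p.z` chosen for illustration:

```r
# Hypothetical marginal distributions of Y and Z.
p.y <- c(y1 = 0.5, y2 = 0.5)
p.z <- c(z1 = 0.7, z2 = 0.3)

low.u <- pmax(outer(p.y, p.z, "+") - 1, 0)  # max{0; p_Y + p_Z - 1}
up.u  <- outer(p.y, p.z, pmin)              # min{p_Y; p_Z}

low.u
up.u
```

Bounds obtained by conditioning on informative X variables are never wider than these, which is the motivation for using the X variables at all.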

Finally, the contingency table of Y vs. Z estimated under the Conditional Independence Assumption (CIA) is obtained by considering:

$$p^{(CIA)}_{Y=j,Z=k} = \sum_{i} p_{Y=j|X=i} \times p_{Z=k|X=i} \times p_{X=i}.$$
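Under the CIA, the cell estimate is the product of the two conditional distributions, averaged over the distribution of X. A minimal base R sketch on hypothetical toy counts (all object names are illustrative, and this is not the package's internal code):

```r
# Hypothetical counts: one X variable (levels a, b), Y and Z with two levels each.
n.xy <- matrix(c(30, 10,
                 20, 40), nrow = 2, byrow = TRUE,
               dimnames = list(x = c("a", "b"), y = c("y1", "y2")))
n.xz <- matrix(c(25, 15,
                 45, 15), nrow = 2, byrow = TRUE,
               dimnames = list(x = c("a", "b"), z = c("z1", "z2")))

p.x   <- rowSums(n.xy) / sum(n.xy)  # p_{X=i}
p.y.x <- prop.table(n.xy, 1)        # p_{Y=j|X=i}
p.z.x <- prop.table(n.xz, 1)        # p_{Z=k|X=i}

# Row i of p.y.x is scaled by p.x[i]; crossprod() then sums over i.
cia <- crossprod(p.y.x * p.x, p.z.x)
cia
```

The resulting table sums to 1, and its row and column totals reproduce the marginal distributions of Y and Z.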

References

Ballin, M., D'Orazio, M., Di Zio, M., Scanu, M. and Torelli, N. (2009) Statistical Matching of Two Surveys with a Common Subset. Working Paper, 124. Dip. Scienze Economiche e Statistiche, Univ. di Trieste, Trieste.

Conti, P.L., Marella, D. and Scanu, M. (2012) Uncertainty Analysis in Statistical Matching. Journal of Official Statistics, 28, pp. 69-88.

D'Orazio, M., Di Zio, M. and Scanu, M. (2006). Statistical Matching: Theory and Practice. Wiley, Chichester.

See Also

Fbwidths.by.x, harmonize.x

Examples

library(StatMatch) # Frechet.bounds.cat() and harmonize.x()
library(survey)    # svydesign()
data(quine, package="MASS") # loads quine from MASS
str(quine)

# split quine in two subsets
set.seed(765)
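# sample without replacement so that quine.A has 70 distinct rows,
# matching the sampling fractions computed further below
lab.A <- sample(nrow(quine), 70, replace=FALSE)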
quine.A <- quine[lab.A, 1:3]
quine.B <- quine[-lab.A, 2:4]

# compute the tables required by Frechet.bounds.cat()
freq.x <- xtabs(~Sex+Age, data=quine.A)
freq.xy <- xtabs(~Sex+Age+Eth, data=quine.A)
freq.xz <- xtabs(~Sex+Age+Lrn, data=quine.B)

# apply Frechet.bounds.cat()
bounds.yz <- Frechet.bounds.cat(tab.x=freq.x, tab.xy=freq.xy,
        tab.xz=freq.xz, print.f="data.frame")
bounds.yz

# harmonize distr. of Sex vs. Age before applying
# Frechet.bounds.cat()

quine.A$f <- 70/nrow(quine) # sampling fraction
quine.B$f <- (nrow(quine)-70)/nrow(quine)

# derive the table of Sex vs. Age related to the whole data set
tot.sex.age <- xtabs(~Sex+Age, data=quine)
tot.sex.age

# use harmonize.x() to harmonize the Sex vs. Age between
# quine.A and quine.B

# create svydesign objects
svy.qA <- svydesign(~1, fpc=~f, data=quine.A)
svy.qB <- svydesign(~1, fpc=~f, data=quine.B)

# apply harmonize.x using poststratification
out.hz <- harmonize.x(svy.A=svy.qA, svy.B=svy.qB, form.x=~Sex+Age,
          cal.method="poststratify")

# compute the new tables required by Frechet.bounds.cat()
freq.x <- xtabs(out.hz$weights.A~Sex+Age, data=quine.A)
freq.xy <- xtabs(out.hz$weights.A~Sex+Age+Eth, data=quine.A)
freq.xz <- xtabs(out.hz$weights.B~Sex+Age+Lrn, data=quine.B)

# apply Frechet.bounds.cat()
bounds.yz <- Frechet.bounds.cat(tab.x=freq.x, tab.xy=freq.xy,
        tab.xz=freq.xz, print.f="data.frame")
bounds.yz