Fbwidths.by.x: Computes the Frechet bounds of cells in a contingency table by considering all the possible subsets of the common variables.

Description

This function computes the bounds for the cell probabilities in the contingency table of Y vs. Z, starting from the marginal tables (X vs. Y), (X vs. Z) and the joint distribution of the X variables, and repeats the computation for every possible subset of the X variables. In this way it is possible to identify the subset of the X variables that produces the largest reduction of the uncertainty.
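With p X variables there are 2^p - 1 non-empty subsets, so the number of conditioning sets to evaluate doubles with each added X variable. The base-R sketch below only illustrates that enumeration; the helper all.subsets() is hypothetical and is not part of StatMatch, and it is not assumed here whether Fbwidths.by.x() also reports the unconditional (no X) case.

```r
# Hypothetical helper (not StatMatch code): list every non-empty subset
# of the X variable names, i.e. the candidate conditioning sets.
all.subsets <- function(vars) {
  # combn(vars, m, simplify = FALSE) returns a list of all subsets of size m
  unlist(lapply(seq_along(vars),
                function(m) combn(vars, m, simplify = FALSE)),
         recursive = FALSE)
}

x.vars <- c("Eth", "Sex", "Age")
subs <- all.subsets(x.vars)
length(subs)                        # 2^3 - 1 = 7
sapply(subs, paste, collapse = "*") # readable labels for each subset
```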

Usage

Fbwidths.by.x(tab.x, tab.xy, tab.xz)

Arguments

tab.x

An R table crossing the X variables. This table must be obtained by using the function xtabs or table, e.g. tab.x <- xtabs(~x1+x2+x3, data=

tab.xy

An R table of the X variables vs. the Y variable. This table must be obtained by using the function xtabs or table, e.g. tab.xy <- xtabs(~x1+x2+x3+y, data=

tab.xz

An R table of the X variables vs. the Z variable. This table must be obtained by using the function xtabs or table, e.g. tab.xz <- xtabs(~x1+x2+x3+z, data=da
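Because the three tables are meant to describe the same X variables, it seems prudent to check that they share the same X categories in the same order before calling the function. The sketch below uses hypothetical toy data and only base R / stats functions (xtabs, margin.table), not StatMatch code. When tab.x and tab.xy come from the same records, summing tab.xy over Y reproduces tab.x.

```r
# Hypothetical toy data: two categorical X variables and a Y variable
set.seed(1)
toy <- data.frame(x1 = factor(sample(c("a", "b"), 50, replace = TRUE)),
                  x2 = factor(sample(c("u", "v"), 50, replace = TRUE)),
                  y  = factor(sample(c("lo", "hi"), 50, replace = TRUE)))

tab.x  <- xtabs(~ x1 + x2, data = toy)
tab.xy <- xtabs(~ x1 + x2 + y, data = toy)

# Same X variables, same categories, same order?
identical(dimnames(tab.x), dimnames(tab.xy)[1:2])

# Summing tab.xy over y (keeping dimensions 1 and 2) recovers tab.x
collapsed <- margin.table(tab.xy, c(1, 2))
all(collapsed == tab.x)
```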

Value

A list with the estimated bounds for the cells in the table of Y vs. Z for each possible subset of the X variables. The final component, sum.unc, is a data.frame that summarizes the results for each subset of the X variables and reports measures of the uncertainty. In particular, the data frame reports the number of X variables ("x.vars"), the number of cells in the joint distribution of the X variables ("x.cells"), the number of cells in the joint distribution of the X variables with frequency equal to 0 ("x.freq0"), and the average width of the uncertainty intervals ("av.widths").
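The first three summary columns are simple counts on the table of the X variables in a given subset. The sketch below reproduces those definitions in base R for illustration only; x.summary() is a hypothetical helper, not the code StatMatch uses internally, and the toy table is made up.

```r
# Hypothetical helper: summary counts for one table of X variables,
# following the column definitions of sum.unc
x.summary <- function(tab.x) {
  data.frame(x.vars  = length(dim(tab.x)),  # number of X variables
             x.cells = length(tab.x),       # cells in the joint X table
             x.freq0 = sum(tab.x == 0))     # cells with frequency 0
}

# Made-up 2 x 2 x 2 table of counts for three X variables
tab <- as.table(array(c(5, 0, 3, 2, 0, 1, 4, 0), dim = c(2, 2, 2)))

x.summary(tab)                     # all three X variables
x.summary(margin.table(tab, 1))    # subset made of the first X variable only
```

Collapsing to smaller subsets of X tends to reduce the number of empty cells, which is one reason the summary reports x.freq0 alongside the average width.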

Details

This function computes the Frechet bounds for the frequencies in the contingency table of Y vs. Z, starting from the conditional distributions P(Y|X) and P(Z|X) (for details see Frechet.bounds.cat), for every possible subset of the X variables. In this way it is possible to identify the subset of the X variables, with the highest association with both Y and Z, that most reduces the uncertainty concerning the distribution of Y vs. Z. The uncertainty is measured by the average width of the bounds for the cells in the J × K table of Y vs. Z (J and K being the numbers of categories of Y and Z), which is also reported in the output: $$\bar{d} = \frac{1}{J \times K} \sum_{j,k} ( p^{(up)}_{Y=j,Z=k} - p^{(low)}_{Y=j,Z=k} )$$ For further details see Frechet.bounds.cat.
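A minimal base-R sketch of the standard conditional Frechet bounds and of the average width defined above, assuming P(X), P(Y|X) and P(Z|X) have already been estimated. For each cell, low = sum_x P(x) max(0, P(y|x) + P(z|x) - 1) and up = sum_x P(x) min(P(y|x), P(z|x)). The helper cond.frechet() and all the numbers are hypothetical; this is not the StatMatch implementation.

```r
# Hypothetical helper: conditional Frechet bounds for P(Y = j, Z = k)
#   px   : vector of P(X = i)
#   py.x : matrix of P(Y = j | X = i), rows = X categories
#   pz.x : matrix of P(Z = k | X = i), rows = X categories
cond.frechet <- function(px, py.x, pz.x) {
  J <- ncol(py.x); K <- ncol(pz.x)
  low <- up <- matrix(0, J, K)
  for (j in seq_len(J)) {
    for (k in seq_len(K)) {
      low[j, k] <- sum(px * pmax(0, py.x[, j] + pz.x[, k] - 1))
      up[j, k]  <- sum(px * pmin(py.x[, j], pz.x[, k]))
    }
  }
  # mean over the J x K cells = (1 / (J * K)) * sum of the widths
  list(low = low, up = up, av.width = mean(up - low))
}

# Made-up example: binary X, Y and Z
px   <- c(0.5, 0.5)
py.x <- rbind(c(0.8, 0.2), c(0.3, 0.7))
pz.x <- rbind(c(0.9, 0.1), c(0.4, 0.6))
res  <- cond.frechet(px, py.x, pz.x)

# Unconditional bounds: use only the marginals P(Y) and P(Z)
py  <- colSums(px * py.x)
pz  <- colSums(px * pz.x)
unc <- cond.frechet(1, matrix(py, nrow = 1), matrix(pz, nrow = 1))

c(conditional = res$av.width, unconditional = unc$av.width)
```

In this toy case conditioning on X shrinks the average width from 0.35 to 0.2; comparing such values across subsets of X is what the summary in sum.unc is for.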

References

Ballin, M., D'Orazio, M., Di Zio, M., Scanu, M. and Torelli, N. (2009). Statistical Matching of Two Surveys with a Common Subset. Working Paper, 124. Dip. Scienze Economiche e Statistiche, Univ. di Trieste, Trieste.

D'Orazio, M., Di Zio, M. and Scanu, M. (2006). Statistical Matching: Theory and Practice. Wiley, Chichester.

Examples


library(StatMatch)  # provides Fbwidths.by.x()
data(quine, package="MASS")  # loads quine from MASS
str(quine)
quine$c.Days <- cut(quine$Days, c(-1, seq(0,50,10),100))
table(quine$c.Days)


# split quine in two subsets
set.seed(4567)
lab.A <- sample(nrow(quine), 70, replace=FALSE)  # without replacement: no duplicated records in A
quine.A <- quine[lab.A, 1:4]
quine.B <- quine[-lab.A, c(1:3,6)]

# compute the tables required by Fbwidths.by.x()
freq.x <- xtabs(~Eth+Sex+Age, data=quine.A)
freq.xy <- xtabs(~Eth+Sex+Age+Lrn, data=quine.A)
freq.xz <- xtabs(~Eth+Sex+Age+c.Days, data=quine.B)

# apply Fbwidths.by.x()
bounds.yz <- Fbwidths.by.x(tab.x=freq.x, tab.xy=freq.xy,
        tab.xz=freq.xz)

bounds.yz$sum.unc
