
StatMatch (version 1.1.0)

Fbwidths.by.x: Computes the Frechet bounds of cells in a contingency table by considering all the possible subsets of the common variables.

Description

This function computes the bounds for the cell probabilities in the contingency table Y vs. Z, starting from the marginal tables (X vs. Y), (X vs. Z) and the joint distribution of the X variables, by considering all the possible subsets of the X variables. In this manner it is possible to identify which subset of the X variables produces the largest reduction of the overall uncertainty.

Usage

Fbwidths.by.x(tab.x, tab.xy, tab.xz)

Arguments

tab.x
An R table crossing the X variables. This table must be obtained by using the function xtabs or table, e.g. tab.x <- xtabs(~x1+x2+x3, data=
tab.xy
An R table of the X variables vs. the Y variable. This table must be obtained by using the function xtabs or table, e.g. tab.xy <- xtabs(~x1+x2+x3+y, data=
tab.xz
An R table of the X variables vs. the Z variable. This table must be obtained by using the function xtabs or table, e.g. tab.xz <- xtabs(~x1+x2+x3+z, data=da

Value

A list with the estimated bounds for the cells in the table of Y vs. Z for each possible subset of the X variables. The final component of the list, av.widths, is a data.frame that summarizes the findings for each subset of the X variables and reports a measure of the overall uncertainty. In particular, the data frame reports the number of X variables, the number of cells in the joint distribution of the X variables, the number of cells in the joint distribution of the X variables with relative frequency equal to 0, the average width of the uncertainty intervals and, finally, the overall uncertainty.
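The bookkeeping part of this summary (number of X variables, number of cells, number of empty cells per subset) can be sketched in base R. This is only an illustration of what "all the possible subsets" means, not the package's internal code: the built-in Titanic table stands in for real data, the empty subset is assumed to be excluded, and the column names x.vars, n.vars, n.cells and n.empty are hypothetical.

```r
# Treat three dimensions of the built-in Titanic table as the X variables
x.vars <- c("Class", "Sex", "Age")
tab.x  <- margin.table(Titanic, match(x.vars, names(dimnames(Titanic))))

# Enumerate every non-empty subset of the X variables
subsets <- unlist(
  lapply(seq_along(x.vars), function(m) combn(x.vars, m, simplify = FALSE)),
  recursive = FALSE
)

# For each subset, collapse tab.x to that margin and count cells and empty cells
summ <- do.call(rbind, lapply(subsets, function(s) {
  tab <- margin.table(tab.x, match(s, names(dimnames(tab.x))))
  data.frame(
    x.vars  = paste(s, collapse = "+"),  # hypothetical label for the subset
    n.vars  = length(s),                 # number of X variables in the subset
    n.cells = length(tab),               # cells in their joint distribution
    n.empty = sum(tab == 0)              # cells with relative frequency 0
  )
}))
summ
```

With three X variables this gives 2^3 - 1 = 7 rows; the number of subsets doubles with each additional X variable.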

Details

This function computes the Frechet bounds for the frequencies in the contingency table of Y vs. Z, starting from the conditional distributions P(Y|X) and P(Z|X) (for details see Frechet.bounds.cat), by considering all the possible subsets of the X variables. In this manner it is possible to identify the subset of the X variables that is most strongly associated with both Y and Z and therefore gives the largest reduction of the uncertainty concerning the distribution of Y vs. Z.

The overall uncertainty is measured following the suggestion in Conti et al. (2012):

$$\Delta = \sum_{i,j,k} ( p^{(up)}_{Y=j,Z=k} - p^{(low)}_{Y=j,Z=k} ) p_{Y=j|X=i} p_{Z=k|X=i} p_{X=i}$$

In addition, the average width of the bounds for the cells in the table of Y vs. Z is reported:

$$\bar{d} = \frac{1}{J \times K} \sum_{j,k} ( p^{(up)}_{Y=j,Z=k} - p^{(low)}_{Y=j,Z=k} )$$

where J and K denote the number of categories of Y and Z, respectively.
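As a rough illustration of the two measures above, the following base R sketch computes them for a single set of X categories. It assumes the usual Frechet bounds conditional on X (lower bound: sum over i of p(X=i) max(0, p(Y=j|X=i) + p(Z=k|X=i) - 1); upper bound: sum over i of p(X=i) min(p(Y=j|X=i), p(Z=k|X=i))) and follows the formula for Delta as displayed on this page. All probabilities are made-up toy values and the object names are hypothetical; this is not the package's own implementation.

```r
# Toy inputs (made-up values): X has 2 categories, Y has 2, Z has 3
p.x   <- c(x1 = 0.4, x2 = 0.6)                        # P(X = i)
p.y.x <- rbind(x1 = c(y1 = 0.7, y2 = 0.3),            # P(Y = j | X = i)
               x2 = c(y1 = 0.2, y2 = 0.8))
p.z.x <- rbind(x1 = c(z1 = 0.5, z2 = 0.3, z3 = 0.2),  # P(Z = k | X = i)
               x2 = c(z1 = 0.1, z2 = 0.3, z3 = 0.6))

J <- ncol(p.y.x)
K <- ncol(p.z.x)
low <- up <- cia <- matrix(0, J, K,
                           dimnames = list(colnames(p.y.x), colnames(p.z.x)))

for (i in seq_along(p.x)) {
  py <- p.y.x[i, ]
  pz <- p.z.x[i, ]
  # accumulate the bounds for P(Y = j, Z = k), weighting each X category by P(X = i)
  low <- low + p.x[[i]] * pmax(outer(py, pz, "+") - 1, 0)
  up  <- up  + p.x[[i]] * outer(py, pz, pmin)
  # sum_i p(j|i) p(k|i) p(i): the weights that multiply the widths in Delta
  cia <- cia + p.x[[i]] * outer(py, pz)
}

width <- up - low
d.bar <- mean(width)         # average width over the J x K cells
delta <- sum(width * cia)    # Delta, following the displayed formula
list(low = low, up = up, d.bar = d.bar, delta = delta)
```

Because the weights sum to 1 over the cells of Y vs. Z, delta is a weighted average of the widths, whereas d.bar weights every cell equally.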

References

Ballin, M., D'Orazio, M., Di Zio, M., Scanu, M. and Torelli, N. (2009) Statistical Matching of Two Surveys with a Common Subset. Working Paper, 124. Dip. Scienze Economiche e Statistiche, Univ. di Trieste, Trieste.

Conti, P.L., Marella, D. and Scanu, M. (2012) Uncertainty Analysis in Statistical Matching. Journal of Official Statistics, 28, pp. 69-88.

D'Orazio, M., Di Zio, M. and Scanu, M. (2006) Statistical Matching: Theory and Practice. Wiley, Chichester.

See Also

Frechet.bounds.cat, harmonize.x

Examples

data(quine, package="MASS") #loads quine from MASS
str(quine)
quine$c.Days <- cut(quine$Days, c(-1, seq(0,50,10),100))
table(quine$c.Days)


# split quine into two subsets
set.seed(4567)
lab.A <- sample(nrow(quine), 70, replace=TRUE)
quine.A <- quine[lab.A, 1:4]
quine.B <- quine[-lab.A, c(1:3,6)]

# compute the tables required by Fbwidths.by.x()
freq.x <- xtabs(~Eth+Sex+Age, data=quine.A)
freq.xy <- xtabs(~Eth+Sex+Age+Lrn, data=quine.A)
freq.xz <- xtabs(~Eth+Sex+Age+c.Days, data=quine.B)

# apply Fbwidths.by.x()
bounds.yz <- Fbwidths.by.x(tab.x=freq.x, tab.xy=freq.xy,
        tab.xz=freq.xz)
bounds.yz$av.widths
#
# to plot the overall uncertainty for each subset of the X variables,
# also run the following code
#barplot(bounds.yz$av.widths$ov.unc,
#         names.arg=row.names(bounds.yz$av.widths), las=2,
#         cex.names=0.75)
