MI.test: Test for Marginal Independence

Description

The MI.test function offers three approaches for testing multiple marginal independence (MMI) between one single-response categorical variable (SRCV) and one multiple-response categorical variable (MRCV), or simultaneous pairwise marginal independence (SPMI) between two MRCVs.

Usage

MI.test(data, I, J, type = "all", B = 1999, B.max = B, summary.data = FALSE, add.constant = 0.5, plot.hist = FALSE, print.status = TRUE)
MI.stat(data, I, J, summary.data = FALSE, add.constant = 0.5)

Arguments

data

For summary.data = FALSE: a data frame containing the raw data where rows correspond to the individual item response vectors, and columns correspond to the items, W1, ..., WI and Y1, ..., YJ (in this order).

For summary.data = TRUE: a data frame containing 4 columns generically named W, Y, yj, and count (one MRCV case), or 5 columns named W, Y, wi, yj, and count (two MRCV case).

I

The number of items corresponding to row variable W. I = 1 for the one MRCV case.

J

The number of items corresponding to column variable Y.

type

A character string specifying one of the following approaches for testing for MI: "boot" specifies a nonparametric bootstrap procedure; "rs2" specifies a Rao-Scott second-order adjustment; "bon" specifies a Bonferroni adjustment; "all" specifies all three approaches.

B

The desired number of bootstrap resamples.

B.max

The maximum number of bootstrap resamples. A resample is thrown out if at least one of the J (one MRCV case) or IxJ (two MRCV case) contingency tables does not have the correct dimension; MI.test uses the first B valid resamples or all valid resamples if that number is less than B.

summary.data

A logical value indicating whether data is a summary file containing the item response data instead of the raw data. Only type = "bon" is available for summary.data = TRUE.

add.constant

A positive constant to be added to all zero marginal cell counts.

plot.hist

A logical value indicating whether plots of the empirical bootstrap sampling distributions should be provided.

print.status

A logical value indicating whether bootstrap progress updates should be provided.

Value

--- MI.test returns a list containing at least general, a list containing the following objects:

data: The original data frame supplied to the data argument.
I: The original value supplied to the I argument.
J: The original value supplied to the J argument.
summary.data: The original value supplied to the summary.data argument.
X.sq.S: The modified Pearson statistic; NA if at least one of the J (one MRCV case) or IxJ (two MRCV case) contingency tables does not have the correct dimension.
X.sq.S.ij: A matrix containing the individual Pearson statistics.

--- For type = "boot", the primary list additionally includes boot, a list containing the following objects:

B.use: The number of bootstrap resamples used.
B.discard: The number of bootstrap resamples discarded due to having at least one contingency table with incorrect dimension.
p.value.boot: The bootstrap p-value for the test of MMI or SPMI.
p.combo.min.boot: The bootstrap p-value for the minimum p-value combination method.
p.combo.prod.boot: The bootstrap p-value for the product p-value combination method.
X.sq.S.star: A numeric vector containing the modified Pearson statistics calculated for each resample.
X.sq.S.ij.star: A matrix containing the individual Pearson statistics calculated for each resample.
p.combo.min.star: A numeric vector containing the minimum p-value calculated for each resample.
p.combo.prod.star: A numeric vector containing the product p-value calculated for each resample.

--- For type = "rs2", the primary list additionally includes rs2, a list containing the following objects:

X.sq.S.rs2: The Rao-Scott second-order adjusted Pearson statistic.
df.rs2: The degrees of freedom for testing the second-order Rao-Scott adjusted Pearson statistic.
p.value.rs2: The p-value based on the Rao-Scott second-order adjustment.

--- For type = "bon", the primary list additionally includes bon, a list containing the following objects:

p.value.bon: The Bonferroni adjusted p-value for the test of MMI or SPMI.
X.sq.S.ij.p.bon: A matrix containing the Bonferroni adjusted p-values for the individual Pearson statistics.

--- For type = "all", the list includes all of the above objects.

--- MI.stat returns a list containing the following objects:

X.sq.S: Defined above.
X.sq.S.ij: Defined above.
valid.margins: The number of contingency tables with correct dimension.
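
As a brief illustration of the nested list structure described above, the following sketch pulls individual elements out of a fitted object. It assumes the MRCV package is installed and that the farmer2 data set used in the Examples section is available; the object names res, X.sq.S, and p.rs2 are chosen here for illustration only.

```r
# Guarded so the sketch is skipped when the MRCV package is not installed
if (requireNamespace("MRCV", quietly = TRUE)) {
  library(MRCV)
  res <- MI.test(data = farmer2, I = 3, J = 4, type = "rs2")
  # Elements common to every type live in the "general" sub-list
  X.sq.S <- res$general$X.sq.S
  # Rao-Scott results live in the "rs2" sub-list
  p.rs2 <- res$rs2$p.value.rs2
  print(c(X.sq.S = X.sq.S, p.value.rs2 = p.rs2))
}
```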

Details

The MI.test function calls MI.stat to calculate a modified Pearson statistic (see Bilder, Loughin, and Nettleton (2000) and Bilder and Loughin (2004)), and then performs the testing of MMI or SPMI. Three sets of testing methods are implemented:

The nonparametric bootstrap resamples under the null hypothesis by independently sampling the W and Y vectors with replacement from the observed data. Fixed row counts (i.e., fixed counts for each level of the SRCV) are maintained for the one MRCV case. A modified Pearson statistic is calculated for each resample. In addition, bootstrap p-value combination methods are available to take advantage of the modified Pearson statistic's decomposition into J (one MRCV case) or IxJ (two MRCV case) individual Pearson statistics. For each resample, the individual p-values are combined by taking either their minimum or their product.
The Rao-Scott approach applies a second-order adjustment to the modified Pearson statistic and its sampling distribution. Formulas are provided in Appendix A of Bilder, Loughin, and Nettleton (2000) and Bilder and Loughin (2004). Note that this test can be conservative at times.
The Bonferroni adjustment multiplies each p-value (using a standard chi-square approximation) from the individual Pearson statistics by J (one MRCV case) or IxJ (two MRCV case). If a resulting p-value is greater than 1 after the multiplication, it is set to a value of 1. The overall p-value for the test then is the minimum of these adjusted p-values. Note that the Bonferroni adjustment tends to produce an overly conservative test when the number of individual Pearson statistics is large.
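
The bootstrap and Bonferroni procedures described above can be sketched in base R for the two MRCV case. This is a minimal illustration rather than the code MI.test itself runs. It makes several assumptions: the data are simulated, the helper pearson.ij is hypothetical, the modified Pearson statistic is taken as the sum of the IxJ individual Pearson statistics, each individual statistic is referred to a chi-square distribution with 1 degree of freedom, and the bootstrap p-value is the proportion of resampled statistics at least as large as the observed one. It does not apply the add.constant adjustment or the B.max logic.

```r
# Toy data: two MRCVs with I = 2 items (W1, W2) and J = 2 items (Y1, Y2).
# All values are simulated for illustration only.
set.seed(1)
n <- 200
toy <- data.frame(
  W1 = rbinom(n, 1, 0.5), W2 = rbinom(n, 1, 0.4),
  Y1 = rbinom(n, 1, 0.6), Y2 = rbinom(n, 1, 0.3)
)
I <- 2
J <- 2

# Hypothetical helper: individual Pearson statistics, one per (Wi, Yj) pair.
# An entry stays NA when the corresponding table is not 2x2.
pearson.ij <- function(d, I, J) {
  out <- matrix(NA_real_, nrow = I, ncol = J)
  for (i in 1:I) {
    for (j in 1:J) {
      tab <- table(d[[i]], d[[I + j]])
      if (all(dim(tab) == 2)) {
        out[i, j] <- unname(chisq.test(tab, correct = FALSE)$statistic)
      }
    }
  }
  out
}

# Observed statistics (assumption: modified statistic = sum of the pieces)
X.sq.S.ij <- pearson.ij(toy, I, J)
X.sq.S <- sum(X.sq.S.ij)

# Bonferroni: multiply each individual p-value by IxJ, cap at 1, take the minimum
p.ij <- pchisq(X.sq.S.ij, df = 1, lower.tail = FALSE)
p.ij.bon <- pmin(p.ij * I * J, 1)
p.value.bon <- min(p.ij.bon)

# Bootstrap under the null: resample the W rows and the Y rows independently
B <- 199
X.sq.S.star <- replicate(B, {
  w.rows <- sample(n, replace = TRUE)
  y.rows <- sample(n, replace = TRUE)
  d.star <- cbind(toy[w.rows, 1:I, drop = FALSE],
                  toy[y.rows, (I + 1):(I + J), drop = FALSE])
  sum(pearson.ij(d.star, I, J))
})
p.value.boot <- mean(X.sq.S.star >= X.sq.S, na.rm = TRUE)

print(c(X.sq.S = X.sq.S, p.value.bon = p.value.bon, p.value.boot = p.value.boot))
```

Sampling the W and Y row indices separately is what breaks any association between the two variables while preserving the dependence among items within each variable, which is the structure the null hypothesis requires.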

Agresti and Liu (1999) discuss a marginal logit model approach that uses generalized estimating equations (GEE) to test for MMI. As shown in the example given below, this approach can be performed via functions available from the geepack package. However, Bilder, Loughin, and Nettleton (2000) caution that the Wald test produced by this approach does not hold the correct size, particularly when the sample size is not large and marginal probabilities are small.

References

Agresti, A. and Liu, I.-M. (1999) Modeling a categorical variable allowing arbitrarily many category choices. Biometrics, 55, 936--943.

Bilder, C. and Loughin, T. (2004) Testing for marginal independence between two categorical variables with multiple responses. Biometrics, 60, 241--248.

Bilder, C., Loughin, T., and Nettleton, D. (2000) Multiple marginal independence testing for pick any/c variables. Communications in Statistics--Theory and Methods, 29, 1285--1316.

Examples

# Test for MMI using the second-order Rao-Scott adjustment
test.mmi.rs2 <- MI.test(data = farmer1, I = 1, J = 5, type = "rs2")
test.mmi.rs2

# Test for MMI using all three approaches
# A small B is used for demonstration purposes; normally, a larger B should be used
## Not run: 
# test.mmi.all <- MI.test(data = farmer1, I = 1, J = 5, type = "all", B = 99, 
#     plot.hist = TRUE)
# test.mmi.all
## End(Not run)

# Use MI.test() with summary data
# Convert raw data file to summary file for this example 
farmer1.irdframe <- item.response.table(data = farmer1, I = 1, J = 5, 
    create.dataframe = TRUE)
# Test for MMI using the Bonferroni adjustment
test.mmi.bon <- MI.test(data = farmer1.irdframe, I = 1, J = 5, type = "bon", 
    summary.data = TRUE)
test.mmi.bon

# Test for SPMI using the second-order Rao-Scott adjustment
test.spmi.rs2 <- MI.test(data = farmer2, I = 3, J = 4, type = "rs2")
test.spmi.rs2

# Test for MMI using the marginal logit model approach
## Not run: 
# library(geepack)
# n<-nrow(farmer1)
# farmer1.id<-cbind(case=1:n, farmer1)
# # Reshape raw data into long format as required by geeglm() function
# # Assumes 3:ncol(farmer1.id) corresponds to MRCV items
# farmer1.gee<-reshape(data = farmer1.id, 
#                  varying = names(farmer1.id)[3:ncol(farmer1.id)], 
#                  v.names = "response", timevar = "item", idvar = "case", 
#                  direction = "long") 
# row.names(farmer1.gee)<-NULL
# farmer1.gee[,2:3]<-lapply(farmer1.gee[,2:3], factor)
# # Data frame must be ordered by case
# farmer1.gee<-farmer1.gee[order(farmer1.gee$case),]
# head(farmer1.gee)
# tail(farmer1.gee)
# mod.fit.H0<-geeglm(formula = response ~ item, family = binomial(link = logit), 
#                   data = farmer1.gee, na.action = na.omit, id = case, 
#                   corstr = "unstructured")
# mod.fit.HA<-geeglm(formula = response ~ Ed*item, family = binomial(link = logit), 
#                    data = farmer1.gee, na.action = na.omit, id = case, 
#                    corstr = "unstructured")
# # Compute Wald test
# anova(mod.fit.HA, mod.fit.H0)
## End(Not run)