MRCV-package: Methods for Analyzing Multiple Response Categorical Variables

Description

The MRCV package provides functions for analyzing the association between one single response categorical variable (SRCV) and one multiple response categorical variable (MRCV), or between two or three MRCVs. A modified Pearson chi-square statistic can be used to test for marginal independence for the one or two MRCV case, or a more general loglinear modeling approach can be used to examine various other structures of association for the two or three MRCV case. Bootstrap- and asymptotic-based standardized residuals and model-predicted odds ratios are available, in addition to other descriptive information.

Details

Package:   MRCV
Version:   0.3-3
Date:      2014-09-03
Depends:   R (>= 3.1.1)
Imports:   tables
Suggests:  geepack
LazyData:  TRUE
License:   GPL (>= 3)

Notation: For the two or three MRCV case, define row variable, W, column variable, Y, and strata variable, Z, as MRCVs with binary items (i.e., categories) Wi for i = 1, ..., I, Yj for j = 1, ..., J, and Zk for k = 1, ..., K, respectively. Define a marginal count as the number of subjects who responded (Wi = a, Yj = b, Zk = c) for a, b, and c belonging to the set {0, 1}. For the one MRCV case, let W be an SRCV such that I = 1 and 'a' corresponds to one of r levels of W, and let Y be the MRCV as previously defined.

Format of Data Frame: Many of the functions require a data frame containing the raw data, structured so that the n rows correspond to the individual item response vectors and the columns correspond to the items W1, ..., WI, Y1, ..., YJ, and Z1, ..., ZK (in this order). Some of the functions use a summary version of the raw data frame, which is created automatically without user action. The summary has r x 2J rows and 4 columns generically named W, Y, yj, and count (one MRCV case); 2I x 2J rows and 5 columns named W, Y, wi, yj, and count (two MRCV case); or 2I x 2J x 2K rows and 7 columns named W, Y, Z, wi, yj, zk, and count (three MRCV case). The column named count contains the marginal counts defined above.
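As a rough illustration of the two formats, the following base R sketch builds a small made-up raw data frame (I = 2, J = 3) and tabulates it into the two MRCV summary layout. The helper summarize_two_mrcv is hypothetical and is not part of the package, which performs this conversion internally; the data values are invented.

```r
# Toy raw data: n = 6 subjects, I = 2 row items (W1, W2), J = 3 column items (Y1..Y3).
# Columns are ordered W items first, then Y items.
raw <- data.frame(
  W1 = c(1, 0, 1, 1, 0, 0),
  W2 = c(0, 1, 1, 0, 0, 1),
  Y1 = c(1, 1, 0, 0, 1, 0),
  Y2 = c(0, 1, 1, 0, 0, 0),
  Y3 = c(1, 0, 1, 1, 0, 1)
)
I <- 2
J <- 3

# Hypothetical helper (not an MRCV function): returns the 2I x 2J-row summary
# with columns W, Y, wi, yj, and count.
summarize_two_mrcv <- function(raw, I, J) {
  w_names <- names(raw)[seq_len(I)]
  y_names <- names(raw)[I + seq_len(J)]
  # One row per (item Wi, item Yj, response a, response b) combination
  grid <- expand.grid(yj = 0:1, wi = 0:1, Y = y_names, W = w_names,
                      stringsAsFactors = FALSE)
  # Marginal count: number of subjects with Wi = a and Yj = b
  grid$count <- mapply(
    function(W, Y, wi, yj) sum(raw[[W]] == wi & raw[[Y]] == yj),
    grid$W, grid$Y, grid$wi, grid$yj,
    USE.NAMES = FALSE
  )
  grid[, c("W", "Y", "wi", "yj", "count")]
}

summary_df <- summarize_two_mrcv(raw, I, J)
print(head(summary_df, 8))
```

Within each (Wi, Yj) pair the four counts add up to n, since every subject falls in exactly one of the four response combinations.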

Descriptive Functions: Users can call the item.response.table function to obtain a cross-tabulation of the positive and negative responses for each combination of items, or the marginal.table function to obtain a cross-tabulation of only the positive responses.
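The two kinds of cross-tabulation can also be reproduced by hand in base R, which may help when checking output. The sketch below uses invented data and does not call item.response.table or marginal.table; the object names are illustrative only.

```r
# Toy raw data: I = 2 row items, J = 3 column items, n = 6 subjects.
raw <- data.frame(
  W1 = c(1, 0, 1, 1, 0, 0),
  W2 = c(0, 1, 1, 0, 0, 1),
  Y1 = c(1, 1, 0, 0, 1, 0),
  Y2 = c(0, 1, 1, 0, 0, 0),
  Y3 = c(1, 0, 1, 1, 0, 1)
)
W <- as.matrix(raw[, c("W1", "W2")])
Y <- as.matrix(raw[, c("Y1", "Y2", "Y3")])

# Positive responses only: entry (i, j) counts subjects with Wi = 1 and Yj = 1.
positive <- crossprod(W, Y)
positive_pct <- round(100 * positive / nrow(raw), 1)

# Positive and negative responses for a single pair of items (W1, Y1).
pair_table <- table(W1 = raw$W1, Y1 = raw$Y1)

print(positive)
print(positive_pct)
print(pair_table)
```

Because a subject can respond positively to several items, the entries of the positive-response table generally do not sum to n.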

Functions to Test for Marginal Independence: The MI.test function implements methods proposed by Agresti and Liu (1999), Bilder, Loughin, and Nettleton (2000), Bilder and Loughin (2004), and Thomas and Decady (2004). It calculates a modified Pearson chi-square statistic that can be used to test for multiple marginal independence (MMI; one MRCV case) or simultaneous pairwise marginal independence (SPMI; two MRCV case). MMI is a test of whether the SRCV, W, is marginally independent of each Yj; the modified statistic is the sum of the J Pearson statistics used to test for independence of each (W, Yj) pair. SPMI is a test of whether each Wi is pairwise independent of each Yj; the modified statistic is the sum of the I x J Pearson statistics used to test for independence of each (Wi, Yj) pair. Because the same subjects contribute to every pair, the asymptotic distribution of each modified statistic is a linear combination of independent chi-square(1) random variables rather than a single chi-square distribution, so traditional methods for analyzing the association between categorical variables W and Y are inappropriate. The MI.test function offers three sets of testing methods (a nonparametric bootstrap approach, a Rao-Scott second-order adjustment, and a Bonferroni adjustment) that can be used in conjunction with the modified statistic to construct an appropriate test for independence.
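To make the SPMI statistic concrete, here is a minimal base R sketch that computes the I x J individual Pearson statistics on simulated data and sums them. It is not the MI.test implementation: the data are simulated, pearson_2x2 is a hypothetical helper, the code assumes no item is all 0 or all 1, and the Bonferroni line shows one common form of that adjustment (smallest pairwise p-value multiplied by the number of pairs), which is assumed here rather than taken from the package source.

```r
set.seed(42)  # simulated data, for illustration only
n <- 200
I <- 2
J <- 3
raw <- as.data.frame(matrix(rbinom(n * (I + J), size = 1, prob = 0.4), nrow = n))
names(raw) <- c(paste0("W", seq_len(I)), paste0("Y", seq_len(J)))

# Hypothetical helper: Pearson statistic (no continuity correction) for one 2x2 table.
pearson_2x2 <- function(x, y) {
  obs <- table(factor(x, levels = 0:1), factor(y, levels = 0:1))
  expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
  sum((obs - expected)^2 / expected)
}

# I x J matrix of Pearson statistics, one per (Wi, Yj) pair.
stats <- sapply(seq_len(J), function(j) {
  sapply(seq_len(I), function(i) pearson_2x2(raw[[i]], raw[[I + j]]))
})
dimnames(stats) <- list(names(raw)[seq_len(I)], names(raw)[I + seq_len(J)])

# Modified statistic: the sum of the individual statistics.
# Its reference distribution is NOT chi-square with I * J degrees of freedom.
X2_S <- sum(stats)

# Assumed Bonferroni form: each pair is referred to chi-square(1),
# then the smallest p-value is multiplied by the number of pairs.
p_each <- pchisq(stats, df = 1, lower.tail = FALSE)
p_bonf <- min(1, I * J * min(p_each))

print(round(stats, 3))
print(X2_S)
print(p_bonf)
```

The bootstrap and Rao-Scott approaches named above replace the naive chi-square reference distribution for X2_S; they are not reproduced in this sketch.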

Functions for Performing Regression Modeling: Regression modeling methods described by Bilder and Loughin (2007) are implemented using genloglin and methods summary.genloglin, residuals.genloglin, anova.genloglin, and predict.genloglin. The genloglin function provides parameter estimates and Rao-Scott adjusted standard errors for models involving two or three MRCVs. The anova.genloglin function offers second-order Rao-Scott and bootstrap adjusted model comparison and goodness-of-fit (Pearson and LRT) statistics. The residuals.genloglin and predict.genloglin functions provide bootstrap- and asymptotic-based standardized Pearson residuals and model-based odds ratios, respectively.
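The loglinear idea behind these models can be sketched with an ordinary Poisson glm fitted to the summary data. This is only an illustration under stated assumptions and not the genloglin implementation: the data are simulated, the model formulas are an assumed way of writing an SPMI model and a homogeneous association model, and odds_ratios is a hypothetical helper. The standard errors from a plain glm ignore the dependence among item response tables, which is what the Rao-Scott adjusted standard errors in genloglin are meant to address, so only point estimates are used here.

```r
set.seed(7)  # simulated data, for illustration only
n <- 300
raw <- as.data.frame(matrix(rbinom(n * 4, size = 1, prob = 0.5), nrow = n))
names(raw) <- c("W1", "W2", "Y1", "Y2")

# Summary layout (two MRCV case): 2I x 2J rows; yj varies fastest, then wi.
grid <- expand.grid(yj = 0:1, wi = 0:1, Y = c("Y1", "Y2"), W = c("W1", "W2"),
                    stringsAsFactors = FALSE)
grid$count <- mapply(
  function(W, Y, wi, yj) sum(raw[[W]] == wi & raw[[Y]] == yj),
  grid$W, grid$Y, grid$wi, grid$yj,
  USE.NAMES = FALSE
)
grid$W <- factor(grid$W)
grid$Y <- factor(grid$Y)

# Assumed SPMI-type model: separate intercept and main effects for each (Wi, Yj) table.
fit_spmi <- glm(count ~ -1 + W:Y + W:Y:wi + W:Y:yj, family = poisson, data = grid)
# Assumed homogeneous association model: one common wi:yj term for all tables.
fit_homog <- update(fit_spmi, . ~ . + wi:yj)

# Hypothetical helper: odds ratio for each (Wi, Yj) table from a vector of 16 counts.
odds_ratios <- function(m) {
  blocks <- matrix(m, nrow = 4)  # rows: (wi, yj) = (0,0), (0,1), (1,0), (1,1)
  or <- blocks[1, ] * blocks[4, ] / (blocks[2, ] * blocks[3, ])
  names(or) <- unique(paste(grid$W, grid$Y, sep = ":"))
  or
}

observed_or <- odds_ratios(grid$count)
spmi_or <- odds_ratios(fitted(fit_spmi))    # all equal to 1 under this model
homog_or <- odds_ratios(fitted(fit_homog))  # all equal to exp(coefficient of wi:yj)

print(round(rbind(observed_or, spmi_or, homog_or), 3))
```

Comparing the observed odds ratios with the model-based ones gives an informal sense of where a model fits poorly, which is the role the standardized residuals play in the package.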

General Notes: Rao-Scott adjustments may not be feasible when the total number of MRCV items is large. In this case, an error message will be returned describing a memory allocation issue.

References

Agresti, A. and Liu, I.-M. (1999) Modeling a categorical variable allowing arbitrarily many category choices. Biometrics, 55, 936--943.

Bilder, C. and Loughin, T. (2004) Testing for marginal independence between two categorical variables with multiple responses. Biometrics, 60, 241--248.

Bilder, C. and Loughin, T. (2007) Modeling association between two or more categorical variables that allow for multiple category choices. Communications in Statistics--Theory and Methods, 36, 433--451.

Bilder, C., Loughin, T., and Nettleton, D. (2000) Multiple marginal independence testing for pick any/c variables. Communications in Statistics--Simulation and Computation, 29, 1285--1316.

Thomas, D. and Decady, Y. (2004) Testing for association using multiple response survey data: Approximate procedures based on the Rao-Scott approach. International Journal of Testing, 4, 43--59.