corregp (version 0.1.2)

corregp: Correspondence Regression

Description

This is the basic function for correspondence regression, i.e. the correspondence analysis of a contingency table formed by the categorical variables Y and X, where X can in turn be made up of the combinations of several categorical variables.

Usage

corregp(formula, data, part = NULL, b = 0, xep = TRUE, std = FALSE,
  rel = TRUE, phi = FALSE, chr = ".")

Arguments

formula
A formula specification of which factors to cross with each other. The left-hand (y) side must be a single factor. The right-hand side (x) can involve all the usual specifica
data
The data frame containing the variables specified in the formula.
part
The name of a factor partitioning the levels of the left-hand side y into groups. This argument is relevant for analyses in which one wants to remove between-item variation.
b
Number of bootstrap replications. If 0 (the default), the analysis is exploratory only and no bootstrap replicates are generated.
xep
Logical specifying whether to output the separate terms in the right-hand side (x) as components in a list. If FALSE, then all x output is collected in a matrix.
std
Logical specifying whether to output the standardized coordinates. Defaults to FALSE.
rel
Logical specifying whether to divide the coordinates by the square root of their totals, so that one obtains coordinates for the relative frequencies (as is customary in correspondence analysis). Defaults to TRUE.
phi
Logical specifying whether to compute the output on the scale of the Chi-squared value of the contingency table or of the Phi-squared value (which is Chi-squared divided by N). Reminiscent of
chr
Character specifying the separator string for constructing the interaction terms.
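
The formula, chr and phi arguments can be illustrated with base R alone. The following is a minimal sketch, not a call to corregp: it assumes the built-in HairEyeColor table from the datasets package (rather than the HairEye data used in the Examples), and the object names f, term_labels, combo, tab, chi2 and phi2 are chosen here purely for illustration.

```r
# Base-R illustration only; none of this calls corregp itself.

# 1. How a model formula expands into separate terms.
f <- Eye ~ Hair * Sex
term_labels <- attr(terms(f), "term.labels")
print(term_labels)  # "Hair" "Sex" "Hair:Sex"

# 2. Combining factor levels with a separator string, similar in spirit
#    to what a separator such as chr = "." is used for.
d <- as.data.frame(HairEyeColor)  # columns: Hair, Eye, Sex, Freq
combo <- interaction(d$Hair, d$Sex, sep = ".")
print(levels(combo))

# 3. Phi-squared is Chi-squared divided by the total count N.
tab <- xtabs(Freq ~ Eye + Hair, data = d)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
phi2 <- chi2 / sum(tab)
print(c(chi2 = chi2, phi2 = phi2))
```

How corregp builds its interaction labels internally is not shown here; the sketch only demonstrates the base-R building blocks that the argument descriptions refer to.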

Value

An object of class "corregp", i.e. a list with components:

  • eigen: A vector of eigenvalues of the correspondence regression.
  • y: The coordinates (matrix) of the Y levels.
  • x: The coordinates of the X levels. If xep is TRUE, then this is a list with a component for each term name.
  • freq: A list of the frequencies of every Y and X level.
  • conf: Only if b > 0. A list of bootstrap replicates for the eigenvalues, the Y levels and the X levels.
  • aux: A list of auxiliary information (such as the U and V matrices of the SVD and the specified values for all the arguments) to be passed to other functions and methods.

Details

Correspondence regression rests on the idea, described by Van der Heijden et al. (1989) and quoted in Greenacre (2007: 272), of using correspondence analysis to inspect the interactions in a log-linear analysis. More specifically, as log-linear analysis or Poisson regression is sometimes used to model a polytomous or multinomial response variable (in a GLM), correspondence regression enables the analysis of a categorical factor (Y) in terms of other (possibly interacting) factors (X).

These factors are specified in the argument formula, which can be constructed in all the usual ways of specifying a model formula: e.g. Y ~ X1 * X2 as a shorthand for Y ~ X1 + X2 + X1 : X2, or Y ~ X1 * X2 - X1 : X2, Y ~ (X1 + X2 + X3) ^ 2, etc. Correspondence regression then crosstabulates the Y factor with all the combinations in X, thus producing a typical contingency table, on which a simple correspondence analysis is performed (see Greenacre 2007: 121-128 for the outline of this approach). The more general effects in X are obtained by aggregating the combinations.

Correspondence regression also allows for inferential validation of the effects, which is done by means of the bootstrap. When the argument b is set to a number greater than 0, b replicates of the contingency table are generated by multinomial sampling. From these, b new values are derived for the coordinates in both Y and X as well as for the eigenvalues (also called the "principal inertias"). On the basis of the replicate values, confidence intervals, ellipses or ellipsoids can be computed. CAUTION: bootstrapping is computationally quite intensive, so it can take a while to reach results, especially with a large b.

The argument part can be used when the levels of Y are grouped/partitioned/nested into clusters and one wants to exclude the heterogeneity between the clusters. Thus, part is equivalent to a random factor, although corregp currently allows for only one such factor. The use of part can be relevant for so-called lectometric analyses in linguistics.
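
The procedure described above (crosstabulation, simple correspondence analysis, multinomial bootstrap) can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: it uses the built-in HairEyeColor table instead of the HairEye data, the helper ca_eigen is a hypothetical name introduced here, the seed and the number of replicates b = 200 are arbitrary, and the interval is a plain percentile interval rather than whatever corregp's methods compute.

```r
# Base-R sketch of the steps described above; it does not call corregp.
set.seed(1)  # arbitrary seed so the bootstrap is reproducible

# Cross-tabulate Y (Eye) against all combinations of the X factors (Hair, Sex).
d <- as.data.frame(HairEyeColor)  # columns: Hair, Eye, Sex, Freq
d$HairSex <- interaction(d$Hair, d$Sex, sep = ".")
tab <- xtabs(Freq ~ Eye + HairSex, data = d)

# Simple correspondence analysis: SVD of the standardized residuals.
# ca_eigen is a hypothetical helper, not a function from the package.
ca_eigen <- function(tab) {
  M <- matrix(as.vector(tab), nrow = nrow(tab))
  P <- M / sum(M)                     # correspondence matrix
  r <- rowSums(P)                     # row masses
  cm <- colSums(P)                    # column masses
  S <- (P - outer(r, cm)) / sqrt(outer(r, cm))
  svd(S)$d^2                          # principal inertias (last one is ~0)
}
eig <- ca_eigen(tab)

# The principal inertias sum to Phi-squared, i.e. Chi-squared divided by N.
n <- sum(tab)
chi2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
print(c(sum_eigen = sum(eig), phi2 = chi2 / n))

# Bootstrap: resample the whole table from a multinomial distribution
# and recompute the first principal inertia for each replicate.
b <- 200  # small, arbitrary number of replicates for illustration
p <- as.vector(tab) / n
boot_eig1 <- replicate(b, {
  tab_star <- matrix(rmultinom(1, size = n, prob = p), nrow = nrow(tab))
  ca_eigen(tab_star)[1]
})
ci <- unname(quantile(boot_eig1, c(0.025, 0.975)))  # percentile interval
print(ci)
```

Even with only 200 replicates, every replicate requires a fresh SVD, which gives a sense of why a large b makes the analysis slow.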

References

Greenacre, M. (2007) Correspondence analysis in practice, Second edition. Boca Raton: Chapman and Hall/CRC.

Van der Heijden, P.G.M., A. de Falguerolles and J. de Leeuw (1989) A combined approach to contingency table analysis using correspondence analysis and log-linear analysis. Applied Statistics 38 (2), 249-292.

See Also

print.corregp, summary.corregp, screeplot.corregp, plot.corregp.

Examples

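library(corregp)  # load the package
data(HairEye)
# b = 3000 bootstrap replications; this can take a while to run
haireye.crg <- corregp(Eye ~ Hair * Sex, data = HairEye, b = 3000)
haireye.crg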
