assoc
Extended Association Plots
Produce an association plot indicating deviations from a specified independence model in a possibly high-dimensional contingency table.
- Keywords
- hplot
Usage
# S3 method for default
assoc(x, row_vars = NULL, col_vars = NULL, compress = TRUE,
xlim = NULL, ylim = NULL,
spacing = spacing_conditional(sp = 0), spacing_args = list(),
split_vertical = NULL, keep_aspect_ratio = FALSE,
xscale = 0.9, yspace = unit(0.5, "lines"), main = NULL, sub = NULL,
…, residuals_type = "Pearson", gp_axis = gpar(lty = 3))
# S3 method for formula
assoc(formula, data = NULL, …, subset = NULL, na.action = NULL, main = NULL, sub = NULL)
Arguments
- x
a contingency table in array form with optional category labels specified in the
dimnames(x)
attribute, or an object inheriting from the"ftable"
class (such as"structable"
objects).- row_vars
a vector of integers giving the indices, or a character vector giving the names of the variables to be used for the rows of the association plot.
- col_vars
a vector of integers giving the indices, or a character vector giving the names of the variables to be used for the columns of the association plot.
- compress
logical; if
FALSE
, the space between the rows (columns) are chosen such that the total heights (widths) of the rows (columns) are all equal. IfTRUE
, the space between rows and columns is fixed and hence the plot is more “compressed”.- xlim
a \(2 \times k\) matrix of doubles, \(k\) number of total columns of the plot. The columns of
xlim
correspond to the columns of the association plot, the rows describe the column ranges (minimums in the first row, maximums in the second row). Ifxlim
isNULL
, the ranges are determined from the residuals according tocompress
(ifTRUE
: widest range from each column, ifFALSE
: from the whole association plot matrix).- ylim
a \(2 \times k\) matrix of doubles, \(k\) number of total rows of the plot. The columns of
ylim
correspond to the rows of the association plot, the rows describe the column ranges (minimums in the first row, maximums in the second row). Ifylim
isNULL
, the ranges are determined from the residuals according tocompress
(ifTRUE
: widest range from each row, ifFALSE
: from the whole association plot matrix).- spacing
a spacing object, a spacing function, or a corresponding generating function (see
strucplot
for more information). The default is the spacing-generating functionspacing_conditional
that is (by default) called with the argument listspacing_args
(seespacings
for more details).- spacing_args
list of arguments for the spacing-generating function, if specified (see
strucplot
for more information).- split_vertical
vector of logicals of length \(k\), where \(k\) is the number of margins of
x
(default:FALSE
). Values are recycled as needed. ATRUE
component indicates that the corresponding dimension is folded into the columns,FALSE
folds the dimension into the rows.- keep_aspect_ratio
logical indicating whether the aspect ratio should be fixed or not.
- residuals_type
a character string indicating the type of residuals to be computed. Currently, only Pearson residuals are supported.
- xscale
scale factor resizing the tile's width, thus adding additional space between the tiles.
- yspace
object of class
"unit"
specifying additional space separating the rows.- gp_axis
object of class
"gpar"
specifying the visual aspects of the tiles' baseline.- formula
a formula object with possibly both left and right hand sides specifying the column and row variables of the flat table.
- data
a data frame, list or environment containing the variables to be cross-tabulated, or an object inheriting from class
table
.- subset
an optional vector specifying a subset of observations to be used. Ignored if
data
is a contingency table.- na.action
an optional function which indicates what should happen when the data contain
NA
s. Ignored ifdata
is a contingency table.- main, sub
either a logical, or a character string used for plotting the main (sub) title. If logical and
TRUE
, the name of thedata
object is used.- …
other parameters passed to
strucplot
Details
Association plots have been suggested by Cohen (1980) and extended by Friendly (1992) and provide a means for visualizing the residuals of an independence model for a contingency table.
assoc
is a generic function and currently has a default method and a
formula interface. Both are high-level interfaces to the
strucplot
function, and produce (extended) association
plots. Most of the functionality is described there, such as
specification of the independence model, labeling, legend, spacing,
shading, and other graphical parameters.
For a contingency table, the signed contribution to Pearson's \(\chi^2\) for cell \(\{ij\ldots k\}\) is
$$d_{ij\ldots k} = \frac{(f_{ij\ldots k} - e_{ij\ldots k})}{ \sqrt{e_{ij\ldots k}}}$$
where \(f_{ij\ldots k}\) and \(e_{ij\ldots k}\) are the observed and expected counts corresponding to the cell. In the association plot, each cell is represented by a rectangle that has (signed) height proportional to \(d_{ij\ldots k}\) and width proportional to \(\sqrt{e_{ij\ldots k}}\), so that the area of the box is proportional to the difference in observed and expected frequencies. The rectangles in each row are positioned relative to a baseline indicating independence (\(d_{ij\ldots k} = 0\)). If the observed frequency of a cell is greater than the expected one, the box rises above the baseline, and falls below otherwise.
Additionally, the residuals can be colored depending on a specified shading scheme (see Meyer et al., 2003). Package vcd offers a range of residual-based shadings (see the shadings help page). Some of them allow, e.g., the visualization of test statistics.
Unlike the assocplot
function in the
graphics package, this function allows the visualization of
contingency tables with more than two dimensions. Similar to the
construction of ‘flat’ tables (like objects of class "ftable"
or
"structable"
), the dimensions are folded into rows and columns.
The layout is very flexible: the specification of shading, labeling,
spacing, and legend is modularized (see strucplot
for
details).
Value
The "structable"
visualized is returned invisibly.
References
Cohen, A. (1980), On the graphical display of the significant components in a two-way contingency table. Communications in Statistics---Theory and Methods, A9, 1025--1041.
Friendly, M. (1992), Graphical methods for categorical data. SAS User Group International Conference Proceedings, 17, 190--200. http://datavis.ca/papers/sugi/sugi17.pdf
Meyer, D., Zeileis, A., Hornik, K. (2003), Visualizing independence using extended association plots. Proceedings of the 3rd International Workshop on Distributed Statistical Computing, K. Hornik, F. Leisch, A. Zeileis (eds.), ISSN 1609-395X. https://www.R-project.org/conferences/DSC-2003/Proceedings/
Meyer, D., Zeileis, A., and Hornik, K. (2006),
The strucplot framework: Visualizing multi-way contingency tables with
vcd.
Journal of Statistical Software, 17(3), 1-48.
URL http://www.jstatsoft.org/v17/i03/ and available as
vignette("strucplot")
.
See Also
Examples
library(vcd)
# NOT RUN {
data("HairEyeColor")
## Aggregate over sex:
(x <- margin.table(HairEyeColor, c(1, 2)))
## Ordinary assocplot:
assoc(x)
## and with residual-based shading (of independence)
assoc(x, main = "Relation between hair and eye color", shade = TRUE)
## Aggregate over Eye color:
(x <- margin.table(HairEyeColor, c(1, 3)))
chisq.test(x)
assoc(x, main = "Relation between hair color and sex", shade = TRUE)
# Visualize multi-way table
assoc(aperm(HairEyeColor), expected = ~ (Hair + Eye) * Sex,
labeling_args = list(just_labels = c(Eye = "left"),
offset_labels = c(right = -0.5),
offset_varnames = c(right = 1.2),
rot_labels = c(right = 0),
tl_varnames = c(Eye = TRUE))
)
assoc(aperm(UCBAdmissions), expected = ~ (Admit + Gender) * Dept, compress = FALSE,
labeling_args = list(abbreviate = c(Gender = TRUE), rot_labels = 0)
)
# }