Codings: Coding matrix functions for factors in linear model formulae

Description

These functions provide an alternative to the coding functions supplied in the stats package, namely contr.treatment, contr.sum, contr.helmert and contr.poly.

Usage

code_control(n, contrasts = TRUE, sparse = FALSE)
code_control_last(n, contrasts = TRUE, sparse = FALSE)
code_diff(n, contrasts = TRUE, sparse = FALSE)
code_diff_forward(n, contrasts = TRUE, sparse = FALSE)
code_helmert(n, contrasts = TRUE, sparse = FALSE)
code_helmert_forward(n, contrasts = TRUE, sparse = FALSE)
code_deviation(n, contrasts = TRUE, sparse = FALSE)
code_deviation_first(n, contrasts = TRUE, sparse = FALSE)
code_poly(n, contrasts = TRUE, sparse = FALSE)
contr.diff(n, contrasts = TRUE, sparse = FALSE)

Arguments

n

Either a positive integer giving the number of levels, or the levels attribute of a factor, which supplies both the number of levels (via its length) and labels that may be used in the dimnames of the result.

contrasts

Logical: Do you want the \(n \times (n-1)\) coding matrix (TRUE) or an \(n \times n\) full-rank matrix, as is sometimes needed by the fitting functions (FALSE)?

sparse

Logical: Do you want the result to be a sparse matrix object, as generated by the Matrix package?
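
The two ways of supplying n, and the effect of contrasts, can be seen with a stats coding function that takes the same arguments. This is a base-R analogue only: it assumes, without checking, that the functions documented here handle n and contrasts the same way. The names lev, C_int, C_lab and C_all are made up for the illustration.

```r
# Base-R analogue only; the code_* functions are assumed to treat
# n and contrasts in the same way as stats::contr.sum does.
lev <- c("low", "mid", "high")                     # made-up level labels

C_int <- stats::contr.sum(3)                       # n given as a number of levels
C_lab <- stats::contr.sum(lev)                     # n given as the level labels
C_all <- stats::contr.sum(lev, contrasts = FALSE)  # square n x n matrix

dim(C_int)        # 3 rows, 2 columns
rownames(C_lab)   # the labels are carried into the dimnames
dim(C_all)        # 3 rows, 3 columns
```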

Value

A coding matrix, as requested by fitting functions using linear model formulae with factor predictors.

Details

All functions with names of the form code_xxxx return coding matrices which, in a simple model, make the intercept term the simple ("unweighted") average of the class means. This can be important in some non-standard ANOVA tables. The function contr.diff is an exception, and is offered as a natural companion to stats::contr.treatment, with which it is closely aligned.
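
One way to check what the intercept represents under a given coding is to add a column of ones to the coding matrix and invert the result: the first row of the inverse holds the weights the intercept places on the class means. The sketch below uses only base R and two stats codings, so it does not depend on the functions documented here. The helper name coef_meaning is hypothetical.

```r
# Hypothetical helper, base R only. Rows of the result give the weights
# each coefficient places on the class means (row 1 is the intercept).
coef_meaning <- function(C) {
  B <- cbind(1, C)   # model matrix with one row per class
  solve(B)           # class means mu = B %*% beta, so beta = solve(B) %*% mu
}

W_trt <- coef_meaning(stats::contr.treatment(4))
W_sum <- coef_meaning(stats::contr.sum(4))

W_trt[1, ]  # intercept weights: all weight on the first class mean
W_sum[1, ]  # intercept weights: equal weight on every class mean
```

Under contr.treatment the intercept is the first class mean, whereas under contr.sum it is the unweighted average, which is the property the code_xxxx functions are described as sharing.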

code_control: Similar to contr.treatment, with contrasts comparing the class means (the "treatments") with the first class mean (the "control").
code_control_last: Similar to code_control, but using the final class mean as the "control". Cf. contr.SAS.
code_diff: The contrasts are the successive differences of the treatment means, \(\mu_{i+1} - \mu_i\). This coding function has no counterpart in the stats package. It is suggested as an alternative to the default coding, contr.poly, for ordered factors. It offers a visual check of monotonicity of the class means with the ordered levels of the factor. Unlike stats::contr.poly there is no assumption that the factor levels are in some sense "equally spaced".
code_diff_forward: Very similar to code_diff, but using forward differences: \(\mu_i - \mu_{i+1}\).
code_helmert: Similar to contr.helmert, but with a small scaling change to make the regression coefficients (i.e. the contrasts) more easily interpretable. The contrasts now compare each class mean, starting from the second, with the average of all class means coming prior to it in the factor levels order.
code_helmert_forward: Similar to code_helmert, but comparing each class mean, up to the second last, with the average of all class means coming after it in the factor levels order.
code_deviation: Similar to contr.sum, which is described as having the "effects" summing to zero. A more precise description is that the contrasts are the deviations of each class mean from their simple average, i.e. \(\mu_i - \bar\mu\). To avoid redundancy, the last deviation is omitted.
code_deviation_first: Very similar to code_deviation, but omitting the first deviation to avoid redundancy rather than the last.
code_poly: Similar in effect to contr.poly but, for fewer than 15 levels, using an unnormalized basis for the orthogonal polynomials with integer entries. (Orthogonal polynomials were originally given in this form as tables.) The only advantage over stats::contr.poly is one of display. Except for teaching purposes, use stats::contr.poly in preference.
contr.diff: Very similar in effect to code_diff, yielding the same differences as the contrasts, but like stats::contr.treatment using the first class mean as the intercept coefficient rather than the simple average of the class means, as with code_diff. Some would regard this as making it unsuitable for use in some non-standard ANOVA tables.
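
The descriptions above state each coding in terms of the contrasts it yields. As a rough base-R illustration of how a coding matrix relates to a chosen set of contrasts, the sketch below writes down the simple average and the successive differences as rows of a matrix, inverts it, and drops the column of ones. It is not presented as the package's implementation. The resulting matrix has the properties described for code_diff, but whether it matches that function's output entry for entry is an assumption. The class means in mu are made up.

```r
# Assumed illustration in base R; not the package's implementation.
n  <- 4
mu <- c(1, 3, 6, 10)                 # made-up class means

# Rows of L: the simple average, then successive differences mu[i+1] - mu[i].
L <- rbind(rep(1 / n, n),
           diff(diag(n)))            # row differences of the identity: (-1, 1, 0, ...)

B <- solve(L)                        # first column of B is all ones
C <- B[, -1, drop = FALSE]           # n x (n - 1) coding matrix

f <- factor(rep(seq_len(n), each = 2))
y <- rep(mu, each = 2)               # noise-free responses, so the fit is exact
fit <- lm(y ~ f, contrasts = list(f = C))
unname(coef(fit))                    # intercept 5, then differences 2, 3, 4
```

The intercept equals the average of the four made-up means and the remaining coefficients are their successive differences, so a change of sign among them would flag a departure from monotonicity.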

Examples

# NOT RUN {
(M <- code_control(5))
mean_contrasts(M)
(M <- stats::contr.treatment(5))
mean_contrasts(M)  ## same contrasts; different averaging vector.
mean_contrasts(stats::contr.helmert(6))  ## Interpretation obscure
mean_contrasts(code_helmert(6))          ## each mean with the average preceding
mean_contrasts(code_helmert_forward(6))  ## each mean with the average succeeding
# }