Construct Design Matrices
model.matrix creates a design (or model) matrix.
model.matrix(object, ...)
## Default S3 method:
model.matrix(object, data = environment(object), contrasts.arg = NULL, xlev = NULL, ...)
- object: an object of an appropriate class. For the default method, a model formula or a terms object.
- data: a data frame created with model.frame. If another sort of object, model.frame is called first.
- contrasts.arg: a list, whose entries are values (numeric matrices or character strings naming functions) to be used as replacement values for the contrasts replacement function and whose names are the names of columns of data containing factors.
- xlev: to be used as argument of model.frame if data is such that model.frame is called.
- ...: further arguments passed to or from other methods.
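As a minimal sketch of how xlev can matter, the following passes a plain data frame (so model.frame is called internally) together with a level set that includes a level not present in the data. The data frame d and the level "z" are hypothetical, and the column names in the comment assume the default treatment contrasts.

```r
# Hypothetical data: only "x" and "y" are observed, but "z" is a known level.
d <- data.frame(a = factor(c("x", "y", "x")))

# d is a plain data frame, not a model frame, so model.frame is called
# internally and receives xlev.
mm <- model.matrix(~ a, data = d, xlev = list(a = c("x", "y", "z")))

colnames(mm)  # with default treatment contrasts: "(Intercept)" "ay" "az"
mm[, 3]       # the column for the unobserved level is all zero
```

This is the kind of situation that arises when a design matrix for new data must have the same columns as the one used for fitting.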
model.matrix creates a design matrix from the description given in
terms(object), using the data in data, which
must supply variables with the same names as would be created by a
call to model.frame(object) or, more precisely, by evaluating
attr(terms(object), "variables"). If data is a data
frame, there may be other columns and the order of columns is not
important. Any character variables are coerced to factors. After
coercion, all the variables used on the right-hand side of the
formula must be logical, integer, numeric or factor.
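The coercion and column-matching rules above can be illustrated with a small sketch. The data frame d and its column names are hypothetical, and the column name shown for the factor assumes the default treatment contrasts.

```r
# Hypothetical toy data: 'grp' is a character vector and 'unused' does not
# appear in the formula. The column order differs from the formula order.
d <- data.frame(
  unused = 1:4,
  grp    = c("u", "v", "u", "v"),
  x      = c(0.5, 1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

mm <- model.matrix(~ x + grp, data = d)

colnames(mm)  # "(Intercept)" "x" "grpv" under default treatment contrasts
dim(mm)       # 4 rows, 3 columns: 'unused' is ignored
```

The character column grp is treated as a two-level factor, so it contributes a single indicator column.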
In an interaction term, the variable whose levels vary fastest is the
first one to appear in the formula (and not in the term), so in
~ a + b + b:a the interaction will have a varying fastest.
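A short sketch of this ordering rule, using a balanced two-way layout like the one in the examples for this page. The column names in the comments assume the default treatment contrasts.

```r
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))  # balanced 2-way layout

# The interaction is written as b:a, but 'a' appears first in the formula.
mm <- model.matrix(~ a + b + b:a, dd)

# Interaction columns follow the main-effect columns; the levels of 'a'
# change fastest: "a2:b2" "a3:b2" "a2:b3" "a3:b3" "a2:b4" "a3:b4"
colnames(mm)[7:12]
```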
By convention, if the response variable also appears on the right-hand side of the formula it is dropped (with a warning), although interactions involving the term are retained.
The design matrix for a regression-like model with the specified formula and data.
There is an attribute "assign", an integer vector with an entry for each column in the matrix giving the term in the formula which gave rise to the column. Value 0 corresponds to the intercept (if any), and positive values to terms in the order given by the term.labels attribute of the terms structure corresponding to object.
If there are any factors in terms in the model, there is an attribute "contrasts", a named list with an entry for each factor. This specifies the contrasts that would be used in terms in which the factor is coded by contrasts (in some terms dummy coding may be used), either as a character vector naming a function or as a numeric matrix.
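The two attributes can be inspected directly. The following is a minimal sketch on a small balanced layout; grouping the column names by "assign" is just one convenient way to view the mapping, not something the function does itself.

```r
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))
mm <- model.matrix(~ a + b, dd, contrasts.arg = list(a = "contr.sum"))

# One entry per column: 0 for the intercept, 1 for term 'a', 2 for term 'b'.
attr(mm, "assign")  # 0 1 1 2 2 2

# Group the column names by the term that produced them.
split(colnames(mm), attr(mm, "assign"))

# Named list with one entry per factor; 'a' was given "contr.sum" explicitly,
# 'b' takes whatever options("contrasts") specifies.
attr(mm, "contrasts")
```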
Chambers, J. M. (1992) Data for models. Chapter 3 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
ff <- log(Volume) ~ log(Height) + log(Girth)
utils::str(m <- model.frame(ff, trees))
mat <- model.matrix(ff, m)

dd <- data.frame(a = gl(3,4), b = gl(4,1,12)) # balanced 2-way
options("contrasts")
model.matrix(~ a + b, dd)
model.matrix(~ a + b, dd, contrasts.arg = list(a = "contr.sum"))
model.matrix(~ a + b, dd, contrasts.arg = list(a = "contr.sum", b = "contr.poly"))
m.orth <- model.matrix(~a+b, dd, contrasts.arg = list(a = "contr.helmert"))
crossprod(m.orth) # m.orth is ALMOST orthogonal