`model.frame` (a generic function) and its methods return a `data.frame` with the variables needed to use `formula` and any `…` arguments.

model.frame(formula, …)

# S3 method for default
model.frame(formula, data = NULL,
subset = NULL, na.action = na.fail,
drop.unused.levels = FALSE, xlev = NULL, …)

# S3 method for aovlist
model.frame(formula, data = NULL, …)

# S3 method for glm
model.frame(formula, …)

# S3 method for lm
model.frame(formula, …)

get_all_vars(formula, data, …)

data

a data.frame, list or environment (or object coercible by `as.data.frame` to a data.frame), containing the variables in `formula`. Neither a matrix nor an array will be accepted.

subset

a specification of the rows to be used: defaults to all rows. This can be any valid indexing vector (see `[.data.frame`) for the rows of `data` or, if that is not supplied, a data frame made up of the variables used in `formula`.

na.action

drop.unused.levels

should factors have unused levels dropped? Defaults to `FALSE`.

xlev

a named list of character vectors giving the full set of levels to be assumed for each factor.

…

for `model.frame` methods, a mix of further arguments such as `data`, `na.action`, `subset` to pass to the default method. Any additional arguments (such as `offset` and `weights` or other named arguments) which reach the default method are used to create further columns in the model frame, with parenthesised names such as `"(offset)"`.

For `get_all_vars`, further named columns to include in the model frame.
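As a rough illustration of how an extra argument becomes a parenthesised column, the sketch below passes `weights` through `…` to the default method. The data frame `cars2` and its column `w` are invented for this example; `cars` is the built-in dataset.

```r
# Hypothetical weights column added to the built-in cars data
cars2 <- transform(cars, w = rep(c(1, 2), length.out = nrow(cars)))

# 'weights' is not a formal argument of the default method, so it travels
# through '...' and becomes an extra column of the model frame
mf <- model.frame(dist ~ speed, data = cars2, weights = w)

names(mf)                 # "dist" "speed" "(weights)"
head(model.weights(mf))   # extracts the "(weights)" column
```

`model.weights()` is used here only as a convenient accessor; indexing with `mf[["(weights)"]]` reads the same column.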

A `data.frame` containing the variables used in `formula` plus those specified in `…`. It will have additional attributes, including `"terms"` for an object of class `"terms"` derived from `formula`, and possibly `"na.action"` giving information on the handling of `NA`s (which will not be present if no special handling was done, e.g. by `na.pass`).
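The attributes described above can be inspected directly. The following is a minimal sketch on a small made-up data frame `d` (hypothetical, not part of the original page) that contains one missing response.

```r
# Hypothetical data with a missing value in the response
d <- data.frame(y = c(1, 2, NA, 4), x = c(10, 20, 30, 40))

# na.omit drops the incomplete row and records that in "na.action"
mf_omit <- model.frame(y ~ x, data = d, na.action = na.omit)
inherits(attr(mf_omit, "terms"), "terms")   # TRUE
nrow(mf_omit)                               # 3
class(attr(mf_omit, "na.action"))           # "omit"

# na.pass does no special handling, so no "na.action" attribute is set
mf_pass <- model.frame(y ~ x, data = d, na.action = na.pass)
is.null(attr(mf_pass, "na.action"))         # TRUE
```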

Exactly what happens depends on the class and attributes of the object `formula`. If this is an object of fitted-model class such as `"lm"`, the method will either return the saved model frame used when fitting the model (if any, often selected by argument `model = TRUE`) or pass the call used when fitting on to the default method. The default method itself can cope with rather standard model objects such as those of class `"lqs"` from package MASS if no other arguments are supplied.
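A short sketch of the two paths for a fitted `"lm"` object, using the built-in `cars` data. The object names are illustrative only.

```r
# Default lm() keeps the model frame (model = TRUE), so it is returned as saved
fit <- lm(dist ~ speed, data = cars)
mf_saved <- model.frame(fit)
identical(mf_saved, fit$model)   # TRUE

# With model = FALSE nothing is stored; the method re-evaluates the
# fitting call via the default method to rebuild the frame
fit_nomodel <- lm(dist ~ speed, data = cars, model = FALSE)
is.null(fit_nomodel$model)       # TRUE
mf_rebuilt <- model.frame(fit_nomodel)
dim(mf_rebuilt)                  # 50 2
```

Rebuilding relies on the data named in the original call still being available when `model.frame` is called.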

The rest of this section applies only to the default method.

If either `formula` or `data` is already a model frame (a data frame with a `"terms"` attribute) and the other is missing, the model frame is returned. Unless `formula` is a terms object, `as.formula` and then `terms` are called on it. (If you wish to use the `keep.order` argument of `terms.formula`, pass a terms object rather than a formula.)
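To make the `keep.order` remark concrete, here is a hedged sketch using the built-in `mtcars` data. The formula is arbitrary and chosen only so that the interaction is written first.

```r
# Passing a formula: terms() reorders so main effects precede the interaction
mf_formula <- model.frame(mpg ~ wt:hp + wt + hp, data = mtcars)
attr(attr(mf_formula, "terms"), "term.labels")   # "wt" "hp" "wt:hp"

# Passing a terms object built with keep.order = TRUE preserves the written order
tt <- terms(mpg ~ wt:hp + wt + hp, keep.order = TRUE)
mf_terms <- model.frame(tt, data = mtcars)
attr(attr(mf_terms, "terms"), "term.labels")     # "wt:hp" "wt" "hp"
```

The columns of the two model frames are the same; only the term ordering stored in the `"terms"` attribute differs.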

Row names for the model frame are taken from the `data` argument if present, then from the names of the response in the formula (or rownames if it is a matrix), if there is one.

All the variables in `formula`, `subset` and in `…` are looked for first in `data` and then in the environment of `formula` (see the help for `formula()` for further details) and collected into a data frame. Then the `subset` expression is evaluated, and it is used as a row index to the data frame. Then the `na.action` function is applied to the data frame (and may well add attributes). The levels of any factors in the data frame are adjusted according to the `drop.unused.levels` and `xlev` arguments: if `xlev` specifies a factor and a character variable is found, it is converted to a factor (as from R 2.10.0).
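The sequence above (collect variables, apply `subset`, apply `na.action`, adjust levels) can be traced on a small example. The data frame `d`, its columns and the level set passed to `xlev` are all hypothetical.

```r
# Hypothetical data: character predictor 'g', one missing response,
# and a logical column 'keep' used only for subsetting
d <- data.frame(
  y    = c(1.5, 2.0, NA, 4.2, 5.1),
  g    = c("a", "b", "a", "c", "b"),
  keep = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

mf <- model.frame(
  y ~ g, data = d,
  subset    = keep,                          # step 1: keeps rows 1 to 4
  na.action = na.omit,                       # step 2: drops row 3 (NA in y)
  xlev      = list(g = c("a", "b", "c", "d"))  # step 3: 'g' becomes a factor
)

rownames(mf)    # "1" "2" "4"
is.factor(mf$g) # TRUE
levels(mf$g)    # "a" "b" "c" "d"
```

Note that `keep` is found in `data` even though it does not appear in the formula, and that the level `"d"` is retained although it never occurs in the data.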

Unless `na.action = NULL`, time-series attributes will be removed from the variables found (since they will be wrong if `NA`s are removed).

Note that *all* the variables in the formula are included in the data frame, even those preceded by `-`.
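A quick check of this behaviour on the built-in `mtcars` data, with an arbitrary formula that adds and then removes `hp`:

```r
# 'hp' is removed from the model terms by '- hp' ...
mf <- model.frame(mpg ~ wt + hp - hp, data = mtcars)
attr(attr(mf, "terms"), "term.labels")   # "wt"

# ... but it is still a column of the model frame
names(mf)                                # "mpg" "wt" "hp"
```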

Only variables whose type is raw, logical, integer, real, complex or character can be included in a model frame: this includes classed variables such as factors (whose underlying type is integer), but excludes lists.

`get_all_vars` returns a `data.frame` containing the variables used in `formula` plus those specified in `…` which are recycled to the number of data frame rows. Unlike `model.frame.default`, it returns the input variables and not those resulting from function calls in `formula`.
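The difference is easiest to see by comparing column names for the same formula. This sketch uses the built-in `cars` data; the transformations are arbitrary.

```r
f <- log(dist) ~ I(speed^2)

# model.frame evaluates the calls in the formula and names columns after them
mf <- model.frame(f, data = cars)
names(mf)   # "log(dist)" "I(speed^2)"

# get_all_vars returns the untransformed input variables instead
gv <- get_all_vars(f, data = cars)
names(gv)   # "dist" "speed"
```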

Chambers, J. M. (1992) *Data for models.* Chapter 3 of *Statistical Models in S*, eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

`model.matrix` for the ‘design matrix’, `formula` for formulas and `expand.model.frame` for model.frame manipulation.

```
data.class(model.frame(dist ~ speed, data = cars))

## get_all_vars(): new variables are recycled (iff length matches: 50 = 2*25)
ncars <- get_all_vars(sqrt(dist) ~ I(speed/2), data = cars, newVar = 2:3)
stopifnot(is.data.frame(ncars),
          identical(cars, ncars[, names(cars)]),
          ncol(ncars) == ncol(cars) + 1)
```
