# formula

##### Model Formulae

The generic function `formula` and its specific methods provide a way of extracting formulae which have been included in other objects.

`as.formula` is almost identical, additionally preserving attributes when `object` already inherits from `"formula"`.

- Keywords: models

##### Usage

```
formula(x, ...)
as.formula(object, env = parent.frame())

# S3 method for formula
print(x, showEnv = !identical(e, .GlobalEnv), ...)
```

##### Arguments

- `x`, `object`: R object.
- `...`: further arguments passed to or from other methods.
- `env`: the environment to associate with the result, if not already a formula.
- `showEnv`: logical indicating if the environment should be printed as well.

##### Details

The models fit by, e.g., the `lm` and `glm` functions are specified in a compact symbolic form. The `~` operator is basic in the formation of such models. An expression of the form `y ~ model` is interpreted as a specification that the response `y` is modelled by a linear predictor specified symbolically by `model`. Such a model consists of a series of terms separated by `+` operators. The terms themselves consist of variable and factor names separated by `:` operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term.

In addition to `+` and `:`, a number of other operators are useful in model formulae. The `*` operator denotes factor crossing: `a*b` is interpreted as `a+b+a:b`. The `^` operator indicates crossing to the specified degree. For example `(a+b+c)^2` is identical to `(a+b+c)*(a+b+c)` which in turn expands to a formula containing the main effects for `a`, `b` and `c` together with their second-order interactions.
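
The crossing operators described above can be checked by inspecting the term labels that `terms` produces. This is a minimal sketch: the helper name `term_labels` is an assumption introduced here for illustration, and the variables `y`, `a`, `b` and `c` are placeholders that do not need to exist.

```r
# Hypothetical helper: list the expanded term labels of a formula.
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(y ~ a * b)          # "a" "b" "a:b"
term_labels(y ~ a + b + a:b)    # same labels as for a * b
term_labels(y ~ (a + b + c)^2)  # "a" "b" "c" "a:b" "a:c" "b:c"
```
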
The `%in%` operator indicates that the terms on its left are nested within those on the right. For example `a + b %in% a` expands to the formula `a + a:b`. The `-` operator removes the specified terms, so that `(a+b+c)^2 - a:b` is identical to `a + b + c + b:c + a:c`. It can also be used to remove the intercept term: when fitting a linear model `y ~ x - 1` specifies a line through the origin. A model with no intercept can also be specified as `y ~ x + 0` or `y ~ 0 + x`.

While formulae usually involve just variable and factor names, they can also involve arithmetic expressions. The formula `log(y) ~ a + log(x)` is quite legal.
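
Term removal, intercept suppression and transformed terms, as described above, can likewise be inspected through the attributes of a `terms` object. This is a sketch only: `has_intercept` is a hypothetical helper name, and the variables are placeholders.

```r
# Term labels left after removing a:b from the second-order expansion.
attr(terms(y ~ (a + b + c)^2 - a:b), "term.labels")
# "a" "b" "c" "a:c" "b:c"

# Hypothetical helper: the "intercept" attribute is 1 when an intercept
# is present and 0 when it has been removed.
has_intercept <- function(f) attr(terms(f), "intercept") == 1

has_intercept(y ~ x)      # TRUE
has_intercept(y ~ x - 1)  # FALSE
has_intercept(y ~ x + 0)  # FALSE
has_intercept(y ~ 0 + x)  # FALSE

# Function calls such as log() are kept as single terms.
attr(terms(log(y) ~ a + log(x)), "term.labels")  # "a" "log(x)"
```
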
When such arithmetic expressions involve operators which are also used symbolically in model formulae, there can be confusion between arithmetic and symbolic operator use. To avoid this confusion, the function `I()` can be used to bracket those portions of a model formula where the operators are used in their arithmetic sense. For example, in the formula `y ~ a + I(b+c)`, the term `b+c` is to be interpreted as the sum of `b` and `c`.

Variable names can be quoted by backticks `` `like this` `` in formulae, although there is no guarantee that all code using formulae will accept such non-syntactic names.

Most model-fitting functions accept formulae whose right-hand side includes the function `offset` to indicate terms with a fixed coefficient of one. Some functions accept other ‘specials’ such as `strata` or `cluster` (see the `specials` argument of `terms.formula`).

There are two special interpretations of `.` in a formula. The usual one is in the context of a `data` argument of model fitting functions and means ‘all columns not otherwise in the formula’: see `terms.formula`. In the context of `update.formula`, **only**, it means ‘what was previously in this part of the formula’.

When `formula` is called on a fitted model object, either a specific method is used (such as that for class `"nls"`) or the default method. The default first looks for a `"formula"` component of the object (and evaluates it), then a `"terms"` component, then a `formula` parameter of the call (and evaluates its value) and finally a `"formula"` attribute.

There is a `formula` method for data frames. If there is only one column this forms the RHS with an empty LHS. For more columns, the first column is the LHS of the formula and the remaining columns separated by `+` form the RHS.
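
Several of the points above (`I()`, the two meanings of `.`, and `formula` applied to fitted models and data frames) can be illustrated together. The following is a minimal sketch; the data frame `d` and its values are made up for illustration.

```r
# I() protects arithmetic: b + c is one term (the sum), not two terms.
attr(terms(y ~ a + I(b + c)), "term.labels")  # "a" "I(b + c)"

# Made-up data for illustration only.
d <- data.frame(y = c(1.1, 2.3, 2.9, 4.2),
                a = c(2, 4, 6, 8),
                b = c(1, 0, 1, 0))

# With a data argument, `.` means all columns not otherwise in the formula.
attr(terms(y ~ ., data = d), "term.labels")   # "a" "b"

# In update(), `.` means what was previously in that part of the formula.
update(y ~ a, . ~ . + b)                      # y ~ a + b

# formula() on a fitted model recovers the model formula.
fit <- lm(y ~ a + b, data = d)
formula(fit)                                  # y ~ a + b

# formula() on a data frame: the first column is the LHS, the rest the RHS;
# a single column gives a one-sided formula.
formula(d)                                    # y ~ a + b
formula(d["a"])                               # ~a
```
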

##### Value

All the functions above produce an object of class `"formula"` which contains a symbolic model formula.

##### Environments

A formula object has an associated environment, and this environment (rather than the parent environment) is used by `model.frame` to evaluate variables that are not found in the supplied `data` argument.

Formulas created with the `~` operator use the environment in which they were created. Formulas created with `as.formula` will use the `env` argument for their environment.
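
The environment rules above can be seen in a small sketch. The function `make_formula`, the data frame `d` and all values are hypothetical and introduced only for illustration.

```r
# A formula created inside a function keeps that function's environment.
make_formula <- function() {
  z <- c(10, 20, 30)  # local variable, not a column of any data frame
  y ~ z
}
f <- make_formula()

# y is found in `d`; z is not, so model.frame() looks it up in environment(f).
d <- data.frame(y = c(1, 2, 3))  # made-up data
mf <- model.frame(f, data = d)
mf$z  # 10 20 30

# as.formula() applies `env` when converting a string, but leaves the
# environment of an object that is already a formula unchanged.
e <- new.env()
identical(environment(as.formula("y ~ z", env = e)), e)         # TRUE
identical(environment(as.formula(f, env = e)), environment(f))  # TRUE
```
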

##### References

Chambers, J. M. and Hastie, T. J. (1992)
*Statistical models.*
Chapter 2 of *Statistical Models in S*
eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

##### See Also

`I`, `offset`. For formula manipulation: `terms`, and `all.vars`; for typical use: `lm`, `glm`, and `coplot`.

##### Examples

```
library(stats)

class(fo <- y ~ x1*x2) # "formula"
fo
typeof(fo) # R internal : "language"
terms(fo)
environment(fo)
environment(as.formula("y ~ x"))
environment(as.formula("y ~ x", env = new.env()))
## Create a formula for a model with a large number of variables:
xnam <- paste0("x", 1:25)
(fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))
```

*Documentation reproduced from package stats, version 3.3.3, License: Part of R 3.3.3*

### Community examples

**dibyadeep.paul** at Sep 25, 2017 stats v3.4.1

```
# Example of using y ~ . to illustrate the statement:
# "The usual one is in the context of a data argument of model
# fitting functions and means ‘all columns not otherwise in the formula’: see terms.formula."

# Create mydata
mydata <- data.frame(matrix(c(
   1,  5501, 8.1,  9552, 1923,
   2,  5945, 7.0,  9680, 1961,
   3,  6629, 7.3,  9731, 1979,
   4,  7556, 7.5, 11666, 2030,
   5,  8716, 7.0, 14675, 2112,
   6,  9369, 6.4, 15265, 2192,
   7,  9920, 6.5, 15484, 2235,
   8, 10167, 6.4, 15723, 2351,
   9, 11084, 6.3, 16501, 2411,
  10, 12504, 7.7, 16890, 2475),
  nrow = 10, ncol = 5, byrow = TRUE))
colnames(mydata) <- c("gene", "cna", "common", "PC1", "PC2")

# Generate a linear regression fit for mydata. gene ~ . is equivalent to
# gene ~ cna + common + PC1 + PC2, i.e. the right-hand side of ~ is a function
# of all the variables other than gene (which is on the left-hand side).
mymodel <- lm(gene ~ ., data = mydata)  # generates the fit
```

**richie@datacamp.com** at Jan 17, 2017 stats v3.3.1

`y` is the response; `x1`, `x2` and `x3` are independent variables; terms with colons are the interactions between those variables.

```{r}
y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3 + x1:x2:x3
```

A more compact form of the above.

```{r}
y ~ x1 * x2 * x3
```

You can specify interactions up to a certain level using the power operator.

```{r}
y ~ (x1 + x2 + x3) ^ 2 # same as y ~ x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3
```

Minus removes terms from the formula.

```{r}
y ~ x1 * x2 * x3 - x1:x2:x3 # same as the previous formula
```

To include powers of variables, use the [`I()`](https://www.rdocumentation.org/packages/base/topics/AsIs) function.

```{r}
y ~ I(x1 ^ 2)
```

Other functions can be included as is.

```{r}
log(y) ~ log(x1)
```

Some functions allow formulae with no left-hand side.

```{r}
~ x1 * x2 * x3
```

Modelling functions use the syntax plus zero to specify a model with no intercept.

```{r}
y ~ x1 + 0
```

You can also use minus one to specify a model with no intercept.

```{r}
y ~ x1 - 1 # same as the previous formula
```

Some functions accept grouping formulae using pipes.

```{r}
y ~ x1 + x2 | x3
```

Groups can sometimes also be nested using forward slashes.

```{r}
y ~ x1 + x2 | x3 / x4
```

`%in%` is rarely used, and works like a colon.

```{r}
y ~ x1 %in% x2 # same as y ~ x1:x2
```

Sometimes it is convenient to use [`paste()`](https://www.rdocumentation.org/packages/base/topics/paste) to construct the formula as a string, then use `as.formula()`.

```{r}
x_names <- paste0("x", 1:25)
as.formula(paste("y ~ ", paste(x_names, collapse= " + ")))
```

Non-standard variable names can be included by using backticks, though functions are not guaranteed to be able to interpret them correctly.

```{r}
y ~ `x 1`
```

Formulae (along with expressions, calls and names) are language objects.

```{r}
is.language(y ~ x)
```

Formulae have an associated environment. This tells functions like [`lm()`](https://www.rdocumentation.org/packages/stats/topics/lm) where to look for variables that aren't included in the data argument.

```{r}
environment(y ~ x)
```

In advanced usage, you can specify the associated environment.

```{r}
environment(as.formula("y ~ x"))
environment(as.formula("y ~ x", env = new.env()))
```