# lm

##### Fitting Linear Models

`lm` is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance (although `aov` may provide a more convenient interface for these).

Keywords: regression

##### Usage

```
lm(formula, data, subset, weights, na.action,
method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
singular.ok = TRUE, contrasts = NULL, offset, ...)
```

##### Arguments

- `formula`: an object of class `"formula"` (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’.
- `data`: an optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which `lm` is called.
- `subset`: an optional vector specifying a subset of observations to be used in the fitting process.
- `weights`: an optional vector of weights to be used in the fitting process. Should be `NULL` or a numeric vector. If non-`NULL`, weighted least squares is used with weights `weights` (that is, minimizing `sum(w*e^2)`); otherwise ordinary least squares is used. See also ‘Details’.
- `na.action`: a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The ‘factory-fresh’ default is `na.omit`. Another possible value is `NULL`, no action. The value `na.exclude` can be useful.
- `method`: the method to be used; for fitting, currently only `method = "qr"` is supported; `method = "model.frame"` returns the model frame (the same as with `model = TRUE`, see below).
- `model`, `x`, `y`, `qr`: logicals. If `TRUE` the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition) are returned.
- `singular.ok`: logical. If `FALSE` (the default in S but not in R) a singular fit is an error.
- `contrasts`: an optional list. See the `contrasts.arg` of `model.matrix.default`.
- `offset`: this can be used to specify an *a priori* known component to be included in the linear predictor during fitting. This should be `NULL` or a numeric vector of length equal to the number of cases. One or more `offset` terms can be included in the formula instead or as well, and if more than one are specified their sum is used. See `model.offset`.
- `...`: additional arguments to be passed to the low level regression fitting functions (see below).
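As an illustration of the `weights` and `na.action` arguments, here is a minimal sketch on small made-up data (the data frames `d` and `d_na` are hypothetical). It checks that the weighted fit agrees with the closed-form weighted least-squares solution, i.e. the minimizer of `sum(w*e^2)`, and that `na.exclude` pads the residuals with `NA` where `na.omit` drops the incomplete row.

```r
# Hypothetical toy data with case weights w
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(1.1, 2.3, 2.9, 4.2, 4.8, 6.3),
  w = c(1, 2, 1, 3, 2, 1)
)

# Weighted least squares via lm(); `weights = w` is looked up in `d`
fit_w <- lm(y ~ x, data = d, weights = w)

# Closed-form WLS: solve (X'WX) beta = X'Wy; d$w * X scales row i by w[i]
X <- cbind(1, d$x)
beta_wls <- solve(crossprod(X, d$w * X), crossprod(X, d$w * d$y))
all.equal(unname(coef(fit_w)), as.vector(beta_wls))   # TRUE

# The weighted residual sum of squares, sum(w * e^2), is what lm() minimized
wrss <- sum(d$w * residuals(fit_w)^2)

# Hypothetical toy data with one missing response
d_na <- data.frame(x = 1:6, y = c(1.2, 1.9, NA, 4.1, 5.2, 5.8))
fit_omit <- lm(y ~ x, data = d_na, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d_na, na.action = na.exclude)

length(residuals(fit_omit))  # 5: the NA row is dropped
length(residuals(fit_excl))  # 6: an NA is kept in position 3
```

`na.exclude` is handy when residuals or fitted values must line up row by row with the original data frame.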

##### Details

Models for `lm` are specified symbolically. A typical model has the form `response ~ terms` where `response` is the (numeric) response vector and `terms` is a series of terms which specifies a linear predictor for `response`. A terms specification of the form `first + second` indicates all the terms in `first` together with all the terms in `second` with duplicates removed. A specification of the form `first:second` indicates the set of terms obtained by taking the interactions of all terms in `first` with all terms in `second`. The specification `first*second` indicates the *cross* of `first` and `second`. This is the same as `first + second + first:second`.
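The operators can be inspected with `terms()`. In this sketch, which uses the built-in `mtcars` data, `labels_of` is a hypothetical helper that is not part of stats. It shows duplicate removal, a pure interaction, and that `wt * hp` fits the same model as `wt + hp + wt:hp`.

```r
# Hypothetical helper: the term labels terms() derives from a formula
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(mpg ~ wt + hp + wt)   # "wt" "hp": the duplicate wt is removed
labels_of(mpg ~ wt:hp)          # "wt:hp": the interaction only
labels_of(mpg ~ wt * hp)        # "wt" "hp" "wt:hp": the cross

# first*second fits the same model as first + second + first:second
fit_cross <- lm(mpg ~ wt * hp, data = mtcars)
fit_expanded <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)
all.equal(coef(fit_cross), coef(fit_expanded))  # TRUE
```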

If the formula includes an `offset`, this is evaluated and subtracted from the response.
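A minimal check of this on `mtcars`. The offset `0.01 * hp` is arbitrary and chosen only for illustration. The coefficients match a fit of `mpg - 0.01 * hp` on the remaining terms, while `fitted()` for the offset model still includes the offset.

```r
# An offset term in the formula (0.01 * hp is purely illustrative)
fit_off <- lm(mpg ~ wt + offset(0.01 * hp), data = mtcars)

# Same coefficients as regressing (response - offset) on the other terms
fit_sub <- lm(I(mpg - 0.01 * hp) ~ wt, data = mtcars)
all.equal(coef(fit_off), coef(fit_sub))  # TRUE

# The fitted values of the offset model add the offset back on
all.equal(fitted(fit_off), fitted(fit_sub) + 0.01 * mtcars$hp)  # TRUE
```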

If `response` is a matrix, a linear model is fitted separately by least squares to each column of the matrix.
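The following sketch fits a two-column response built with `cbind()` on the built-in `mtcars` data. The result has class `c("mlm", "lm")`, and each column of its coefficient matrix matches a separate single-response fit.

```r
# Two responses bound column-wise
fit_m <- lm(cbind(mpg, qsec) ~ wt, data = mtcars)
class(fit_m)  # "mlm" "lm"
coef(fit_m)   # one column of coefficients per response

# Each column matches a separate single-response fit
fit_qsec <- lm(qsec ~ wt, data = mtcars)
all.equal(coef(fit_m)[, "qsec"], coef(fit_qsec))  # TRUE
```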

See `model.matrix` for some further details. The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on; to avoid this, pass a `terms` object as the formula (see `aov` and `demo(glm.vr)` for an example).
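A short sketch of the re-ordering and of the `terms`-object workaround, using `terms(..., keep.order = TRUE)` from stats and the built-in `mtcars` data. The formula `f` is only an example.

```r
f <- mpg ~ wt:hp + wt

# By default the main effect wt is moved ahead of the interaction
attr(terms(f), "term.labels")            # "wt" "wt:hp"

# keep.order = TRUE preserves the order written in the formula
tt_kept <- terms(f, keep.order = TRUE)
attr(tt_kept, "term.labels")             # "wt:hp" "wt"

# Passing the terms object to lm() keeps that order in the fit
fit_kept <- lm(tt_kept, data = mtcars)
names(coef(fit_kept))                    # "(Intercept)" "wt:hp" "wt"
```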

A formula has an implied intercept term. To remove this use either `y ~ x - 1` or `y ~ 0 + x`. See `formula` for more details of allowed formulae.
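Both forms drop the intercept in the same way, as this sketch on the built-in `mtcars` data shows.

```r
fit_minus <- lm(mpg ~ wt - 1, data = mtcars)
fit_zero  <- lm(mpg ~ 0 + wt, data = mtcars)

names(coef(fit_minus))                    # "wt": no "(Intercept)"
all.equal(coef(fit_minus), coef(fit_zero))  # TRUE
attr(terms(mpg ~ wt - 1), "intercept")    # 0
```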

Non-`NULL` `weights` can be used to indicate that different observations have different variances (with the values in `weights` being inversely proportional to the variances); or equivalently, when the elements of `weights` are positive integers \(w_i\), that each response \(y_i\) is the mean of \(w_i\) unit-weight observations (including the case that there are \(w_i\) observations equal to \(y_i\) and the data have been summarized). However, in the latter case, notice that within-group variation is not used. Therefore, the sigma estimate and residual degrees of freedom may be suboptimal; in the case of replication weights, even wrong. Hence, standard errors and analysis of variance tables should be treated with care.
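The replication-weights caveat can be seen by comparing a weighted fit with the same data expanded back to one row per observation. The data frame `d` below is made up for illustration. The point estimates agree, but the residual degrees of freedom, and hence sigma, do not.

```r
# Hypothetical summarized data: w[i] copies of each (x[i], y[i])
d <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1),
  w = c(1, 2, 1, 3, 2)
)
fit_w <- lm(y ~ x, data = d, weights = w)

# The same data expanded to one row per observation (sum(w) = 9 rows)
d_rep <- d[rep(seq_len(nrow(d)), times = d$w), ]
fit_rep <- lm(y ~ x, data = d_rep)

all.equal(coef(fit_w), coef(fit_rep))  # TRUE: same point estimates
df.residual(fit_w)                     # 3 = 5 rows - 2 coefficients
df.residual(fit_rep)                   # 7 = 9 observations - 2 coefficients
c(sigma(fit_w), sigma(fit_rep))        # same RSS, different divisors
```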

`lm` calls the lower level functions `lm.fit`, etc., see below, for the actual numerical computations. For programming only, you may consider doing likewise.
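A minimal sketch of calling the low-level fitters directly on the built-in `mtcars` data. You build the model matrix yourself with `model.matrix`, then pass it to `lm.fit` (or `lm.wfit` for weights). The weights `hp / 100` are illustrative only. The result is a plain list, not an `"lm"` object.

```r
# Build the model matrix and response once, then call the low-level fitter
X <- model.matrix(mpg ~ wt, data = mtcars)
y <- mtcars$mpg

fit_low <- lm.fit(X, y)               # least squares, no formula handling
fit_high <- lm(mpg ~ wt, data = mtcars)
all.equal(fit_low$coefficients, coef(fit_high))   # TRUE

# lm.wfit() is the weighted counterpart (illustrative weights)
fit_low_w <- lm.wfit(X, y, w = mtcars$hp / 100)
```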

All of `weights`, `subset` and `offset` are evaluated in the same way as variables in `formula`, that is first in `data` and then in the environment of `formula`.
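A sketch of both lookup paths, using the built-in `mtcars` data. In the `subset` expression, `cyl` is found in `data`. The vector `w_env`, a hypothetical name, is not a column of `mtcars`, so it is taken from the environment of the formula.

```r
# `cyl` is not in the formula, but it is found in `data`
fit_sub <- lm(mpg ~ wt, data = mtcars, subset = cyl == 4)
fit_manual <- lm(mpg ~ wt, data = mtcars[mtcars$cyl == 4, ])
all.equal(coef(fit_sub), coef(fit_manual))   # TRUE
nobs(fit_sub)                                # 11 four-cylinder cars

# `w_env` is not in mtcars, so it comes from the formula's environment
w_env <- mtcars$hp / 100
fit_env <- lm(mpg ~ wt, data = mtcars, weights = w_env)
fit_data <- lm(mpg ~ wt, data = mtcars, weights = hp / 100)
all.equal(coef(fit_env), coef(fit_data))     # TRUE
```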

##### Value

`lm` returns an object of `class` `"lm"` or, for multiple responses, of class `c("mlm", "lm")`.

The functions `summary` and `anova` are used to obtain and print a summary and analysis of variance table of the results. The generic accessor functions `coefficients`, `effects`, `fitted.values` and `residuals` extract various useful features of the value returned by `lm`.

An object of class `"lm"` is a list containing at least the following components:

- `coefficients`: a named vector of coefficients.
- `residuals`: the residuals, that is response minus fitted values.
- `fitted.values`: the fitted mean values.
- `rank`: the numeric rank of the fitted linear model.
- `weights`: (only for weighted fits) the specified weights.
- `df.residual`: the residual degrees of freedom.
- `call`: the matched call.
- `terms`: the `terms` object used.
- `contrasts`: (only where relevant) the contrasts used.
- `xlevels`: (only where relevant) a record of the levels of the factors used in fitting.
- `offset`: the offset used (missing if none were used).
- `y`: if requested, the response used.
- `x`: if requested, the model matrix used.
- `model`: if requested (the default), the model frame used.
- `na.action`: (where relevant) information returned by `model.frame` on the special handling of `NA`s.

In addition, non-null fits will have components `assign`, `effects` and (unless not requested) `qr` relating to the linear fit, for use by extractor functions such as `summary` and `effects`.
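The sketch below, on the built-in `mtcars` data, requests `x` and `y` explicitly. It checks that the components above are present and that the accessor functions agree with them. The vector `expected` is only a convenience for the check and covers only components that should be present for this unweighted fit without factors.

```r
fit <- lm(mpg ~ wt, data = mtcars, x = TRUE, y = TRUE)

# Components expected for this unweighted fit without factors
expected <- c("coefficients", "residuals", "fitted.values", "rank",
              "df.residual", "call", "terms", "model", "x", "y",
              "assign", "effects", "qr")
all(expected %in% names(fit))        # TRUE

fit$rank                             # 2
fit$df.residual                      # 30 = 32 cars - 2 coefficients

# Residuals are response minus fitted values
all.equal(fit$residuals, fit$y - fit$fitted.values)   # TRUE

# Accessor functions return the same components
all.equal(coef(fit), fit$coefficients)                 # TRUE
```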

##### Note

Offsets specified by `offset` will not be included in predictions by `predict.lm`, whereas those specified by an offset term in the formula will be.

##### Using time series

Considerable care is needed when using `lm` with time series.

Unless `na.action = NULL`, the time series attributes are stripped from the variables before the regression is done. (This is necessary as omitting `NA`s would invalidate the time series attributes, and if `NA`s are omitted in the middle of the series the result would no longer be a regular time series.)

Even if the time series attributes are retained, they are not used to line up series, so that the time shift of a lagged or differenced regressor would be ignored. It is good practice to prepare a `data` argument by `ts.intersect(..., dframe = TRUE)`, then apply a suitable `na.action` to that data frame and call `lm` with `na.action = NULL` so that residuals and fitted values are time series.
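A minimal sketch of that workflow. It uses the monthly `ldeaths` series from the datasets package as stand-in data. The column names `y` and `ylag` are illustrative. `ts.intersect` does the alignment, so each row pairs a month with the previous month's value.

```r
# ldeaths: monthly, Jan 1974 to Dec 1979 (72 values); ylag is lagged one month
dat <- ts.intersect(y = ldeaths, ylag = stats::lag(ldeaths, -1),
                    dframe = TRUE)
nrow(dat)   # 71: the first month has no lagged value

# Row t pairs y[t] with y[t - 1]
all.equal(as.numeric(dat$ylag)[-1], as.numeric(dat$y)[-nrow(dat)])  # TRUE

# Apply an na.action to the data frame first, then fit with na.action = NULL
dat <- na.omit(dat)
fit_ts <- lm(y ~ ylag, data = dat, na.action = NULL)
length(residuals(fit_ts))   # 71
```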


##### See Also

`summary.lm` for summaries and `anova.lm` for the ANOVA table; `aov` for a different interface.

The generic functions `coef`, `effects`, `residuals`, `fitted`, `vcov`.

`predict.lm` (via `predict`) for prediction, including confidence and prediction intervals; `confint` for confidence intervals of *parameters*.
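A brief sketch of the two interval types and of `confint`, on the built-in `mtcars` data. The new weights in `new_cars` are hypothetical (`wt` is in units of 1000 lbs).

```r
fit <- lm(mpg ~ wt, data = mtcars)
new_cars <- data.frame(wt = c(2.5, 3.5))   # hypothetical new weights

conf_band <- predict(fit, newdata = new_cars, interval = "confidence")
pred_band <- predict(fit, newdata = new_cars, interval = "prediction")
conf_band   # columns fit, lwr, upr

# A prediction interval also covers residual noise, so it is wider
(pred_band[, "upr"] - pred_band[, "lwr"]) >
  (conf_band[, "upr"] - conf_band[, "lwr"])

# Confidence intervals for the coefficients themselves
confint(fit, level = 0.95)
```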

`lm.influence` for regression diagnostics, and `glm` for **generalized** linear models.

The underlying low level functions, `lm.fit` for plain, and `lm.wfit` for weighted regression fitting.

More `lm()` examples are available, e.g., in `anscombe`, `attitude`, `freeny`, `LifeCycleSavings`, `longley`, `stackloss`, `swiss`.

`biglm` in package biglm for an alternative way to fit linear models to large datasets (especially those with many cases).

##### Examples

`library(stats)`

```
require(graphics)
## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
lm.D90 <- lm(weight ~ group - 1) # omitting intercept

anova(lm.D9)
summary(lm.D90)

opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0))
plot(lm.D9, las = 1)      # Residuals, Fitted, ...
par(opar)

### less simple examples in "See Also" above
```

*Documentation reproduced from package stats, version 3.5.3, License: Part of R 3.5.3*

### Community examples

**richie@datacamp.com** at Jan 17, 2017 stats v3.3.1

`lm()` takes a formula and a data frame. See [`formula()`](https://www.rdocumentation.org/packages/stats/topics/formula) for how to construct the first argument.

```{r}
(model_with_intercept <- lm(weight ~ group, PlantGrowth))
(model_without_intercept <- lm(weight ~ group - 1, PlantGrowth))
```

You get more information about the model using [`summary()`](https://www.rdocumentation.org/packages/stats/topics/summary.lm).

```{r}
(model_without_intercept <- lm(weight ~ group - 1, PlantGrowth))
summary(model_without_intercept)
```

Diagnostic plots are available; see [`plot.lm()`](https://www.rdocumentation.org/packages/stats/topics/plot.lm) for more examples.

```{r}
(model_without_intercept <- lm(weight ~ group - 1, PlantGrowth))
layout(matrix(1:6, nrow = 2))
plot(model_without_intercept, which = 1:6)
```

You can predict new values; see [`predict()`](https://www.rdocumentation.org/packages/stats/topics/predict) and [`predict.lm()`](https://www.rdocumentation.org/packages/stats/topics/predict.lm).

```{r}
(model_without_intercept <- lm(weight ~ group - 1, PlantGrowth))
# stringsAsFactors = TRUE keeps group a factor (the default changed in R 4.0.0),
# so points() below places the predictions at the boxplot positions
predictions <- data.frame(group = levels(PlantGrowth$group), stringsAsFactors = TRUE)
predictions$weight <- predict(model_without_intercept, predictions)
predictions
# Plot predictions against the data
boxplot(weight ~ group, PlantGrowth, ylab = "weight")
points(weight ~ group, predictions, col = "red")
```

There are many methods available for inspecting `lm` objects.

```{r}
(model_without_intercept <- lm(weight ~ group - 1, PlantGrowth))
confint(model_without_intercept)
anova(model_without_intercept)
residuals(model_without_intercept)
fitted(model_without_intercept)
influence(model_without_intercept)
methods(class = "lm")
```