Tidying methods for a linear model

These methods tidy the coefficients of a linear model into a summary, augment the original data with information on the fitted values and residuals, and construct a one-row glance of the model's statistics.

## S3 method for class 'lm':
tidy(x, conf.int = FALSE, conf.level = 0.95,
  exponentiate = FALSE, ...)

## S3 method for class 'lm':
augment(x, data = x$model, newdata, type.predict,
  type.residuals, ...)

## S3 method for class 'lm':
glance(x, ...)

  • x: lm object
  • conf.int: whether to include a confidence interval
  • conf.level: confidence level of the interval, used only if conf.int=TRUE
  • exponentiate: whether to exponentiate the coefficient estimates and confidence intervals (typical for logistic regression)
  • ...: extra arguments (not used)
  • data: original data, defaults to extracting it from the model
  • newdata: if provided, performs predictions on the new data
  • type.predict: type of prediction to compute for a GLM; passed on to predict.glm
  • type.residuals: type of residuals to compute for a GLM; passed on to residuals.glm

If you have missing values in your model data, you may need to refit the model with na.action = na.exclude.

If conf.int=TRUE, the confidence interval is computed with the confint function.
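
As a minimal sketch of the interval computation mentioned above, the following calls base R's confint directly on a fitted lm. The model formula is an illustrative choice, and treating the level argument as the counterpart of conf.level is an assumption rather than something this page spells out.

```r
# Illustrative model on a built-in dataset
mod <- lm(mpg ~ wt + qsec, data = mtcars)

# confint() returns a matrix with one row per coefficient and
# one column for each bound of the interval
ci <- confint(mod, level = 0.95)
ci

# A higher confidence level gives a wider interval for every coefficient
ci99 <- confint(mod, level = 0.99)
ci99
```

The lower and upper bounds in each row are the kind of values one would expect to see in conf.low and conf.high.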

When the modeling was performed with na.action = "na.omit" (as is the typical default), rows with NA in the initial data are omitted entirely from the augmented data frame. When the modeling was performed with na.action = "na.exclude", one should provide the original data as a second argument, at which point the augmented data will contain those rows (typically with NAs in place of the new columns). If the original data is not provided to augment and na.action = "na.exclude", a warning is raised and the incomplete rows are dropped.
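
The difference between the two na.action settings can be seen without calling augment at all. The sketch below uses only base R on a copy of mtcars with two responses set to NA (illustrative data, not from this page) and compares the residuals each fit reports.

```r
# Copy of a built-in dataset with two missing responses (illustrative)
dat <- mtcars
dat$mpg[c(3, 7)] <- NA

fit_omit    <- lm(mpg ~ wt, data = dat, na.action = na.omit)
fit_exclude <- lm(mpg ~ wt, data = dat, na.action = na.exclude)

# na.omit: residuals exist only for the complete rows
length(residuals(fit_omit))

# na.exclude: residuals are padded with NA so they line up
# with the rows of the original data
length(residuals(fit_exclude))
which(is.na(residuals(fit_exclude)))
```

Because the na.exclude fit keeps one residual slot per original row, its output can be aligned with the original data, which is why passing that data to augment matters in this case.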


All tidying methods return a data.frame without rownames. The structure depends on the method chosen.

tidy.lm returns one row for each coefficient, with five columns:

  • term: The term in the linear model being estimated and tested
  • estimate: The estimated coefficient
  • std.error: The standard error from the linear model
  • statistic: t-statistic
  • p.value: two-sided p-value

If conf.int=TRUE, it also includes columns for conf.low and conf.high, computed with confint.
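
To make the five coefficient columns concrete, here is a rough base R sketch that assembles a table of the same shape from summary(). It is an illustration of where such values can come from, not the package's own code; the name coef_table is hypothetical.

```r
# Illustrative model on a built-in dataset
mod <- lm(mpg ~ wt + qsec, data = mtcars)

# The coefficient matrix from summary.lm has one row per term
coefs <- coef(summary(mod))

# Rebuild the five columns as a data.frame without rownames
coef_table <- data.frame(
  term      = rownames(coefs),
  estimate  = coefs[, "Estimate"],
  std.error = coefs[, "Std. Error"],
  statistic = coefs[, "t value"],
  p.value   = coefs[, "Pr(>|t|)"],
  row.names = NULL,
  stringsAsFactors = FALSE
)
coef_table
```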

When newdata is not supplied, augment.lm returns one row for each observation, with seven columns added to the original data:

  • .hat: Diagonal of the hat matrix
  • .sigma: Estimate of residual standard deviation when the corresponding observation is dropped from the model
  • .cooksd: Cook's distance, as computed by cooks.distance
  • .fitted: Fitted values of model
  • .se.fit: Standard errors of fitted values
  • .resid: Residuals
  • .std.resid: Standardised residuals

When newdata is supplied, augment.lm returns one row for each observation, with three columns added to the new data:

  • .fitted: Fitted values of model
  • .se.fit: Standard errors of fitted values
  • .resid: Residuals of fitted values on the new data

glance.lm returns a one-row data.frame with the columns:

  • r.squared: The proportion of variance explained by the model
  • adj.r.squared: r.squared adjusted based on the degrees of freedom
  • sigma: The square root of the estimated residual variance
  • statistic: F-statistic
  • p.value: p-value from the F test, describing whether the full regression is significant
  • df: Degrees of freedom used by the coefficients
  • logLik: the data's log-likelihood under the model
  • AIC: the Akaike Information Criterion
  • BIC: the Bayesian Information Criterion
  • deviance: deviance
  • df.residual: residual degrees of freedom
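
For orientation, most of the model-level statistics listed above can be pulled from a fitted lm with base R alone. The sketch below is an assumption about where such values can be obtained, not a reproduction of glance.lm; the name model_stats is hypothetical, and df is left out because this page does not pin down how it is computed.

```r
# Illustrative model on a built-in dataset
mod <- lm(mpg ~ wt + qsec, data = mtcars)
s <- summary(mod)
f <- s$fstatistic  # named vector: value, numdf, dendf

# One-row data.frame of model-level statistics
model_stats <- data.frame(
  r.squared     = s$r.squared,
  adj.r.squared = s$adj.r.squared,
  sigma         = s$sigma,
  statistic     = f[["value"]],
  # upper-tail probability of the F statistic
  p.value       = pf(f[["value"]], f[["numdf"]], f[["dendf"]],
                     lower.tail = FALSE),
  logLik        = as.numeric(logLik(mod)),
  AIC           = AIC(mod),
  BIC           = BIC(mod),
  deviance      = deviance(mod),
  df.residual   = df.residual(mod)
)
model_stats
```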

See Also

  • augment.lm
  • glance.lm
  • lm_tidiers
  • tidy.lm

library(broom)
library(dplyr)
library(ggplot2)

mod <- lm(mpg ~ wt + qsec, data = mtcars)

# coefficient plot
d <- tidy(mod) %>% mutate(low = estimate - std.error,
                          high = estimate + std.error)
ggplot(d, aes(estimate, term, xmin = low, xmax = high, height = 0)) +
     geom_point() + geom_vline(xintercept = 0) + geom_errorbarh()

head(augment(mod, mtcars))

# predict on new data
newdata <- mtcars %>% head(6) %>% mutate(wt = wt + 1)
augment(mod, newdata = newdata)

au <- augment(mod, data = mtcars)

plot(mod, which = 1)
qplot(.fitted, .resid, data = au) +
  geom_hline(yintercept = 0) +
  geom_smooth(se = FALSE)
qplot(.fitted, .std.resid, data = au) +
  geom_hline(yintercept = 0) +
  geom_smooth(se = FALSE)
qplot(.fitted, .std.resid, data = au,
  colour = factor(cyl))
qplot(mpg, .std.resid, data = au, colour = factor(cyl))

plot(mod, which = 2)
qplot(sample = .std.resid, data = au, stat = "qq") +
  geom_abline()

plot(mod, which = 3)
qplot(.fitted, sqrt(abs(.std.resid)), data = au) + geom_smooth(se = FALSE)

plot(mod, which = 4)
ggplot(au, aes(seq_along(.cooksd), .cooksd)) +
  geom_bar(stat = "identity")

plot(mod, which = 5)
qplot(.hat, .std.resid, data = au) + geom_smooth(se = FALSE)
ggplot(au, aes(.hat, .std.resid)) +
  geom_vline(size = 2, colour = "white", xintercept = 0) +
  geom_hline(size = 2, colour = "white", yintercept = 0) +
  geom_point() + geom_smooth(se = FALSE)

qplot(.hat, .std.resid, data = au, size = .cooksd) +
  geom_smooth(se = FALSE, size = 0.5)

plot(mod, which = 6)
ggplot(au, aes(.hat, .cooksd)) +
  geom_vline(xintercept = 0, colour = NA) +
  geom_abline(slope = seq(0, 3, by = 0.5), colour = "white") +
  geom_smooth(se = FALSE) +
  geom_point()
qplot(.hat, .cooksd, size = .cooksd / .hat, data = au) + scale_size_area()

Documentation reproduced from package broom, version 0.3.4, License: MIT + file LICENSE
