
Tidy summarizes information about the components of a model. A model component might be a single term in a regression, a single hypothesis, a cluster, or a class. Exactly what tidy considers to be a model component varies across models but is usually self-evident. If a model has several distinct types of components, you will need to specify which components to return.
# S3 method for lm
tidy(
  x,
  conf.int = FALSE,
  conf.level = 0.95,
  exponentiate = FALSE,
  quick = FALSE,
  ...
)

# S3 method for summary.lm
tidy(x, ...)
x: An lm object created by stats::lm().
conf.int: Logical indicating whether or not to include a confidence interval in the tidied output. Defaults to FALSE.
conf.level: The confidence level to use for the confidence interval if conf.int = TRUE. Must be strictly greater than 0 and less than 1. Defaults to 0.95, which corresponds to a 95 percent confidence interval.
exponentiate: Logical indicating whether or not to exponentiate the coefficient estimates. This is typical for logistic and multinomial regressions, but a bad idea if there is no log or logit link. Defaults to FALSE.
quick: Logical indicating whether only the term and estimate columns should be returned. Often useful to avoid time-consuming covariance and standard error calculations. Defaults to FALSE.
...: Additional arguments. Not used. Needed to match generic signature only. Cautionary note: Misspelled arguments will be absorbed in ..., where they will be ignored. If the misspelled argument has a default value, the default value will be used. For example, if you pass conf.lvel = 0.9, all computation will proceed using conf.level = 0.95. Additionally, if you pass newdata = my_tibble to an augment() method that does not accept a newdata argument, it will use the default value for the data argument.
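The cautionary note about ... can be reproduced without broom. The sketch below uses a hypothetical stand-in function, toy_tidy(), which is not part of any package; it only assumes a signature with a defaulted conf.level followed by ..., and shows that a misspelled argument name is swallowed silently while the default stays in force.

```r
# Hypothetical stand-in for a method that accepts `...` but never uses it.
# This is an illustration only, not broom code.
toy_tidy <- function(x, conf.level = 0.95, ...) {
  # Whatever lands in `...` is never inspected, so it has no effect.
  conf.level
}

used_ok   <- toy_tidy(1, conf.level = 0.9)  # spelled correctly: 0.9 is used
used_typo <- toy_tidy(1, conf.lvel = 0.9)   # misspelled: absorbed by `...`

print(c(correct = used_ok, misspelled = used_typo))
```

No warning or error is raised for the misspelled call, which is why it is worth checking argument names when the output does not change as expected.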
A tibble::tibble() with one row for each term in the regression. The tibble has columns:
term: The name of the regression term.
estimate: The estimated value of the regression term.
std.error: The standard error of the regression term.
statistic: The value of a statistic, almost always a t-statistic, to use in a hypothesis test that the regression term is non-zero.
p.value: The two-sided p-value associated with the observed statistic.
conf.low: The low end of a confidence interval for the regression term. Included only if conf.int = TRUE.
conf.high: The high end of a confidence interval for the regression term. Included only if conf.int = TRUE.
If the linear model is an mlm object (multiple linear model), there is an additional column:
response: Which response column the coefficients correspond to (typically Y1, Y2, etc.).
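To make the returned columns concrete, here is a minimal base-R sketch that assembles the same quantities by hand from summary() and confint(). It is not a description of how tidy() is implemented; the name by_hand is an assumption made for this illustration, and the result is a plain data frame rather than a tibble.

```r
# Fit the model used in the examples on this page.
mod <- lm(mpg ~ wt + qsec, data = mtcars)

# summary() holds the estimate, standard error, t value and p-value per term.
coefs <- summary(mod)$coefficients

# confint() holds the interval bounds for the chosen confidence level.
ci <- confint(mod, level = 0.95)

# Assemble a data frame with one row per term, named as in the column list.
by_hand <- data.frame(
  term      = rownames(coefs),
  estimate  = coefs[, "Estimate"],
  std.error = coefs[, "Std. Error"],
  statistic = coefs[, "t value"],
  p.value   = coefs[, "Pr(>|t|)"],
  conf.low  = ci[, 1],
  conf.high = ci[, 2],
  row.names = NULL
)

print(by_hand)
```

The last two columns correspond to what is added when conf.int = TRUE is requested, with level playing the role of conf.level.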
If you have missing values in your model data, you may need to refit the model with na.action = na.exclude.
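The effect of na.action = na.exclude can be seen with base R alone. In the sketch below, two predictor values are set to NA purely for illustration (the object names dat, fit_omit and fit_excl are assumptions). Both fits use the same complete rows, but only the na.exclude fit pads per-observation results back to the length of the original data, which is presumably why the refit is recommended.

```r
# Copy mtcars and blank out two predictor values to create missing data.
dat <- mtcars
dat$wt[c(3, 7)] <- NA

# na.omit drops the incomplete rows and forgets where they were.
fit_omit <- lm(mpg ~ wt + qsec, data = dat, na.action = na.omit)

# na.exclude drops them for fitting but remembers their positions.
fit_excl <- lm(mpg ~ wt + qsec, data = dat, na.action = na.exclude)

# Residuals: one per complete row versus one per original row (NA where dropped).
n_omit <- length(residuals(fit_omit))
n_excl <- length(residuals(fit_excl))

print(c(na.omit = n_omit, na.exclude = n_excl, rows = nrow(dat)))
```

The coefficient estimates are identical in both fits; only the alignment of residuals and fitted values with the original rows differs.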
Other lm tidiers: augment.glm(), augment.lm(), glance.glm(), glance.lm(), tidy.glm()
# NOT RUN {
library(broom)    # provides tidy(), glance() and augment()
library(ggplot2)
library(dplyr)

mod <- lm(mpg ~ wt + qsec, data = mtcars)

tidy(mod)
glance(mod)

# coefficient plot: point estimates with +/- one standard error
d <- tidy(mod) %>%
  mutate(
    low = estimate - std.error,
    high = estimate + std.error
  )

ggplot(d, aes(estimate, term, xmin = low, xmax = high, height = 0)) +
  geom_point() +
  geom_vline(xintercept = 0) +
  geom_errorbarh()

# per-observation output, without and with the original data
augment(mod)
augment(mod, mtcars)

# predict on new data
newdata <- mtcars %>% head(6) %>% mutate(wt = wt + 1)
augment(mod, newdata = newdata)

au <- augment(mod, data = mtcars)

# standardized residuals against leverage
ggplot(au, aes(.hat, .std.resid)) +
  geom_vline(size = 2, colour = "white", xintercept = 0) +
  geom_hline(size = 2, colour = "white", yintercept = 0) +
  geom_point() + geom_smooth(se = FALSE)

# base R diagnostic plot, followed by a ggplot2 version of Cook's distance against leverage
plot(mod, which = 6)

ggplot(au, aes(.hat, .cooksd)) +
  geom_vline(xintercept = 0, colour = NA) +
  geom_abline(slope = seq(0, 3, by = 0.5), colour = "white") +
  geom_smooth(se = FALSE) +
  geom_point()

# column-wise models: a matrix response yields an mlm object
a <- matrix(rnorm(20), nrow = 10)
b <- a + rnorm(length(a))
result <- lm(b ~ a)
tidy(result)
# }