
geom_smooth and stat_smooth are effectively aliases: they both use the same arguments. Use geom_smooth unless you want to display the results with a non-standard geom.
geom_smooth(mapping = NULL, data = NULL, stat = "smooth",
  position = "identity", ..., method = "auto", formula = y ~ x,
  se = TRUE, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)

stat_smooth(mapping = NULL, data = NULL, geom = "smooth",
  position = "identity", ..., method = "auto", formula = y ~ x,
  se = TRUE, n = 80, span = 0.75, fullrange = FALSE, level = 0.95,
  method.args = list(), na.rm = FALSE, show.legend = NA,
  inherit.aes = TRUE)
data: If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot. A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify for which variables will be created. A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data.
...: Other arguments passed on to layer. These are often aesthetics, used to set an aesthetic to a fixed value, like color = "red" or size = 3. They may also be parameters to the paired geom/stat.

method: Smoothing method (function) to use, e.g. "lm", "glm", "gam", "loess". For method = "auto" the smoothing method is chosen based on the size of the largest group (across all panels): loess is used for fewer than 1,000 observations; otherwise gam is used with formula = y ~ s(x, bs = "cs"). Somewhat anecdotally, loess gives a better appearance, but is O(n^2) in memory, so does not work for larger datasets.
formula: Formula to use in the smoothing function, e.g. y ~ x, y ~ poly(x, 2), y ~ log(x).
na.rm: If FALSE, the default, missing values are removed with a warning. If TRUE, missing values are silently removed.

show.legend: Should this layer be included in the legends? NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes.

inherit.aes: If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders.
method.args: List of additional arguments passed on to the modelling function defined by method.
Calculation is performed by the predictdf generic and its methods. For most methods the standard error bounds are computed using the predict method; the exceptions are loess, which uses a t-based approximation, and glm, where the normal confidence interval is constructed on the link scale and then back-transformed to the response scale.
See also: lm for linear smooths, glm for generalised linear smooths, loess for local smooths.
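For illustration, the gam fit that method = "auto" selects for larger data can also be requested explicitly. This is a sketch, assuming the mgcv package (which supplies gam) is installed:

```r
library(ggplot2)
# diamonds has over 50,000 rows, so method = "auto" would choose gam here;
# requesting it explicitly makes the choice visible. Requires mgcv.
ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))
```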
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_smooth()
# Use span to control the "wiggliness" of the default loess smoother
# The span is the fraction of points used to fit each local regression:
# small numbers make a wigglier curve, larger numbers make a smoother curve.
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_smooth(span = 0.3)
# Instead of a loess smooth, you can use any other modelling function:
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE)
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ splines::bs(x, 3), se = FALSE)
# Smooths are automatically fit to each group (defined by categorical
# aesthetics or the group aesthetic) and for each facet
ggplot(mpg, aes(displ, hwy, colour = class)) +
geom_point() +
geom_smooth(se = FALSE, method = "lm")
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_smooth(span = 0.8) +
facet_wrap(~drv)
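The usage above also lists level and fullrange; a minimal sketch of what they control (the values here are illustrative):

```r
library(ggplot2)
# level sets the width of the confidence band (0.95 by default);
# fullrange extends the smooth across the full x range of the plot
# rather than stopping at the range of the data.
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", level = 0.99, fullrange = TRUE)
```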
binomial_smooth <- function(...) {
geom_smooth(method = "glm", method.args = list(family = "binomial"), ...)
}
# To fit a logistic regression, you need to coerce the values to
# a numeric vector lying between 0 and 1.
ggplot(rpart::kyphosis, aes(Age, Kyphosis)) +
geom_jitter(height = 0.05) +
binomial_smooth()
ggplot(rpart::kyphosis, aes(Age, as.numeric(Kyphosis) - 1)) +
geom_jitter(height = 0.05) +
binomial_smooth()
ggplot(rpart::kyphosis, aes(Age, as.numeric(Kyphosis) - 1)) +
geom_jitter(height = 0.05) +
binomial_smooth(formula = y ~ splines::ns(x, 2))
# But in this case, it's probably better to fit the model yourself
# so you can exercise more control and see whether or not it's a good model
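One way to do that, sketched here under the same assumptions as above (the rpart package for the kyphosis data): fit the logistic regression with glm, predict on a grid with standard errors on the link scale, back-transform with plogis, and draw the curve and band manually.

```r
library(ggplot2)
# Fit the model yourself instead of letting geom_smooth do it.
kyph <- rpart::kyphosis
kyph$present <- as.numeric(kyph$Kyphosis) - 1
fit <- glm(present ~ Age, data = kyph, family = binomial)

# Predict on a grid; predict.glm returns fits on the link (logit) scale.
grid <- data.frame(Age = seq(min(kyph$Age), max(kyph$Age), length.out = 80))
pred <- predict(fit, grid, se.fit = TRUE)
grid$p  <- plogis(pred$fit)
grid$lo <- plogis(pred$fit - 1.96 * pred$se.fit)
grid$hi <- plogis(pred$fit + 1.96 * pred$se.fit)

ggplot(kyph, aes(Age, present)) +
  geom_jitter(height = 0.05) +
  geom_ribbon(aes(x = Age, ymin = lo, ymax = hi), data = grid,
              inherit.aes = FALSE, alpha = 0.3) +
  geom_line(aes(Age, p), data = grid, inherit.aes = FALSE)
```

Constructing the interval on the link scale and back-transforming mirrors what stat_smooth does for glm fits, and keeps the band inside [0, 1].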