stat_poly_line: Predicted line from linear model fit

Description

stat_poly_line() fits a polynomial, by default with stats::lm(), but alternatively using robust regression or generalized least squares. Predicted values and a confidence band, if possible, are computed and, by default, plotted.

Usage

stat_poly_line(
  mapping = NULL,
  data = NULL,
  geom = "smooth",
  position = "identity",
  ...,
  method = "lm",
  formula = NULL,
  se = NULL,
  fit.seed = NA,
  fm.values = FALSE,
  n = 80,
  fullrange = FALSE,
  level = 0.95,
  method.args = list(),
  n.min = 2L,
  na.rm = FALSE,
  orientation = NA,
  show.legend = NA,
  inherit.aes = TRUE
)

Value

The value returned by the statistic is a data frame, with n rows of predicted values and their confidence limits. Optionally it will also include additional values related to the model fit. When a predict() method is not available for the fitted model class, the value returned by calling fitted() is returned instead, with a message.
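
The shape of the returned data frame can be checked without drawing the plot. The following is a minimal sketch, assuming that ggplot2's layer_data() is used to extract the data computed for the layer; the object names p and fit.df and the choice n = 50 are only for illustration.

```r
library(ggplot2)
library(ggpmisc)

# A plot with a single layer, a single group and a single panel
p <- ggplot(mpg, aes(displ, hwy)) +
  stat_poly_line(n = 50)

# layer_data() returns the data frame received by the geom of layer 1
fit.df <- layer_data(p, 1)

nrow(fit.df)  # one row per predicted point, per group and panel
head(fit.df[ , c("x", "y", "ymin", "ymax", "se")])
```

The data frame also contains columns added by ggplot2 itself, such as PANEL and group, and the defaults of the geom.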

Arguments

mapping: The aesthetic mapping, usually constructed with aes. Only needs to be set at the layer level if you are overriding the plot defaults.
data: A layer specific dataset, only needed if you want to override the plot defaults.
geom: The geometric object to use to display the data.
position: The position adjustment to use for overlapping points on this layer.
...: other arguments passed on to layer. This can include aesthetics whose values you want to set, not map. See layer for more details.
method: function or character. If character, "lm", "rlm", "lqs", "gls", "ma", "sma", or the name of a model fit function are accepted, possibly followed by the fit function's method argument separated by a colon (e.g. "rlm:M"). If a function other than lm(), rlm(), lqs(), gls(), ma() or sma() is passed, it must have formal parameters named formula, data, weights, and method. See Details.
formula: a formula object, using the aesthetic names x and y instead of the original variable names.
se: Display confidence interval around smooth? (`TRUE` by default only for fits with lm() and rlm(), see `level` to control.)
fit.seed: RNG seed argument passed to set.seed(). Defaults to NA, indicating that set.seed() should not be called.
fm.values: logical. Add metadata and parameter estimates extracted from the fitted model object; FALSE by default.
n: Number of points at which to predict with the fitted model.
fullrange: Should the fit span the full range of the plot, or just the range of the data group used in each fit?
level: Level of confidence interval to use (0.95 by default).
method.args: named list with additional arguments, excluding data and weights, which are always passed through aesthetic mappings.
n.min: integer. Minimum number of distinct values in the explanatory variable (on the rhs of formula) for fitting to be attempted.
na.rm: a logical indicating whether NA values should be stripped before the computation proceeds.
orientation: character. Either "x" or "y" controlling the default for formula. The letter indicates the aesthetic considered the explanatory variable in the model fit.
show.legend: logical. Should this layer be included in the legends? NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes.
inherit.aes: If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification, e.g. borders.

Model fit methods supported

Several model fit functions are supported explicitly, and some of their differences smoothed out. The compatibility is checked late, based on the class of the returned fitted model object. This makes it possible to use wrapper functions that do model selection or other adjustments to the fit procedure on a per panel or per group basis. In the case of fitted model objects of classes not explicitly supported an attempt is made to find the usual accessors, and if found, either complete or partial support frequently just works. The argument to parameter method can be either the name of a function or a character string giving the name. This approach makes it possible to support model fit functions that are not dependencies of 'ggpmisc'. Either attach the package where the function is defined and pass it by name or as string, or use double colon notation when passing the name of the function.
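
The ways of passing a fit function described above can be sketched as follows. This is an illustration only, using robust regression with rlm() from package 'MASS', which is assumed to be installed; the object names p, p1, p2 and p3 are arbitrary.

```r
library(ggplot2)
library(ggpmisc)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# 1. Character string naming one of the explicitly supported fit functions
p1 <- p + stat_poly_line(method = "rlm")

# 2. Character string with the fit function's method argument after a colon
p2 <- p + stat_poly_line(method = "rlm:M")

# 3. Function passed with double colon notation, without attaching 'MASS'
p3 <- p + stat_poly_line(method = MASS::rlm)

p1
p2
p3
```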

Computed variables

`stat_poly_line()` provides the following variables, some of which depend on the orientation:

y or x: predicted value
ymin or xmin: lower pointwise confidence interval around the mean
ymax or xmax: upper pointwise confidence interval around the mean
se: standard error

If fm.values = TRUE is passed then columns based on the summary of the model fit are added, with the same value in each row within a group. This is wasteful and disabled by default, but provides a simple and robust approach to achieve effects like colouring or hiding of the model fit line based on P-values, r-squared, adjusted r-squared or the number of observations.
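
As a hedged sketch of this use, the code below first lists the columns added by fm.values = TRUE and then maps the line type to the P-value of each fit. The column name p.value is an assumption that should be confirmed from the listing (or with geom = "debug"); the 0.01 threshold and the object names are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggpmisc)

# Names of the columns added when fm.values = TRUE
base.cols <- names(layer_data(ggplot(mpg, aes(displ, hwy)) +
                                stat_poly_line()))
full.cols <- names(layer_data(ggplot(mpg, aes(displ, hwy)) +
                                stat_poly_line(fm.values = TRUE)))
setdiff(full.cols, base.cols)

# Assumption: one of the added columns is named p.value.
# Fits with P-value >= 0.01 are drawn with a dashed line.
p.sig <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line(aes(linetype = after_stat(ifelse(p.value < 0.01,
                                                  "solid", "dashed"))),
                 fm.values = TRUE) +
  scale_linetype_identity() +
  facet_wrap(~class)
p.sig
```

Because the added columns hold the same value in every row of a group, the mapped aesthetic is constant along each fitted line.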

Aesthetics

stat_poly_line() understands x and y, to be referenced in the formula, and weight, which is passed as argument to parameter weights of the model fit function. All three must be mapped to numeric variables. In addition, the aesthetics understood by the geom ("geom_smooth" is the default) are understood and grouping is respected.
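
A minimal sketch of a weighted fit follows. Mapping cty to weight has no statistical justification here; it only illustrates how a numeric variable reaches the weights parameter of the fit function through the weight aesthetic. The object name p.w is arbitrary.

```r
library(ggplot2)
library(ggpmisc)

# weight is mapped in the layer that uses it; x and y are inherited
p.w <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line(aes(weight = cty))
p.w
```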

Details

This statistic is similar to stat_smooth but has different defaults and supports additional model fit functions. It also interprets the argument passed to formula differently than stat_smooth(), accepting y as explanatory variable and setting orientation automatically. The default for method is "lm" and spline-based smoothers like loess are not supported. Other defaults are consistent with those in stat_poly_eq(), stat_quant_line(), stat_quant_band(), stat_quant_eq(), stat_ma_line(), stat_ma_eq(). As some model fitting functions can depend on the RNG, fit.seed if different to NA is used as argument in a call to set.seed() immediately ahead of model fitting.
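
The use of fit.seed can be sketched with lqs() from package 'MASS', a fit function that can rely on random sampling. The seed value 4321 and the object name p.lqs are arbitrary, and 'MASS' is assumed to be installed.

```r
library(ggplot2)
library(ggpmisc)

# set.seed(4321) is called immediately ahead of each model fit,
# so repeated renderings of the plot use the same fitted line
p.lqs <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line(method = "lqs", fit.seed = 4321)
p.lqs
```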

stat_poly_line() treats the x and y aesthetics differently and can thus have two orientations. The orientation can be deduced from the argument passed to formula. Thus, stat_poly_line() will by default guess which orientation the layer should have. If no argument is passed to formula, the formula defaults to y ~ x. For consistency with stat_smooth(), the orientation can also be specified directly by passing an argument to the orientation parameter, which can be either "x" or "y". The value of orientation gives the axis that is taken as the explanatory variable, or x in the model formula. Package 'ggpmisc' does not define new geometries matching the new statistics, as they are not needed and, conceptually, transformations of data are statistics in the grammar of graphics.
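
The two ways of setting the orientation can be compared in a short sketch. Both layers below are expected to fit x on y; the object names are arbitrary.

```r
library(ggplot2)
library(ggpmisc)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# Orientation deduced from the formula: y is the explanatory variable
p.formula <- p + stat_poly_line(formula = x ~ y)

# Orientation given directly: the default formula is set to match it
p.orient <- p + stat_poly_line(orientation = "y")

p.formula
p.orient
```

In both cases the confidence band is returned in xmin and xmax instead of ymin and ymax.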

A ggplot statistic receives as data a data frame that is not the one passed as argument by the user, but instead a data frame with the variables mapped to aesthetics. Like stat_poly_eq(), stat_poly_line() mimics how stat_smooth() works, except that only polynomials can be fitted. As in these statistics, the model fits respect grouping, so the scales used for x and y should both be continuous scales rather than discrete.

With method "lm", singularity results in terms being dropped, with a message, when the model has more terms than can be fitted with a singular (exact) fit. In this case, and when the model results in a perfect fit because of a low number of observations, estimates for various parameters are NaN or NA.

With methods other than "lm", the model fit functions simply fail in case of singularity, e.g., singular fits are not implemented in "rlm".

In both cases the minimum number of observations with distinct values in the explanatory variable can be set through parameter n.min. The default n.min = 2L is the smallest value suitable for method "lm" but is too small for method "rlm", for which n.min = 3L is needed. In any case, model fits with very few observations are of little interest, and using values of n.min larger than the default is wise.
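
As an illustration of raising n.min, the sketch below fits one line per vehicle class and skips the groups with fewer than five distinct values of displ. The threshold 5L and the object name p.min are arbitrary choices.

```r
library(ggplot2)
library(ggpmisc)

# One fit per class; a fit is attempted only for groups with at least
# five distinct values of the explanatory variable
p.min <- ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point() +
  stat_poly_line(n.min = 5L, se = FALSE)
p.min
```

The observations of the skipped groups are still plotted by geom_point(); only the fitted line is omitted.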

Examples


# Attach the packages used in the examples
library(ggplot2)
library(ggpmisc)

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line()

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line(formula = x ~ y)

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line(formula = y ~ poly(x, 3))

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line(formula = x ~ poly(y, 3))

# Smooths are automatically fit to each group (defined by categorical
# aesthetics or the group aesthetic) and for each facet.

ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point() +
  stat_poly_line(se = FALSE)

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line() +
  facet_wrap(~drv)

# Inspecting the returned data using geom_debug()
gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

if (gginnards.installed)
  library(gginnards)

if (gginnards.installed)
  ggplot(mpg, aes(displ, hwy)) +
    stat_poly_line(geom = "debug")

if (gginnards.installed)
  ggplot(mpg, aes(displ, hwy)) +
    stat_poly_line(geom = "debug", fm.values = TRUE)

if (gginnards.installed)
  ggplot(mpg, aes(displ, hwy)) +
    stat_poly_line(geom = "debug", method = lm, fm.values = TRUE)
