This function uses basic R graphics to draw a two-dimensional scatterplot, with options to allow for plot enhancements that are often helpful with regression problems. Enhancements include adding marginal boxplots, estimated mean and variance functions using either parametric or nonparametric methods, point identification, jittering, setting characteristics of points and lines like color, size and symbol, marking points and fitting lines conditional on a grouping variable, and other enhancements.
`sp`

is an abbreviation for `scatterplot`

.

`scatterplot(x, ...)`# S3 method for formula
scatterplot(formula, data, subset, xlab, ylab, id=FALSE,
legend=TRUE, ...)

# S3 method for default
scatterplot(x, y, boxplots=if (by.groups) "" else "xy",
regLine=TRUE, legend=TRUE, id=FALSE, ellipse=FALSE, grid=TRUE,
smooth=TRUE,
groups, by.groups=!missing(groups),
xlab=deparse(substitute(x)), ylab=deparse(substitute(y)),
log="", jitter=list(), cex=par("cex"),
col=carPalette()[-1], pch=1:n.groups,
reset.par=TRUE, ...)

sp(x, ...)

x

vector of horizontal coordinates (or first argument of generic function).

y

vector of vertical coordinates.

formula

a model formula, of the form `y ~ x`

or, if plotting by
groups, `y ~ x | z`

, where `z`

evaluates to a factor
or other variable dividing the data into groups. If `x`

is a factor, then parallel boxplots
are produced using the `Boxplot`

function.

data

data frame within which to evaluate the formula.

subset

expression defining a subset of observations.

boxplots

if `"x"`

a marginal boxplot for the horizontal `x`

-axis is drawn below the plot;
if `"y"`

a marginal boxplot for vertical `y`

-axis is drawn to the left of the plot;
if `"xy"`

both marginal boxplots are drawn; set to `""`

or `FALSE`

to
suppress both boxplots.

regLine

controls adding a fitted regression line to the plot. if
`regLine=FALSE`

, no line is drawn. If `TRUE`

, the default, an OLS
line is fit. This argument can also be a list. The default of `TRUE`

is
equivalent to `regLine=list(method=lm, lty=1, lwd=2, col=col)`

, which specifies
using the `lm`

function to estimate the fitted line, to draw a solid line
(`lty=1`

) of width 2 times the nominal width (`lwd=2`

) in the color given by
the first element of the `col`

argument described below.

legend

when the plot is drawn by groups and `legend=TRUE`

, controls placement
and properties of a
legend; if `FALSE`

, the legend is suppressed. Can be a list of
named arguments, as follows: `title`

for the legend; `inset`

, giving space
as a proportion of the axes to offset the legend from the axes; `coords`

specifying the position of the legend in any form acceptable to the
`legend`

function or, if not given, the legend is placed *above*
the plot in the upper margin; `columns`

for the legend, determined automatically
to prefer a horizontal layout if not given explicitly; `cex`

giving the
relative size of the legend symbols and text. `TRUE`

(the default) is equivalent to
`list(title=deparse(substitute(groups)), inset=0.02, cex=1)`

.

id

controls point identification; if `FALSE`

(the default), no points are
identified; can be a list of named arguments to the `showLabels`

function;
`TRUE`

is equivalent to
`list(method="mahal", n=2, cex=1, col=carPalette()[-1], location="lr")`

,
which identifies the 2 points (in each group) with the largest Mahalanobis distances
from the center of the data. See `showLabels`

for a description of the
other arguments. The default behavior of `id`

is not the same in all graphics
functions in car, as the `method`

used depends on the type of plot.

ellipse

controls plotting data-concentration ellipses. If `FALSE`

(the default), no ellipses are plotted. Can be a list of named values giving
`levels`

, a vector of one or more bivariate-normal probability-contour levels at
which to plot the ellipses; `robust`

, a logical value determing whether to use
the `cov.trob`

function in the MASS package to calculate the center
and covariance matrix for the data ellipses; and `fill`

and `fill.alpha`

,
which control whether the ellipse is filled and the transparency of the fill.
`TRUE`

is equivalent to
`list(levels=c(.5, .95), robust=TRUE, fill=TRUE, fill.alpha=0.2)`

.

grid

If TRUE, the default, a light-gray background grid is put on the graph

smooth

specifies a nonparametric estimate of the mean or median
function of the vertical axis variable given the
horizontal axis variable and optionally a nonparametric estimate of the conditional variance. If
`smooth=FALSE`

neither function is drawn. If `smooth=TRUE`

, then both the mean function
and variance funtions are drawn for ungrouped data, and the mean function only is drawn
for grouped
data. The default smoother is `loessLine`

, which uses the
`loess`

function from
the stats package. This smoother is fast and reliable. See the details below
for changing
the smoother, line type, width and color, of the added lines, and adding arguments
for the smoother.

groups

a factor or other variable dividing the data into groups; groups are plotted with different colors, plotting characters, fits, and smooths. Using this argument is equivalent to specifying the grouping variable in the formula.

by.groups

if `TRUE`

(the default when there are groups), regression lines are fit by groups.

xlab

label for horizontal axis.

ylab

label for vertical axis.

log

same as the `log`

argument to `plot`

, to produce log axes.

jitter

a list with elements `x`

or `y`

or both, specifying jitter factors
for the horizontal and vertical coordinates of the points in the scatterplot. The
`jitter`

function is used to randomly perturb the points; specifying a
factor of `1`

produces the default jitter.
Fitted lines are unaffected by the jitter.

col

with no grouping, this specifies a color for plotted points;
with grouping, this argument should be a vector
of colors of length at least equal to the number of groups. The default is
value returned by `carPalette[-1]`

.

pch

plotting characters for points; default is the plotting characters in
order (see `par`

).

cex

sets the size of plotting characters, with `cex=1`

the standard size. You can also
set the sizes of other elements with the arguments `cex.axis`

, `cex.lab`

, `cex.main`

,
and `cex.sub`

. See `par`

.

reset.par

if `TRUE`

(the default) then plotting parameters are reset to their previous values
when `scatterplot`

exits; if `FALSE`

then the `mar`

and `mfcol`

parameters are
altered for the current plotting device. Set to `FALSE`

if you want to add graphical elements
(such as lines) to the plot.

…

other arguments passed down and to `plot`

. For example, the argument `las`

sets
the style of the axis labels, and `xlim`

and `ylim`

set the limits on the horizontal and
vertical axes, respectively; see `par`

.

If points are identified, their labels are returned; otherwise `NULL`

is returned invisibly.

Many arguments to `scatterplot`

were changed in version 3 of car to simplify use of
this function.

The `smooth`

argument is usually either set to `TRUE`

or `FALSE`

to draw, or omit,
the smoother. Alternatively `smooth`

can be set to a list of arguments. The default behavior of
`smooth=TRUE`

is equivalent to `smooth=list(smoother=loessLine, var=!by.groups, lty.var=2, lty.var=4)`

, specifying the smoother to be used, including the variance smooth,
and the line widths and types for the curves. You can also specify the colors you want to use for the mean and variance smooths with the arguments `col.smooth`

and `col.var`

. Alternative smoothers are `gamline`

which uses the
`gam`

function from the mgcv package, and `quantregLine`

which uses quantile regression to
estimate the median and quartile functions using `rqss`

from the quantreg package. All of these
smoothers have one or more arguments described on their help pages, and these arguments can be added to the
`smooth`

argument; for example, `smooth = list(span=1/2)`

would use the default
`loessLine`

smoother,
include the variance smooth, and change the value of the smoothing parameter to 1/2. For `loessLine`

and `gamLine`

the variance smooth is estimated by separately
smoothing the squared positive and negative
residuals from the mean smooth, using the same type of smoother. The displayed curves are equal to
the mean smooth plus the square root of the fit to the positive squared residuals, and the mean fit minus
the square root of the smooth of the negative squared residuals. The lines therefore represent the
comnditional variabiliity at each value on the horizontal axis. Because smoothing is done separately for
positive and negative residuals, the variation shown will generally not be symmetric about the fitted mean
function. For the `quantregLine`

method, the center estimates the median for each value on the
horizontal axis, and the variability estimates the lower and upper quartiles of the estimated conditional
distribution for each value of the horizontal axis.

The sub-arguments `spread`

, `lty.spread`

and `col.spread`

of the `smooth`

argument are equivalent to the newer `var`

, `col.var`

and `lty.var`

, respectively, recognizing that the spread is a measuure of conditional variability.

Fox, J. and Weisberg, S. (2019)
*An R Companion to Applied Regression*, Third Edition, Sage.

`boxplot`

,
`jitter`

, `legend`

,
`scatterplotMatrix`

, `dataEllipse`

, `Boxplot`

,
`cov.trob`

,
`showLabels`

, `ScatterplotSmoothers`

.

# NOT RUN { scatterplot(prestige ~ income, data=Prestige, ellipse=TRUE) scatterplot(prestige ~ income, data=Prestige, smooth=list(smoother=quantregLine)) # use quantile regression for median and quartile fits scatterplot(prestige ~ income | type, data=Prestige, smooth=list(smoother=quantregLine, var=TRUE, span=1, lwd=4, lwd.var=2)) scatterplot(prestige ~ income | type, data=Prestige, legend=list(coords="topleft")) scatterplot(vocabulary ~ education, jitter=list(x=1, y=1), data=Vocab, smooth=FALSE, lwd=3) scatterplot(infantMortality ~ ppgdp, log="xy", data=UN, id=list(n=5)) scatterplot(income ~ type, data=Prestige) # } # NOT RUN { # remember to exit from point-identification mode scatterplot(infantMortality ~ ppgdp, id=list(method="identify"), data=UN) # }