# scatterplot

##### Enhanced Scatterplots with Marginal Boxplots, Point Marking, Smoothers, and More

This function uses basic R graphics to draw a two-dimensional scatterplot, with options for enhancements that are often helpful in regression problems. These include marginal boxplots; estimated mean and variance functions, using either parametric or nonparametric methods; point identification; jittering; control over the color, size, and symbol of points and lines; and marking points and fitting lines conditional on a grouping variable. `sp` is an abbreviation for `scatterplot`.

- Keywords
- hplot

##### Usage

`scatterplot(x, ...)`

# S3 method for formula
scatterplot(formula, data, subset, xlab, ylab, id=FALSE,
legend=TRUE, ...)

# S3 method for default
scatterplot(x, y, boxplots=if (by.groups) "" else "xy",
regLine=TRUE, legend=TRUE, id=FALSE, ellipse=FALSE, grid=TRUE,
smooth=TRUE,
groups, by.groups=!missing(groups),
xlab=deparse(substitute(x)), ylab=deparse(substitute(y)),
log="", jitter=list(), cex=par("cex"),
col=carPalette()[-1], pch=1:n.groups,
reset.par=TRUE, ...)

sp(x, ...)

##### Arguments

- x
vector of horizontal coordinates (or first argument of generic function).

- y
vector of vertical coordinates.

- formula
a model formula, of the form `y ~ x` or, if plotting by groups, `y ~ x | z`, where `z` evaluates to a factor or other variable dividing the data into groups. If `x` is a factor, then parallel boxplots are produced using the `Boxplot` function.

- data
data frame within which to evaluate the formula.

- subset
expression defining a subset of observations.

- boxplots
if `"x"`, a marginal boxplot for the horizontal `x`-axis is drawn below the plot; if `"y"`, a marginal boxplot for the vertical `y`-axis is drawn to the left of the plot; if `"xy"`, both marginal boxplots are drawn; set to `""` or `FALSE` to suppress both boxplots.

- regLine
controls adding a fitted regression line to the plot. If `regLine=FALSE`, no line is drawn. If `TRUE`, the default, an OLS line is fit. This argument can also be a list. The default of `TRUE` is equivalent to `regLine=list(method=lm, lty=1, lwd=2, col=col)`, which specifies using the `lm` function to estimate the fitted line and drawing a solid line (`lty=1`) of width 2 times the nominal width (`lwd=2`) in the color given by the first element of the `col` argument described below.

- legend
when the plot is drawn by groups, controls the placement and properties of a legend; if `TRUE` (the default) a legend is drawn, and if `FALSE` the legend is suppressed. Can be a list of named arguments, as follows: `title` for the legend; `inset`, giving space as a proportion of the axes to offset the legend from the axes; `coords`, specifying the position of the legend in any form acceptable to the `legend` function (if not given, the legend is placed *above* the plot in the upper margin); `columns` for the legend, determined automatically to prefer a horizontal layout if not given explicitly; and `cex`, giving the relative size of the legend symbols and text. `TRUE` (the default) is equivalent to `list(title=deparse(substitute(groups)), inset=0.02, cex=1)`.

- id
controls point identification; if `FALSE` (the default), no points are identified; can be a list of named arguments to the `showLabels` function; `TRUE` is equivalent to `list(method="mahal", n=2, cex=1, col=carPalette()[-1], location="lr")`, which identifies the 2 points (in each group) with the largest Mahalanobis distances from the center of the data. See `showLabels` for a description of the other arguments. The default behavior of `id` is not the same in all graphics functions in car, as the `method` used depends on the type of plot.

- ellipse
controls plotting data-concentration ellipses. If `FALSE` (the default), no ellipses are plotted. Can be a list of named values giving `levels`, a vector of one or more bivariate-normal probability-contour levels at which to plot the ellipses; `robust`, a logical value determining whether to use the `cov.trob` function in the MASS package to calculate the center and covariance matrix for the data ellipses; and `fill` and `fill.alpha`, which control whether the ellipse is filled and the transparency of the fill. `TRUE` is equivalent to `list(levels=c(.5, .95), robust=TRUE, fill=TRUE, fill.alpha=0.2)`.

- grid
if `TRUE` (the default), a light-gray background grid is drawn on the graph.

- smooth
specifies a nonparametric estimate of the mean or median function of the vertical axis variable given the horizontal axis variable and, optionally, a nonparametric estimate of the conditional variance. If `smooth=FALSE`, neither function is drawn. If `smooth=TRUE`, then both the mean and variance functions are drawn for ungrouped data, and only the mean function is drawn for grouped data. The default smoother is `loessLine`, which uses the `loess` function from the stats package. This smoother is fast and reliable. See the Details below for changing the smoother; setting the line type, width, and color of the added lines; and passing additional arguments to the smoother.

- groups
a factor or other variable dividing the data into groups; groups are plotted with different colors, plotting characters, fits, and smooths. Using this argument is equivalent to specifying the grouping variable in the formula.

- by.groups
if `TRUE` (the default when there are groups), regression lines are fit by groups.

- xlab
label for horizontal axis.

- ylab
label for vertical axis.

- log
same as the `log` argument to `plot`, to produce log axes.

- jitter
a list with elements `x` or `y` or both, specifying jitter factors for the horizontal and vertical coordinates of the points in the scatterplot. The `jitter` function is used to randomly perturb the points; specifying a factor of `1` produces the default jitter. Fitted lines are unaffected by the jitter.

- col
with no grouping, this specifies a color for plotted points; with grouping, this argument should be a vector of colors of length at least equal to the number of groups. The default is the value returned by `carPalette()[-1]`.

- pch
plotting characters for points; default is the plotting characters in order (see `par`).

- cex
sets the size of plotting characters, with `cex=1` the standard size. You can also set the sizes of other elements with the arguments `cex.axis`, `cex.lab`, `cex.main`, and `cex.sub`. See `par`.

- reset.par
if `TRUE` (the default) then plotting parameters are reset to their previous values when `scatterplot` exits; if `FALSE` then the `mar` and `mfcol` parameters are altered for the current plotting device. Set to `FALSE` if you want to add graphical elements (such as lines) to the plot.

- …
other arguments passed down to `plot`. For example, the argument `las` sets the style of the axis labels, and `xlim` and `ylim` set the limits on the horizontal and vertical axes, respectively; see `par`.
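The list forms of `regLine`, `ellipse`, `id`, and `jitter` described above can be combined in a single call. The following is a minimal sketch, assuming the car package and its companion carData datasets are installed. The variable names (`reg_args` and so on) are illustrative only and are not part of car, and `col="black"` is an arbitrary choice.

```r
# Sketch only: the list names below are hypothetical helpers, not part of car.
# Each list spells out (or overrides) the defaults documented above.
reg_args     <- list(method = lm, lty = 1, lwd = 2, col = "black")
ellipse_args <- list(levels = c(0.5, 0.95), robust = TRUE,
                     fill = TRUE, fill.alpha = 0.2)
id_args      <- list(method = "mahal", n = 2, cex = 1, location = "lr")
jitter_args  <- list(x = 1, y = 1)

# Only draw the plot when car is available (assumption: carData supplies Prestige).
if (requireNamespace("car", quietly = TRUE)) {
  library(car)
  scatterplot(prestige ~ income, data = Prestige,
              regLine = reg_args, ellipse = ellipse_args,
              id = id_args, jitter = jitter_args)
}
```

Keeping the lists in named variables makes it easy to reuse the same settings across several plots.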

##### Details

Many arguments to `scatterplot` were changed in version 3 of car to simplify use of this function.

The `smooth` argument is usually set to either `TRUE` or `FALSE` to draw, or omit, the smoother. Alternatively, `smooth` can be set to a list of arguments. The default behavior of `smooth=TRUE` is equivalent to `smooth=list(smoother=loessLine, var=!by.groups, lty.var=2, lty.var=4)`, specifying the smoother to be used, whether to include the variance smooth, and the line widths and types for the curves. You can also specify the colors you want to use for the mean and variance smooths with the arguments `col.smooth` and `col.var`. Alternative smoothers are `gamLine`, which uses the `gam` function from the mgcv package, and `quantregLine`, which uses quantile regression to estimate the median and quartile functions using `rqss` from the quantreg package. All of these smoothers have one or more arguments described on their help pages, and these arguments can be added to the `smooth` argument; for example, `smooth = list(span=1/2)` would use the default `loessLine` smoother, include the variance smooth, and change the value of the smoothing parameter to 1/2.

For `loessLine` and `gamLine`, the variance smooth is estimated by separately smoothing the squared positive and negative residuals from the mean smooth, using the same type of smoother. The displayed curves are equal to the mean smooth plus the square root of the fit to the positive squared residuals, and the mean smooth minus the square root of the fit to the negative squared residuals. The lines therefore represent the conditional variability at each value on the horizontal axis. Because smoothing is done separately for positive and negative residuals, the variation shown will generally not be symmetric about the fitted mean function. For the `quantregLine` method, the center estimates the median for each value on the horizontal axis, and the variability estimates the lower and upper quartiles of the estimated conditional distribution for each value on the horizontal axis.
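The construction of the variability curves can be illustrated in base R. This is a rough sketch of the idea only, not car's implementation: it assumes `loess` from the stats package as a stand-in for `loessLine`, uses the built-in `cars` data, and picks the spans arbitrarily.

```r
# Sketch only: base-R loess stands in for car's loessLine; spans are arbitrary.
d <- data.frame(x = cars$speed, y = cars$dist)

# 1. Mean smooth and its residuals.
mean_fit <- loess(y ~ x, data = d, span = 2/3, degree = 1)
d$res <- d$y - fitted(mean_fit)
d$sq_res <- d$res^2

# 2. Smooth the squared residuals separately above and below the mean smooth.
pos <- d[d$res > 0, ]
neg <- d[d$res < 0, ]
pos_fit <- loess(sq_res ~ x, data = pos, span = 1, degree = 1)
neg_fit <- loess(sq_res ~ x, data = neg, span = 1, degree = 1)

# 3. Evaluate on a grid covered by both subsets (loess does not extrapolate).
grid <- data.frame(x = seq(max(min(pos$x), min(neg$x)),
                           min(max(pos$x), max(neg$x)), length.out = 50))
center <- predict(mean_fit, newdata = grid)
# pmax() guards against small negative fitted values before the square root.
upper <- center + sqrt(pmax(predict(pos_fit, newdata = grid), 0))
lower <- center - sqrt(pmax(predict(neg_fit, newdata = grid), 0))

# 4. Draw the points, the mean smooth, and the two variability curves.
plot(d$x, d$y, xlab = "speed", ylab = "dist")
lines(grid$x, center, lwd = 2)
lines(grid$x, upper, lty = 2)
lines(grid$x, lower, lty = 2)
```

Because `upper` and `lower` come from two separate fits, the band is generally not symmetric about `center`, which matches the behavior described above.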

The sub-arguments `spread`, `lty.spread` and `col.spread` of the `smooth` argument are equivalent to the newer `var`, `lty.var` and `col.var`, respectively, recognizing that the spread is a measure of conditional variability.
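As a hedged illustration of the `smooth` sub-arguments, the sketch below passes the newer names in a list. It assumes the car package and carData are installed; `smooth_opts` is a hypothetical variable name, and the color and line type are arbitrary choices.

```r
# Sketch only: smooth_opts is an illustrative name, not part of car.
# span is passed through to the smoother; var, lty.var and col.var
# control the variability curves.
smooth_opts <- list(span = 1/2, var = TRUE, lty.var = 4, col.var = "gray40")

# Only draw the plot when car is available (assumption: carData supplies Prestige).
if (requireNamespace("car", quietly = TRUE)) {
  library(car)
  scatterplot(prestige ~ income, data = Prestige,
              smooth = c(list(smoother = loessLine), smooth_opts))
}
```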

##### Value

If points are identified, their labels are returned; otherwise `NULL` is returned invisibly.

##### References

Fox, J. and Weisberg, S. (2019)
*An R Companion to Applied Regression*, Third Edition, Sage.

##### See Also

`boxplot`, `jitter`, `legend`, `scatterplotMatrix`, `dataEllipse`, `Boxplot`, `cov.trob`, `showLabels`, `ScatterplotSmoothers`.

##### Examples

```
library(car)  # the Prestige, Vocab, and UN data sets come from carData

# data-concentration ellipses
scatterplot(prestige ~ income, data=Prestige, ellipse=TRUE)
scatterplot(prestige ~ income, data=Prestige, smooth=list(smoother=quantregLine))

# use quantile regression for median and quartile fits
scatterplot(prestige ~ income | type, data=Prestige,
            smooth=list(smoother=quantregLine, var=TRUE, span=1, lwd=4, lwd.var=2))

# place the legend inside the plot
scatterplot(prestige ~ income | type, data=Prestige, legend=list(coords="topleft"))

# jitter both coordinates and omit the smoother
scatterplot(vocabulary ~ education, jitter=list(x=1, y=1),
            data=Vocab, smooth=FALSE, lwd=3)

# log axes, labelling the 5 most unusual points
scatterplot(infantMortality ~ ppgdp, log="xy", data=UN, id=list(n=5))

# a factor on the horizontal axis gives parallel boxplots
scatterplot(income ~ type, data=Prestige)

# interactive only: remember to exit from point-identification mode
if (interactive()) {
  scatterplot(infantMortality ~ ppgdp, id=list(method="identify"), data=UN)
}
```

*Documentation reproduced from package car, version 3.0-3, License: GPL (>= 2)*