sp
Generates scatter plots for one or two variables. For two variables, also produces an analysis of the correlation coefficient. If the values of the first specified variable are sorted, then the points are connected by line segments. The first variable can be numeric or a factor. The second variable must be numeric. For Likert style response data of two variables, in which each variable has fewer than 10 unique integer values, the plot becomes a bubble plot with the size of each bubble, i.e., point, determined by the corresponding joint frequency. An alternate name for ScatterPlot is just Plot.
One enhancement over the standard R plot function is the automatic inclusion of color. The color of the line segments and/or the points, background, area under the plotted line segments, grid lines, and border can each be explicitly specified, with default colors provided by one of the pre-defined color themes as defined by the set function.
If a scatterplot of two numeric variables is displayed, then the corresponding correlation coefficient, the hypothesis test of zero population correlation, and the 95% confidence interval are also displayed. The numeric values are the same as those from the standard R function cor.test, though in a more readable format.
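For reference, the statistics mentioned above can be obtained directly from base R. The following is a minimal sketch with simulated data; the variable names and the simulated relationship are illustrative assumptions, and the code shows only the base R cor.test output, not the formatted output of ScatterPlot.

```r
# Simulated data, for illustration only
set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# Base R test of zero population correlation
ct <- cor.test(x, y)

ct$estimate    # sample correlation coefficient
ct$statistic   # t statistic for the test of zero correlation
ct$p.value     # p-value of that test
ct$conf.int    # 95% confidence interval (the default conf.level)
```

The components of the returned object are the pieces that the text output presents in a more readable layout.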
For one variable, plots a one-dimensional scatterplot, that is, a dot chart, also called a strip chart, based on the standard R function stripchart. Also identifies outliers according to the criteria specified by a box plot and displays the summary statistics for the variable. The dot plot is also invoked with the function names DotPlot or just dp, which are alternate names for ScatterPlot when a single variable is referenced.
When output is assigned to an object, such as p in p <- sp(Y), the pieces of output can be accessed for later analysis. A primary use is dynamic report generation with knitr, in which R output is embedded in documents, facilitated by the knitr.file option. See value below.
ScatterPlot(x, y=NULL, by=NULL, data=mydata, type=NULL, n.cat=getOption("n.cat"),
knitr.file=NULL, digits.d=NULL, col.fill=getOption("col.fill.pt"),
col.stroke=getOption("col.stroke.pt"),
col.bg=getOption("col.bg"),
col.grid=getOption("col.grid"),
col.area=NULL, col.box="black",
shape.pts="circle", cex.axis=.85, col.axis="gray30",
xy.ticks=TRUE,
xlab=NULL, ylab=NULL, main=NULL, cex=NULL,
kind=c("default", "regular", "bubble", "sunflower"),
fit.line=c("none", "loess", "ls"), col.fit.line="grey55",
bubble.size=.25, method="overplot",
ellipse=FALSE,
pt.reg="circle", pt.out="circle",
col.out30="firebrick2", col.out15="firebrick4", new=TRUE,
diag=FALSE, col.diag=par("fg"), lines.diag=TRUE,
quiet=getOption("quiet"),
pdf.file=NULL, pdf.width=5, pdf.height=5,
fun.call=NULL, ...)
sp(...)
Plot(...)
DotPlot(...)
dp(...)
... kind="regular"
data: Default is mydata.
type: "p" for points, "l" for line, or "b" for both. If x and y are provided and x is sorted so that a function is plotted, the default is "l".
... col.stroke. Does not apply if there is a by variable, which relies upon the default.
... by variable, specified as a vector, one value for each level of by.
... "grey90".
... points. The default value is 21, a circle with both a border and filled area, specified here with col.pts and col.fill.
col.box: Default is "black".
xlab: If xlab is not specified, then the label becomes the name of the corresponding variable. If xy.ticks is FALSE, then no label is displayed.
ylab: If no y v... xy.ticks is FALSE, then no label is displayed.
kind: Default is "default", which becomes a "regular" scatterplot for most data. If Likert style response data is plotted, that is, each variable has fewer than 10 integer values, then instead by default a bubble plot is ...
fit.line: Default is "none", with options for "loess" and "ls".
col.fit.line: ... fit.line option is invoked.
method: Default is "overplot", but can also provide "stack" to stack the points or "jitter" to scramble the points.
ellipse: If TRUE, enclose a scatterplot with the .95 data ellipse from the car package.
new: If FALSE, then add the dot plot to an existing graph.
diag: If TRUE, then add a diagonal line to a 2-dimensional scatter plot.
col.diag: ... diag=TRUE.
lines.diag: If lines.diag=TRUE, then if diag=TRUE, each point is connected to the diagonal line with a line segment.
quiet: If TRUE, no text output. Can change system default with set function.
fun.call: ... knitr to pass the function call when obtained from the abbreviated function call sp.
The default data frame is mydata. Specify another name with the data option. Regardless of its name, the data frame need not be attached to reference its variables directly by name, that is, there is no need to invoke the mydata$name notation. The referenced variables can be in the data frame and/or the user's workspace, the global environment.
ADAPTIVE GRAPHICS
Results for two variables are based on the standard plot and related graphic functions, with the additional provided color capabilities and other options including a center line. The plotting procedure utilizes "adaptive graphics", such that ScatterPlot chooses different default values for different characteristics of the specified plot and data values. The goal is to produce a desired graph by simply relying upon the default values, both of the ScatterPlot function itself and of the base R functions called by ScatterPlot, such as plot. Familiarity with the options permits complete control over the computed defaults, but this familiarity is intended to be optional for most situations.
TWO VARIABLE PLOT
When two variables are specified to plot, by default a scatterplot is produced if the values of the first variable, x, are unsorted, or if there are unequal intervals between adjacent values, or if there is missing data for either variable. That is, the standard R plot function is called with type="p" for points. By default, sorted values of x with equal intervals between adjacent values yield a function plot if there is no missing data for either variable. That is, the standard R plot function is called with type="l", which connects each adjacent pair of points with a line segment.
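The default rule described above can be sketched in base R. The helper name guess_type and its tolerance argument are hypothetical, introduced here only for illustration; this is not the code that ScatterPlot itself uses, just one way to express the stated conditions.

```r
# Hypothetical helper: returns "l" when x is sorted with equal intervals
# and neither variable has missing data, otherwise "p"
guess_type <- function(x, y, tol = 1e-8) {
  if (anyNA(x) || anyNA(y)) return("p")   # missing data: points
  if (is.unsorted(x)) return("p")         # unsorted x: points
  d <- diff(x)                            # intervals between adjacent values
  if (length(d) > 0 && any(abs(d - d[1]) > tol)) return("p")  # unequal intervals
  "l"                                     # sorted, equally spaced, complete
}

x <- seq(10, 500, by = 1)
guess_type(x, 18 / sqrt(x))           # "l"
guess_type(c(3, 1, 2), c(1, 2, 3))    # "p", x is unsorted
guess_type(c(1, 2, 4), c(1, 2, 3))    # "p", unequal intervals
guess_type(c(1, 2, 3), c(1, NA, 3))   # "p", missing data
```

Either default can be overridden with the type option.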
BY VARIABLE
A variable specified with by= is a grouping variable: the points for each group are plotted with a different shape and/or color. By default, the shapes vary by group, and the color of the plot symbol remains the same for the groups. The default shapes, in this order, are "circle", "diamond", "square", "triup" for a triangle pointed up, and "tridown" for a triangle pointed down.
To explicitly vary the shapes, use shape.pts with a vector of shape values built with the standard R c function, one specified shape for each group, as shown in the examples. To explicitly vary the colors, use col.stroke, such as with R standard color names. If col.stroke is specified without shape.pts, then colors are varied, but not shapes. To vary both shapes and colors, specify values for both options, always with one shape or color specified for each level of the by variable.
Shapes beyond the standard list of named shapes, such as "circle", are also available as single characters. Any single letter, uppercase or lowercase, any single digit, and the characters "+", "*" and "#" are available, as illustrated in the examples. In the use of shape.pts, either use standard named shapes, or individual characters, but not both in a single specification.
ONE VARIABLE PLOT
The one variable plot is a one variable scatterplot, that is, a dot chart. Results are based on the standard stripchart function. Colors are provided by default and can also be specified. For gray scale output, potential outliers are plotted with squares and actual outliers are plotted with diamonds; otherwise shades of red are used to highlight outliers. The definition of outliers is taken from the R boxplot function.
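The box plot criterion can be examined directly with the base R function boxplot.stats. The sketch below assumes that "potential" and "actual" outliers correspond to values beyond 1.5 and 3.0 times the interquartile range from the hinges, as the option names col.out15 and col.out30 suggest; the source does not state those multipliers explicitly, and the data values are made up.

```r
# Made-up values with one moderate and one extreme value
y <- c(10, 11, 12, 13, 14, 15, 16, 25, 60)

# Values beyond 1.5 * IQR from the hinges (the boxplot default)
beyond_15 <- boxplot.stats(y, coef = 1.5)$out

# Values beyond 3.0 * IQR from the hinges
beyond_30 <- boxplot.stats(y, coef = 3)$out

beyond_15   # 25 60
beyond_30   # 60
```

Here 25 would be flagged only under the 1.5 criterion, while 60 is flagged under both.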
LIKERT DATA
A scatterplot of Likert type data is problematic because there are so few possible positions for points in the scatterplot. For example, for a scatterplot of two five-point Likert items, there are only 25 possible paired values to plot, so most of the plotted points overlap with others. In this situation, that is, when there are fewer than 10 unique values for each of the two variables, a bubble plot is automatically provided, with the size of each point relative to the joint frequency of the paired data values. A sunflower plot can be requested in lieu of the bubble plot.
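The joint frequencies that determine the bubble sizes can be computed in base R with table. This is a minimal sketch with a small made-up data set; the object names are illustrative, and the code shows only the frequency computation, not how ScatterPlot scales the bubbles.

```r
# Made-up Likert style responses
x1 <- c(1, 1, 2, 2, 2, 5)
x2 <- c(1, 1, 3, 3, 3, 5)

# Fewer than 10 unique values per variable: the stated trigger for a bubble plot
likert_like <- length(unique(x1)) < 10 && length(unique(x2)) < 10

# Joint frequency of each observed (x1, x2) pair
joint <- as.data.frame(table(x1, x2))
joint <- joint[joint$Freq > 0, ]
joint
```

Each remaining row is one plotted bubble, with Freq as the quantity that its size reflects. The coordinates and frequencies could be passed to a base R function such as symbols to draw a comparable plot.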
DIAGONAL
Particularly useful when comparing pre- and post-scores on some assessment, a diagonal line that runs from the lower-left corner of the graph to the upper-right corner marks the points of no change, where the value on the x-axis equals the corresponding value on the y-axis, that is, where the pre and post scores are equal. Points on either side of that diagonal indicate + or - change. To provide this line, specify diag=TRUE, which applies only to scatter plots of two numeric, non-categorical variables. When so specified, for each data coordinate a vertical line is drawn from the diagonal of no change to the point, unless lines.diag is set to FALSE. If diag=TRUE, then the axis limits are set so that each axis has the same beginning and ending point.
VARIABLE LABELS
Although standard R does not provide for variable labels, lessR can store the labels in the data frame with the data, obtained from the Read function. If this labels data frame exists, then the corresponding variable label is by default listed as the label for the corresponding axis and on the text output. For more information, see Read.
COLORS
Individual colors in the plot can be manipulated with options such as col.fill for the fill color of the points. A color theme for all the colors can be chosen with the colors option of the lessR function set. The default color theme is dodgerblue, but a gray scale is available with "gray", and other themes are available as explained in set, such as "red" and "green". Use the option ghost=TRUE for a black background, no grid lines and partial transparency of plotted colors.
Colors can also be changed for individual aspects of a scatterplot. To provide a warmer tone by slightly enhancing red, try col.bg="snow". Obtain a very light gray with col.bg="gray99". To darken the background gray, try col.bg="gray97" or lower numbers. See the lessR function showColors, which provides an example of all available named colors.
PDF OUTPUT
Because of the customized graphic windowing system that maintains a unique graphic window for the Help function, the standard graphic output functions such as pdf do not work with the lessR graphics functions. Instead, to obtain pdf output, use the pdf.file option, perhaps with the optional pdf.width and pdf.height options. These files are written to the default working directory, which can be explicitly specified with the R setwd function.
ADDITIONAL OPTIONS
Commonly used graphical parameters that are available to the standard R function plot are also generally available to ScatterPlot, such as lty, lwd, cex.main, and col.main, all of which appear in the examples.
ONLY VARIABLES ARE REFERENCED
The referenced variable in a lessR function can only be a variable name. This referenced variable must exist in either the referenced data frame, such as the default mydata, or in the user's workspace, more formally called the global environment. That is, expressions cannot be directly evaluated. For example:
> ScatterPlot(rnorm(50), rnorm(50)) # does NOT work
Instead, do the following:
> X <- rnorm(50) # create vector X in user workspace
> Y <- rnorm(50) # create vector Y in user workspace
> ScatterPlot(X,Y) # directly reference X and Y
dataEllipse function from the car package.
Gerbing, D. W. (2013). R Data Analysis without Programming, Chapter 8, NY: Routledge.
plot, stripchart, title, par, Correlation, set.
# default scatterplot, x is not sorted so type is set to "p"
# although data not attached, access each variable directly by its name
ScatterPlot(x, y)

# short name
sp(x,y)

# compare to standard R plot, which requires the mydata$ notation
plot(mydata$x, mydata$y)

# save scatterplot to a pdf file
ScatterPlot(x, y, pdf.file="MyScatterScatterPlot.pdf")

# scatterplot, with loess line
ScatterPlot(x, y, fit.line="loess")

# increase span (smoothing) from default of .75
# span is a loess parameter and generates a caution that can be
# ignored that it is not a graphical parameter -- we know that
#ScatterPlot(x, y, fit.line="loess", span=1.25)

# custom scatterplot, with diagonal line, connecting line segments
ScatterPlot(x, y, col.stroke="darkred", col.fill="plum", diag=TRUE)

# scatterplot with a gray scale color theme
# or, use set(colors="gray") to invoke for all subsequent analyses
# until reset back to default color of "blue"
set(colors="gray")
ScatterPlot(x, y)
set(colors="blue")

# by variable scatterplot with default point color, vary shapes
ScatterPlot(x,y, by=Gender)

# by variable scatterplot with custom colors, keeps only 1 shape
ScatterPlot(x,y, by=Gender, col.stroke=c("steelblue", "hotpink"))

# by variable with values of Gender for plotting symbols
# reduce the size of the plotted symbols with cex<1
ScatterPlot(x, y, by=Gender, shape.pts=c("M","F"), cex=.6)

# vary both shape and color
ScatterPlot(x, y, by=Gender, col.stroke=c("steelblue", "hotpink"),
  shape.pts=c("M","F"))

# Default dot plot
ScatterPlot(y)

# Dot plot with custom colors for outliers
ScatterPlot(y, pt.reg=23, col.out15="hotpink", col.out30="darkred")

# one variable scatterplot with added jitter of points
ScatterPlot(x, method="jitter", jitter=0.05)

# by variable dot plot with custom colors, keeps only 1 shape
ScatterPlot(x, by=Gender, col.stroke=c("steelblue", "hotpink"))

# bubble plot of simulated Likert data, 1 to 7 scale
# size of each plotted point (bubble) depends on its joint frequency
# triggered by default when < 10 unique values for each variable
x1 <- sample(1:7, size=100, replace=TRUE)
x2 <- sample(1:7, size=100, replace=TRUE)
ScatterPlot(x1,x2)

# compare to usual scatterplot of Likert data, transparency helps
plot(x1,x2)
ScatterPlot(x1,x2, kind="regular", cex=3)

# plot Likert data and get sunflower plot with loess line
ScatterPlot(x1,x2, kind="sunflower", fit.line="loess")

# scatterplot of continuous Y against categorical X, a factor
Pain <- sample(c("None", "Some", "Much", "Massive"), size=25, replace=TRUE)
Pain <- factor(Pain, levels=c("None", "Some", "Much", "Massive"), ordered=TRUE)
Cost <- round(rnorm(25,1000,100),2)
ScatterPlot(Pain, Cost)

# for this purpose, improved version of standard R stripchart
stripchart(Cost ~ Pain, vertical=TRUE)

# function curve
x <- seq(10,500,by=1)
y <- 18/sqrt(x)
# x is sorted with equal intervals so type set to "l" for line
# can use Plot or ScatterPlot, here Plot seems more appropriate
Plot(x, y)
# custom function plot
Plot(x, y, ylab="My Y", xlab="My X", col.stroke="blue",
  col.bg="snow", col.area="lightsteelblue", col.grid="lightsalmon")

# modern art
n <- sample(2:30, size=1)
x <- rnorm(n)
y <- rnorm(n)
clr <- colors()
color1 <- clr[sample(1:length(clr), size=1)]
color2 <- clr[sample(1:length(clr), size=1)]
ScatterPlot(x, y, type="l", lty="dashed", lwd=3, col.area=color1,
  col.stroke=color2, xy.ticks=FALSE, main="Modern Art",
  cex.main=2, col.main="lightsteelblue", kind="regular", n.cat=0)

# -----------------------------------------------
# variables in a different data frame than mydata
# -----------------------------------------------
# variables of interest are in a data frame which is not the default mydata
# although data not attached, access the variable directly by its name
data(dataEmployee)
ScatterPlot(Years, Salary, by=Gender, data=dataEmployee)