Generates scatterplots for one or two variables. For two variables, a scatterplot is produced, accompanied by an analysis of the correlation coefficient. For a data frame, a scatterplot matrix and correlation matrix are produced for all numeric variables in the data frame. If the values of the first specified variable are sorted, then points are connected via line segments. The first variable can be numeric or a factor. The second variable must be numeric. For Likert style response data of two variables, such that each variable has fewer than 10 unique integer values, the points in the plot are transformed into a bubble plot with the size of each bubble, i.e., point, determined by the corresponding joint frequency. An alternate name for
ScatterPlot is just
sp.
One enhancement over the standard R
plot function is the automatic inclusion of color. The color of the line segments and/or the points, background, area under the plotted line segments, grid lines, and border can each be explicitly specified, with default colors provided by one of the pre-defined color themes as defined by the
If a scatterplot of two numeric variables is displayed, then the corresponding correlation coefficient as well as the hypothesis test of zero population correlation and the 95% confidence interval are also displayed. The same numeric values as those of the standard R
cor.test function are generated, though in a more readable format. Also, an option for the .95 data ellipse from John Fox's
car package can enclose the points of the scatterplot.
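The correlation output described above corresponds to what base R's cor.test returns. The following is a minimal sketch in base R only, with simulated data; the names x, y, ct, r, p and ci are illustrative and are not part of lessR.

```r
# Simulated data; variable names are illustrative only
set.seed(1)
x <- rnorm(30, mean = 50, sd = 10)
y <- 0.5 * x + rnorm(30, mean = 0, sd = 8)

# Pearson correlation, test of zero population correlation, 95% interval
ct <- cor.test(x, y, conf.level = 0.95)

r  <- unname(ct$estimate)       # sample correlation coefficient
p  <- ct$p.value                # p-value for the null of zero correlation
ci <- as.numeric(ct$conf.int)   # lower and upper confidence bounds

cat("r =", round(r, 3), " p =", signif(p, 3),
    " 95% CI: [", round(ci[1], 3), ",", round(ci[2], 3), "]\n")
```

ScatterPlot is described as reporting these same quantities, so a direct call to cor.test is a convenient way to cross-check the displayed values.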
For one variable, based on the standard R function
stripchart, plots a one dimensional scatterplot, that is, a dot chart, also called a strip chart. Also identifies outliers according to the criteria specified by a box plot and displays the summary statistics for the variable. The dot plot is also invoked with the function names
DotPlot or just
dp, which are just alternate names for
ScatterPlot when a single variable is referenced.
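As a rough illustration of the one variable case, the sketch below uses only base R. It assumes the conventional box plot fences of 1.5 and 3.0 interquartile ranges, which the argument names col.out15 and col.out30 suggest but which this page does not state explicitly. The data and the names y, out15 and out30 are made up for the example.

```r
# Simulated data with two extreme values appended (illustrative only)
set.seed(2)
y <- c(rnorm(40, mean = 50, sd = 10), 95, 130)

# One-dimensional scatterplot (strip chart) from base R
stripchart(y, method = "stack", pch = 21)

# Values beyond the assumed box plot fences at 1.5 and 3.0 IQR
out15 <- boxplot.stats(y, coef = 1.5)$out
out30 <- boxplot.stats(y, coef = 3.0)$out

print(out15)
print(out30)
print(summary(y))   # summary statistics for the variable
```

Every value flagged at the 3.0 fence is also flagged at the 1.5 fence, which is why two separate outlier colors are useful.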
ScatterPlot(x, y=NULL, by=NULL, dframe=mydata, type=NULL, n.cat=getOption("n.cat"),
col.pts=NULL, col.fill=NULL, trans.pts=getOption("trans.pts"), shape.pts="circle",
col.line=NULL, col.area=NULL, col.box="black", col.grid=NULL, col.bg=NULL,
colors=c("blue", "gray", "rose", "green", "gold", "red"),
cex.axis=.85, col.axis="gray30", col.ticks="gray30", xy.ticks=TRUE,
xlab=NULL, ylab=NULL, main=NULL, cex=NULL,
x.start=NULL, x.end=NULL, y.start=NULL, y.end=NULL,
time.start=NULL, time.by=NULL, time.reverse=FALSE,
kind=c("default", "regular", "bubble", "sunflower"),
fit.line=c("none", "loess", "ls"), col.fit.line="grey55",
col.bubble=NULL, bubble.size=.25, col.flower=NULL,
ellipse=FALSE, col.ellipse="lightslategray", fill.ellipse=TRUE,
pt.reg="circle", pt.out="circle", col.out30="firebrick2", col.out15="firebrick4", new=TRUE,
pdf.file=NULL, pdf.width=5, pdf.height=5, ...)
"l" for line, or
"b" for both. If x and y are provided and x is sorted so that a function is plotted, the default is
points. The default value is 21, a circle with both a border and filled area, specified here with
xlab not specified, then the label becomes the name of the corresponding variable. If
FALSE, then no label is displayed. If no y v
FALSE, then no label is displayed.
mylabels, then the title is set by default from the corresponding variable labels.
x.reverse, the first date is after the data are reverse sorted. Not needed if data are a time series with
time.start specification, the interval to increment the date for each sequential data value. A character string, containing one of
TRUE, reverse the ordering of the dates, particularly when the data are listed such that the first row of data is the newest. Accompanies the
"default", which becomes a
"regular" scatterplot for most data. If Likert style response data are plotted, that is, each variable has fewer than 10 integer values, then instead by default a bubble plot is
"none", with options for
fit.line option is invoked.
TRUE, enclose a scatterplot with the .95 data ellipse from the car package.
TRUE, fill the ellipse with a translucent shade of
TRUE, then display text output in console window.
FALSE, then add the dot plot to an existing graph.
mydata. If this data frame is named something different, then specify the name with the
dframe option. Regardless of its name, the data frame need not be attached to reference the variable directly by its name, that is, no need to invoke the
mydata$name notation. If two variables are specified, both variables should be in the data frame, or one of the variables is in the data frame and the other in the user's workspace, the global environment.
Results for two variables are based on the standard
plot and related graphic functions, with additional color capabilities and other options including a center line. The plotting procedure utilizes "adaptive graphics", such that
ScatterPlot chooses different default values for different characteristics of the specified plot and data values. The goal is to produce a desired graph from simply relying upon the default values, both of the
ScatterPlot function itself, as well as the base R functions called by
ScatterPlot, such as
plot. Familiarity with the options permits complete control over the computed defaults, but this familiarity is intended to be optional for most situations.
TWO VARIABLE PLOT
When two variables are specified to plot, by default if the values of the first variable,
x, are unsorted, or if there are unequal intervals between adjacent values, or if there is missing data for either variable, a scatterplot is produced, that is, a call to the standard R
plot function with
type="p" for points. By default, sorted values with equal intervals between adjacent values of the first of the two specified variables yield a function plot if there is no missing data for either variable, that is, a call to the standard R
plot function with
type="l", which connects each adjacent pair of points with a line segment.
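The rule described above can be sketched in base R. The helper choose_type below is hypothetical and is not a lessR function; it only mirrors the stated conditions for a numeric x (sorted, equally spaced, no missing data in either variable) and is not taken from the package source.

```r
# Hypothetical helper, not part of lessR: returns "l" when x is sorted
# with equal spacing and neither variable has missing data, else "p"
choose_type <- function(x, y) {
  if (anyNA(x) || anyNA(y)) return("p")
  d <- diff(x)
  sorted <- !is.unsorted(x)
  equal  <- length(d) > 0 && isTRUE(all.equal(d, rep(d[1], length(d))))
  if (sorted && equal) "l" else "p"
}

# Sorted, equally spaced x: treated as a function plot
x1 <- seq(10, 500, by = 1)
y1 <- 18 / sqrt(x1)

# Unsorted x: treated as a scatterplot of points
x2 <- c(3, 1, 2)
y2 <- c(5, 6, 7)

print(choose_type(x1, y1))
print(choose_type(x2, y2))

# Pass the chosen type to the standard R plot function
plot(x1, y1, type = choose_type(x1, y1))
```

The comparison of spacings uses all.equal rather than exact equality so that floating point sequences such as seq(0, 1, by = 0.1) still count as equally spaced.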
A variable specified with
by= is a grouping variable that specifies that the plot is produced with the points for each group plotted with a different shape and/or color. By default, the shapes vary by group, and the color of the plot symbol remains the same for the groups. The default shapes, in this order, are
"triup" for a triangle pointed up, and
"tridown" for a triangle pointed down.
To explicitly vary the shapes, use
shape.pts and a list of shape values in the standard R form with the
c function to combine a list of values, one specified shape for each group, as shown in the examples. To explicitly vary the colors, use
col.pts, such as with R standard color names. If
col.pts is specified without
shape.pts, then colors are varied, but not shapes. To vary both shapes and colors, specify values for both options, always with one shape or color specified for each level of the
Shapes beyond the standard list of named shapes, such as
"circle", are also available as single characters. Any single letter, uppercase or lowercase, any single digit, and the characters
"#" are available, as illustrated in the examples. In the use of
shape.pts, either use standard named shapes, or individual characters, but not both in a single specification.
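For comparison, the same idea of one plotting symbol per group can be sketched with the standard R plot function. This is a base R illustration only, with simulated data; the names f, shapes and chars are illustrative, and the numeric pch codes are base R conventions rather than lessR shape names.

```r
# Simulated grouped data (illustrative only)
set.seed(4)
n <- 12
f <- factor(sample(c("Group1", "Group2"), size = n, replace = TRUE),
            levels = c("Group1", "Group2"))
x <- round(rnorm(n, mean = 50, sd = 10), 2)
y <- round(rnorm(n, mean = 50, sd = 10), 2)

# One symbol per level of the grouping factor:
# 21 = filled circle, 24 = filled triangle pointing up
shapes <- c(21, 24)[as.integer(f)]
plot(x, y, pch = shapes, bg = "gray70")
legend("topright", legend = levels(f), pch = c(21, 24), pt.bg = "gray70")

# Single characters as plotting symbols, one per group
chars <- c("F", "M")[as.integer(f)]
plot(x, y, pch = chars, cex = 0.6)
```

Indexing the vector of symbols by as.integer(f) is what ties each symbol to one level of the grouping variable.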
For a scatterplot of two numeric variables, the
ellipse=TRUE option draws the .95 data ellipse as computed by the
dataEllipse function, written by Georges Monette and John Fox, from the
car package. Usually the minimum and maximum values of the axes should be manually extended beyond their default to accommodate the entire ellipse. To accomplish this extension, use the
xlim and ylim options, such as
xlim=c(30,350). Obtaining the desired axes limits may involve multiple runs of the
ScatterPlot function. To provide more control over the display of the data ellipse beyond the provided
fill.ellipse options, run the
dataEllipse function directly with the
plot.points=FALSE option following
ellipse=FALSE, the default.
ONE VARIABLE PLOT
The one variable plot is a one variable scatterplot, that is, a dot chart. Results are based on the standard
stripchart function. Colors are provided by default and can also be specified.
MULTIPLE VARIABLE PLOT
If the variable
x is a data frame, then the data frame must contain only numeric variables. If not, the first non-numeric variable is noted and the procedure ends. Otherwise, the procedure generates the scatterplot matrix with the R
pairs function as well as the correlation matrix of all the variables in the data frame with the R
LIKERT DATA
A scatterplot of Likert type data is problematic because there are so few possibilities for points in the scatterplot. For example, for a scatterplot of two five-point Likert response variables, there are only 25 possible paired values to plot, so most of the plotted points overlap with others. In this situation, that is, when there are fewer than 10 values for each of the two variables, a bubble plot is automatically provided, with the size of each point relative to the joint frequency of the paired data values. A sunflower plot can be requested in lieu of the bubble plot.
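The joint frequencies that drive the bubble sizes can be computed in base R with table. The sketch below is only an approximation of the idea using the base R symbols function, not the lessR implementation; the data are simulated and the names x1, x2 and freq are illustrative.

```r
# Simulated 7-point Likert responses (illustrative only)
set.seed(3)
x1 <- sample(1:7, size = 100, replace = TRUE)
x2 <- sample(1:7, size = 100, replace = TRUE)

# Joint frequency of every observed (x1, x2) pair
freq <- as.data.frame(table(x1 = x1, x2 = x2), stringsAsFactors = FALSE)
freq <- freq[freq$Freq > 0, ]
freq$x1 <- as.integer(freq$x1)
freq$x2 <- as.integer(freq$x2)

# Radius set to sqrt(frequency) so that bubble area tracks the frequency
symbols(freq$x1, freq$x2, circles = sqrt(freq$Freq), inches = 0.25,
        bg = "lightsteelblue", xlab = "x1", ylab = "x2")
```

With a 7-point scale there are at most 49 distinct pairs, so 100 observations necessarily overlap. Base R also offers sunflowerplot(x1, x2) as a comparable display of overlapping points.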
Although standard R does not provide for variable labels,
lessR can store the labels in a data frame called
mylabels, obtained from the
Read function. If this labels data frame exists, then the corresponding variable label is by default listed as the label for the corresponding axis and on the text output. For more information, see
The default background color of
col.bg=ghostwhite provides a very mild cool tone with a slight emphasis on blue. The entire color theme can be specified at the system level with the
set using the
colors option. Or, use the same option for
ScatterPlot to set the color theme just for one scatterplot. The default color theme is
blue, but a gray scale is available with
"gray", and other themes are available as explained in the
help function for
Colors can also be changed for individual aspects of a scatterplot. To provide a warmer tone by slightly enhancing red, try
col.bg=snow. Obtain a very light gray with
col.bg=gray99. To darken the background gray, try
col.bg=gray97 or lower numbers. See the
showColors which provides an example of all available named colors.
Because of the customized graphic windowing system that maintains a unique graphic window for the Help function, the standard graphic output functions such as
lessR graphics functions. Instead, to obtain pdf output, use the
pdf.file option, perhaps with the optional
pdf.width and pdf.height options. These files are written to the default working directory, which can be explicitly specified with the R
dataEllipse function from the
# scatterplot
# create simulated data, no population mean difference
# X has two values only, Y is numeric
# put into a data frame, required for formula version
n <- 12
f <- sample(c("Group1","Group2"), size=n, replace=TRUE)
x <- round(rnorm(n=n, mean=50, sd=10), 2)
y <- round(rnorm(n=n, mean=50, sd=10), 2)
z <- round(rnorm(n=n, mean=50, sd=10), 2)
mydata <- data.frame(f,x,y,z)
rm(f); rm(x); rm(y); rm(z)

# default scatterplot, x is not sorted so type is set to "p"
# although data not attached, access each variable directly by its name
ScatterPlot(x, y)
# short name
sp(x,y)
# compare to standard R plot, which requires the mydata$ notation
plot(mydata$x, mydata$y)

# save scatterplot to a pdf file
ScatterPlot(x, y, pdf.file="MyScatterScatterPlot.pdf")

# scatterplot, with ellipse and extended axes to accommodate the ellipse
ScatterPlot(x, y, ellipse=TRUE, xlim=c(20,80), ylim=c(20,80))

# scatterplot, with loess line
ScatterPlot(x, y, fit.line="loess")
# increase span (smoothing) from default of .75
ScatterPlot(x, y, fit.line="loess", span=1.25)

# custom scatterplot
ScatterPlot(x, y, col.pts="darkred", col.fill="plum")

# scatterplot with a gray scale color theme
ScatterPlot(x, y, colors="gray")

# by variable scatterplot with default point color, vary shapes
ScatterPlot(x,y, by=f)

# by variable scatterplot with custom colors, keeps only 1 shape
ScatterPlot(x,y, by=f, col.pts=c("hotpink", "steelblue"))

# by variable with characters for plotting symbols
# reduce the size of the plotted symbols with cex<1
ScatterPlot(x, y, by=f, shape.pts=c("F","M"), cex=.6)

# vary both shape and color
ScatterPlot(x, y, by=f, col.pts=c("hotpink", "steelblue"), shape.pts=c("F","M"))

# by variable dot plot with custom colors, keeps only 1 shape
ScatterPlot(x, by=f, col.pts=c("hotpink", "steelblue"))

# bubble plot of simulated Likert data, 1 to 7 scale
# size of each plotted point (bubble) depends on its joint frequency
# triggered by default when < 10 unique values for each variable
x1 <- sample(1:7, size=100, replace=TRUE)
x2 <- sample(1:7, size=100, replace=TRUE)
ScatterPlot(x1,x2)

# compare to usual scatterplot of Likert data, transparency helps
plot(x1,x2)
ScatterPlot(x1,x2, kind="regular", cex=3, trans.pts=.7)

# plot Likert data and get sunflower plot with loess line
ScatterPlot(x1,x2, kind="sunflower", fit.line="loess")

# scatterplot of continuous Y against categorical X, a factor
Pain <- sample(c("None", "Some", "Much", "Massive"), size=25, replace=TRUE)
Pain <- factor(Pain, levels=c("None", "Some", "Much", "Massive"), ordered=TRUE)
Cost <- round(rnorm(25,1000,100),2)
ScatterPlot(Pain, Cost)
# for this purpose, improved version of standard R stripchart
stripchart(Cost ~ Pain, vertical=TRUE)

# function curve
x <- seq(10,500,by=1)
y <- 18/sqrt(x)
# x is sorted with equal intervals so type set to "l" for line
ScatterPlot(x, y)
# custom function plot
ScatterPlot(x, y, ylab="My Y", xlab="My X", col.line="blue",
  col.bg="snow", col.area="lightsteelblue", col.grid="lightsalmon")

# Default dot plot
ScatterPlot(y)
# can also specify DotPlot(y) or dp(y)
DotPlot(y)

# Dot plot with custom colors for outliers
ScatterPlot(y, pt.reg=23, col.out15="hotpink", col.out30="darkred")

# modern art
n <- sample(2:30, size=1)
x <- rnorm(n)
y <- rnorm(n)
clr <- colors()
color1 <- clr[sample(1:length(clr), size=1)]
color2 <- clr[sample(1:length(clr), size=1)]
ScatterPlot(x, y, type="l", lty="dashed", lwd=3, col.area=color1,
  col.line=color2, xy.ticks=FALSE, main="Modern Art",
  cex.main=2, col.main="lightsteelblue", kind="regular", n.cat=0)

# -----------------------------------------------
# variables in a different data frame than mydata
# -----------------------------------------------

# variables of interest are in a data frame which is not the default mydata
# although data not attached, access the variable directly by its name
data(datEmployee)
ScatterPlot(Years, Salary, by=Gender, dframe=datEmployee)