
lessR (version 1.9.8)

color.plot: Color Scatter-Plot, Function Plot, or Run Chart with Optional Dates

Description

Plots individual points, such as for a scatterplot, or plots connected line segments for the plot of a function or a run chart, including an option for adding dates to the horizontal axis for a time series chart. One enhancement over the usual plot function is the automatic inclusion of color. The color of the line segments and/or the points, background, area under the plotted line segments, gridlines, and border can each be explicitly specified, with default colors provided. For Likert style response data, in which each variable has fewer than 10 unique integer values, the points in the plot are transformed into a bubble plot with the size of each bubble, i.e., point, determined by the corresponding joint frequency. The x-variable, the variable plotted on the horizontal axis, can be numerical or a factor.

For data exhibiting little trend, a center line is provided for the generation of a run chart, which plots the values of a variable in order of occurrence over time. When the center line, the median by default, is plotted, an analysis of the runs is also displayed: the number of runs and the composition of each run, where a run is a sequence of consecutive values above or below the center line. The defaults also change for each type of plot. The intent is to rely on the default values for a relatively sophisticated plot, particularly when compared to the default values of the standard R plot function.
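The run analysis described above can be sketched in base R. This is only an illustration of the idea, not the code used by color.plot; the names center, side, and run_table are hypothetical, and dropping values that fall exactly on the center line is an assumption.

```r
# Illustrative sketch in base R; not the lessR implementation.
set.seed(1)
y <- rnorm(25)

center <- median(y)            # center line, here the median
side <- sign(y - center)       # +1 above the line, -1 below, 0 on the line
side <- side[side != 0]        # assumption: values on the line join no run

runs <- rle(side)              # consecutive values on one side form a run
n_runs <- length(runs$lengths)
run_table <- data.frame(
  run    = seq_len(n_runs),
  side   = ifelse(runs$values > 0, "above", "below"),
  length = runs$lengths
)

print(run_table)
cat("Number of runs:", n_runs, "\n")
```

Each row of run_table describes one run: its position in the sequence, the side of the center line on which it falls, and the number of consecutive values it contains.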

If a scatterplot of two numeric variables is displayed, then the corresponding correlation coefficient as well as the hypothesis test of zero population correlation and the 95% confidence interval are also displayed. The same numeric values as those of the standard R function cor.test are generated, though in a more readable format. Also, an option for the .95 data ellipse from the car package can enclose the points of the scatterplot.
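For reference, the underlying values can be obtained directly from the standard cor.test function. The sketch below shows one way to extract and print them; the simulated data and the output layout are assumptions for illustration and do not reproduce the text that color.plot itself prints.

```r
# Illustrative sketch: the quantities reported for a scatterplot,
# taken directly from the base R (stats) function cor.test.
set.seed(2)
x <- rnorm(25)
y <- 0.5 * x + rnorm(25)

ct <- cor.test(x, y)           # Pearson correlation, 95% interval by default
r  <- unname(ct$estimate)      # sample correlation coefficient
ci <- ct$conf.int              # confidence interval for the correlation

cat(sprintf("Sample correlation: r = %.3f\n", r))
cat(sprintf("Test of zero population correlation: t = %.3f, df = %d, p-value = %.4f\n",
            ct$statistic, as.integer(ct$parameter), ct$p.value))
cat(sprintf("95%% confidence interval: %.3f to %.3f\n", ci[1], ci[2]))
```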

Usage

color.plot(x, y=NULL, type=NULL, col.line="darkblue", col.area=NULL,  
           col.point="darkblue", col.fill=NULL, col.grid="grey90", 
           col.bg="ghostwhite", col.box="black", xy.ticks=TRUE, 
           xlab=NULL, ylab=NULL, pch=NULL, cex=NULL, center.line=NULL,
           kind=c("default", "regular", "bubble.freq", "sunflower.freq"),
           x.start=NULL, x.end=NULL, size=.25, text.out=TRUE,
           fit.line=c("none", "lowess", "ls"), col.fit.line="grey55", 
           col.bubble="lightsteelblue", col.flower="steelblue",
           time.start=NULL, time.by=NULL, time.reverse=FALSE,
           ellipse=FALSE, col.ellipse="lightslategray", fill.ellipse=TRUE, 
           ...)

Arguments

x
If both x and y are specified, then the x values are plotted on the horizontal axis. If x is not sorted, a scatter plot is produced. If x is sorted, then a function is plotted with a smooth line. If only x is specified with no y, then these x val
y
Coordinates of points in the plot on the vertical axis.
type
Character string that indicates the type of plot, either "p" for points, "l" for line, or "b" for both. If x and y are provided and x is sorted so that a function is plotted, the default is "l"
col.line
Color of any plotted line segments, with a default of "darkblue".
col.area
Color of area under the plotted line segments. To have a border at the bottom and right of a run chart but retain the property of no area color, specify a color of "transparent". If the values exhibit a trend and dates are specified
col.point
Color of the border of the plotted points.
col.fill
For plotted points, the interior color of the point. For a scatterplot the default value is transparent. For a run chart the default value is the color of the point's border, col.point.
col.grid
Color of the grid lines, with a default of "grey90".
col.bg
Color of the plot background.
col.box
Color of border around the plot background, the box, that encloses the plot, with a default of "black".
xy.ticks
Flag that indicates if tick marks and associated values on the axes are to be displayed.
xlab
Label for x-axis. For two variables specified, x and y, if xlab not specified, then the label becomes the name of the corresponding variable. If xy.ticks is FALSE, then no label is displayed. If no y variable is spec
ylab
Label for y-axis. If not specified, then the label becomes the name of the corresponding variable. If xy.ticks is FALSE, then no label is displayed.
pch
The standard plot character, with values defined in help(points). The default value is 21, a circle with both a border and filled area, specified here as col.point and col.fill. For a scatterplot, col.fill
cex
Magnification factor for any displayed points, with default of cex=1.0.
center.line
Plots a dashed line through the middle of a run chart. The two possible values are "mean" and "median". By default, the median is plotted as the center line when the values vary randomly about the mean.
kind
Default is "default", which becomes a "regular" scatterplot for most data. If Likert style response data is plotted, that is, each variable has less than 10 integer values, then instead by default a bubble plot is
x.start
For Likert style response data, the starting integer value of each axis. Useful if the actual data do not include all possible values.
x.end
For Likert style response data, the ending integer value of each axis. Useful if the actual data do not include all possible values.
size
Size of the bubbles in a bubble plot of Likert style data.
text.out
If TRUE, then display text output in console.
fit.line
The best fitting line. Default value is "none", with options for "lowess" and "ls".
col.fit.line
Color of the best fitting line, if the fit.line option is invoked.
col.bubble
Color of the bubbles if a bubble plot of the frequencies is plotted.
col.flower
Color of the flowers if a sunflower plot of the frequencies is plotted.
time.start
Optional starting date for the first data value. Format must be "%Y-%m-%d" or "%Y/%m/%d". If used with time.reverse, the first date applies after the data are reverse sorted.
time.by
Accompanies the time.start specification: the interval by which to increment the date for each sequential data value. A character string containing one of "day", "week", "month" or "year".
time.reverse
When TRUE, reverse the ordering of the dates, particularly when the data are listed such that first row of data is the newest. Accompanies the time.start specification.
ellipse
If TRUE, enclose a scatterplot with the .95 data ellipse from the car package.
col.ellipse
Color of the ellipse.
fill.ellipse
If TRUE, fill the ellipse with a translucent shade of col.ellipse.
...
Other parameter values for graphics as defined by and then processed by plot and par, including xlim, ylim, lwd,

Details

ADAPTIVE GRAPHICS Results are based on the standard plot and related graphic functions, with the additional provided color capabilities and other options including a center line. The plotting procedure utilizes "adaptive graphics", such that color.plot chooses different default values for different characteristics of the specified plot and data values. The goal is to produce a desired graph from simply relying upon the default values, both of the color.plot function itself, as well as the base R functions called by color.plot, such as plot. Familiarity with the options permits complete control over the computed defaults, but this familiarity is intended to be optional for most situations.

TWO VARIABLE PLOT When two variables are specified to plot, by default if the values of the first variable, x, are unsorted, or if there are unequal intervals between adjacent values, or if there is missing data for either variable, a scatterplot is produced, that is, a call to the standard R plot function with type="p" for points. By default, sorted values with equal intervals between adjacent values of the first of the two specified variables yields a function plot if there is no missing data for either variable, that is, a call to the standard R plot function with type="l", which connects each adjacent pair of points with a line segment.
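The default rule described above can be sketched as a small function. The function name choose_type is hypothetical, and the sketch assumes that "sorted" means non-decreasing order and that intervals are compared with a small numeric tolerance; the actual checks inside color.plot may differ.

```r
# Illustrative sketch of the default rule; not the lessR implementation.
choose_type <- function(x, y) {
  # Missing data for either variable: scatterplot
  if (anyNA(x) || anyNA(y)) return("p")
  # Unsorted x values: scatterplot (assumption: sorted = non-decreasing)
  if (is.unsorted(x)) return("p")
  # Unequal intervals between adjacent x values: scatterplot
  steps <- diff(x)
  equal_steps <- length(steps) > 0 && all(abs(steps - steps[1]) < 1e-8)
  if (equal_steps) "l" else "p"
}

x <- seq(10, 500, by = 1)
choose_type(x, 18 / sqrt(x))       # sorted, equal intervals: function plot
choose_type(rnorm(25), rnorm(25))  # unsorted: scatterplot

# The chosen type can be passed to the standard plot function
plot(x, 18 / sqrt(x), type = choose_type(x, 18 / sqrt(x)))
```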

SCATTERPLOT ELLIPSE For a scatterplot of two numeric variables, the ellipse=TRUE option draws the .95 data ellipse as computed by the dataEllipse function, written by Georges Monette and John Fox, from the car package. Usually the minimum and maximum values of the axes should be manually extended beyond their defaults to accommodate the entire ellipse. To accomplish this extension, use the xlim and ylim options, such as xlim=c(30,350). Obtaining the desired axes limits may involve multiple runs of the color.plot function. To provide more control over the display of the data ellipse beyond the provided col.ellipse and fill.ellipse options, run the dataEllipse function directly with the plot.points=FALSE option following color.plot with ellipse=FALSE, the default.
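One way to reduce the trial and error in choosing axis limits is to compute the extent of an ellipse beforehand. The base R sketch below does so under a stated assumption: the radius is taken from the chi-square quantile with 2 degrees of freedom, which may not match the radius that dataEllipse uses, so the resulting limits are a starting point only.

```r
# Illustrative sketch: approximate a .95 data ellipse in base R and derive
# axis limits wide enough to contain it. Assumption: chi-square radius,
# which may differ from the computation in car's dataEllipse.
set.seed(3)
x <- rnorm(25)
y <- rnorm(25)

center <- c(mean(x), mean(y))
S      <- cov(cbind(x, y))
radius <- sqrt(qchisq(0.95, df = 2))

theta  <- seq(0, 2 * pi, length.out = 200)
circle <- cbind(cos(theta), sin(theta))            # points on the unit circle
ell    <- sweep(radius * circle %*% chol(S), 2, center, "+")

xlim <- range(c(x, ell[, 1]))                      # limits covering data and ellipse
ylim <- range(c(y, ell[, 2]))

plot(x, y, xlim = xlim, ylim = ylim)
lines(ell, col = "lightslategray")
```

The computed xlim and ylim values could then be supplied to color.plot together with ellipse=TRUE and adjusted if the drawn ellipse is still clipped.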

ONE VARIABLE PLOT Specifying one variable leads to a run chart, with the values on the horizontal axis automatically generated. The default is the Index variable, the ordinal position of each data value. Alternatively, dates can be placed on the horizontal axis, generated from the starting date given by time.start and the accompanying increment given by time.by. If the data values randomly vary about the mean, the default is to plot the mean as the center line of the graph, otherwise the default is to ignore the center line. The default plot type for the run chart is type="b", for both points and the corresponding connected line segments. The size of the points is automatically reduced according to the number of points plotted, and the cex option can override the computed default. If the area below the plotted values is specified to be filled in with color, then the default line type changes to type="l".
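The generation of dates from a starting date and an increment can be sketched with the base R seq function for Date objects. The variable names time_start and time_by below are hypothetical stand-ins for the time.start and time.by arguments listed under Usage, and the sketch is not necessarily how color.plot builds its axis.

```r
# Illustrative sketch in base R; not the lessR implementation.
set.seed(4)
y <- sort(rexp(50))              # steadily increasing values

time_start <- "2005/09/01"       # "%Y-%m-%d" or "%Y/%m/%d"
time_by    <- "month"            # "day", "week", "month" or "year"

start <- as.Date(time_start, tryFormats = c("%Y-%m-%d", "%Y/%m/%d"))
dates <- seq(start, by = time_by, length.out = length(y))  # one date per value

plot(dates, y, type = "l", xlab = "", ylab = "y")
```

If the first row of data is the newest, reversing the values with rev(y) before pairing them with dates gives the ordering that the time.reverse option describes.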

LIKERT DATA A scatterplot of Likert type data is problematic because there are so few possibilities for points in the scatterplot. For example, for a scatterplot of two five-point Likert response variables, there are only 25 possible paired values to plot, so most of the plotted points overlap with others. In this situation, that is, when there are fewer than 10 unique values for each of the two variables, a bubble plot is automatically provided, with the size of each point relative to the joint frequency of the paired data values. A sunflower plot can be requested in lieu of the bubble plot.
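The joint frequencies that drive the bubble sizes can be computed with table and drawn with the base R symbols function. This is a sketch of the idea only; scaling the circle radius by the square root of the frequency, so that area tracks frequency, is an assumption and may not match the scaling used by color.plot.

```r
# Illustrative sketch in base R; not the lessR implementation.
set.seed(5)
x1 <- sample(1:7, size = 100, replace = TRUE)
x2 <- sample(1:7, size = 100, replace = TRUE)

# Joint frequency of each (x1, x2) pair on the full 1 to 7 scale
freq <- as.data.frame(
  table(x1 = factor(x1, levels = 1:7), x2 = factor(x2, levels = 1:7)),
  responseName = "count"
)
n_cells <- nrow(freq)                      # 7 x 7 = 49 possible pairs
freq <- freq[freq$count > 0, ]             # plot only the pairs that occur

bx <- as.integer(as.character(freq$x1))
by <- as.integer(as.character(freq$x2))

# Assumption: radius proportional to sqrt(count), so area tracks frequency
symbols(bx, by, circles = sqrt(freq$count), inches = 0.25,
        bg = "lightsteelblue", xlab = "x1", ylab = "x2")
```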

BACKGROUND COLOR The default background color of col.bg="ghostwhite" provides a very mild cool tone with a slight emphasis on blue. To provide a warmer tone by slightly enhancing red, try col.bg="snow". Obtain a very light gray with col.bg="gray99". To darken the background gray, try col.bg="gray97" or lower numbers. See the color.show function in this package, which provides an example of all available named colors.
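The tones described above can be checked numerically with the base R function col2rgb, which returns the red, green, and blue components of a named color. The sketch below is only a quick way to compare candidate backgrounds; the object name rgb_values is hypothetical.

```r
# Illustrative sketch: compare the RGB components of candidate backgrounds.
bg <- c("ghostwhite", "snow", "gray99", "gray97")

rgb_values <- t(col2rgb(bg))     # one row per color; columns red, green, blue
rownames(rgb_values) <- bg
print(rgb_values)

# ghostwhite leans blue, snow leans red, and the grays are neutral
rgb_values["ghostwhite", "blue"] > rgb_values["ghostwhite", "red"]
rgb_values["snow", "red"] > rgb_values["snow", "blue"]
```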

ADDITIONAL OPTIONS Commonly used graphical parameters that are available to the standard R function plot are generally available to color.plot, such as xlim, ylim, lwd, lty, main, cex.main, and col.main.

References

Monette, G. and Fox, J., dataEllipse function from the car package.

See Also

plot, title, par, cor.test.

Examples

# load the package that provides color.plot
library(lessR)

# scatter plot
x <- rnorm(25)
y <- rnorm(25)
# default scatterplot, x is not sorted so type is set to "p"
color.plot(x, y)
# compare to standard R plot
plot(x, y)
# scatterplot, with ellipse and extended axes to accommodate the ellipse
color.plot(x, y, ellipse=TRUE, xlim=c(-3,3), ylim=c(-3,3))
# scatterplot, with lowess line added
color.plot(x, y, fit.line="lowess")
# custom scatter plot
color.plot(x, y, col.point="darkred", col.fill="plum")

# scatterplot of Likert data, 1 to 7 scale, and get bubble plot
# size of each plotted point (bubble) depends on its joint frequency
# triggered by default when  < 10 unique values for each variable
x1 <- sample(1:7, size=100, replace=TRUE)
x2 <- sample(1:7, size=100, replace=TRUE)
color.plot(x1,x2)
# compare to usual scatterplot of Likert data
color.plot(x1,x2, kind="regular")
# plot Likert data and get sunflower plot with lowess line
color.plot(x1,x2, kind="sunflower.freq", fit.line="lowess")

# scatterplot of continuous Y against categorical X, a factor
Pain <- sample(c("None", "Some", "Much", "Massive"), size=25, replace=TRUE)
Pain <- factor(Pain, levels=c("None", "Some", "Much", "Massive"), ordered=TRUE)
Cost <- round(rnorm(25,1000,100),2)
color.plot(Pain, Cost)
# for this purpose, improved version of standard R stripchart
stripchart(Cost ~ Pain, vertical=TRUE)

# function curve
x <- seq(10,500,by=1) 
y <- 18/sqrt(x)
# x is sorted with equal intervals so type set to "l" for line
color.plot(x, y)
# custom function plot
color.plot(x, y, ylab="My Y", xlab="My X", col.line="blue", 
  col.bg="snow", col.area="lightsteelblue", col.grid="lightsalmon")
  
# generate data randomly varying about a constant mean
y <- rnorm(25)
# default run chart
color.plot(y)
# compare to standard R plot
plot(y, type="l")
# customize run chart, pch=24: filled triangle point-up
color.plot(y, lwd=2, col.point="sienna3", pch=24, 
  col.bg="mintcream", ylim=c(-3.5,3.5), center.line="median")
  
# generate steadily increasing values
y <- sort(rexp(50))
# default line chart
color.plot(y)
# line chart with border around plotted values
color.plot(y, col.area="transparent")
# time series chart, i.e., with dates, and filled area
# with option label for the x-axis
color.plot(y, time.start="2005/09/01", time.by="month")

# modern art
n <- sample(2:30, size=1)
x <- rnorm(n)
y <- rnorm(n)
clr <- colors()
color1 <- clr[sample(1:length(clr), size=1)]
color2 <- clr[sample(1:length(clr), size=1)]
color.plot(x, y, type="l", lty="dashed", lwd=3, col.area=color1, 
   col.line=color2, xy.ticks=FALSE, main="Modern Art", 
   cex.main=2, col.main="lightsteelblue", kind="regular")
