openair (version 0.4-10)

scatterPlot: Flexible scatterPlots

Description

Scatter plots with conditioning and three main approaches: conventional scatter plot, hexagonal binning and kernel density estimates. The conventional scatter plot also has options for fitting smooth lines and linear models, with uncertainties shown.

Usage

scatterPlot(mydata,
            x = "nox",
            y = "no2",
            method = "scatter",
            group = NULL,
            avg.time = "default",
            data.thresh = 0,
            statistic = "mean",
            percentile = NA,
            type = "default",
            layout = NULL,
            smooth = TRUE,
            spline = FALSE,
            linear = FALSE,
            ci = TRUE,
            mod.line = FALSE,
            cols = "hue",
            main = "",
            ylab = y,
            xlab = x,
            pch = 1,
            lwd = 1,
            lty = 1,
            plot.type = "p",
            key = TRUE,
            key.title = group,
            key.columns = 1,
            strip = TRUE,
            log.x = FALSE,
            log.y = FALSE,
            y.relation = "same",
            x.relation = "same",
            ref.x = NULL,
            ref.y = NULL,
            nbin = 256,
            continuous = FALSE,
            trans = TRUE,
            auto.text = TRUE,
            ...)

Arguments

mydata
A data frame containing at least two numeric variables to plot.
x
Name of the x-variable to plot. Note that x can be a date field or a factor. For example, x can be one of the openair built-in types such as "year" or "season".
y
Name of the numeric y-variable to plot.
method
Methods include "scatter" (conventional scatter plot), "hexbin" (hexagonal binning using the hexbin package) and "density" (2D kernel density estimates).
group
The grouping variable to use, if any. Setting this to a variable in the data frame has the effect of plotting several series in the same panel using different symbols/colours etc. If set to a variable that is a character or factor, those categori
avg.time
This defines the time period to average to. Can be "sec", "min", "hour", "day", "DSTday", "week", "month", "quarter" or "year". For much increased flexibility a number can precede these options followed by a space. For example, a timeAverage of 2
data.thresh
The data capture threshold to use (%) when aggregating the data using avg.time. A value of zero means that all available data will be used in a particular period regardless of the number of values available. Conversely, a value of
statistic
The statistic to apply when aggregating the data; default is the mean. Can be one of "mean", "max", "min", "median", "frequency", "sd", "percentile". Note that "sd" is the standard deviation and "frequency" is the number (frequency) of valid r
percentile
The percentile level in % used when statistic = "percentile" and when aggregating the data with avg.time. The default is 95. Not used if avg.time = "default".
type
type determines how the data are split, i.e. conditioned, and then plotted. The default will produce a single plot using the entire data. type can be one of the built-in types as detailed in cutData e.g. "season"
layout
Determines how the panels are laid out. By default, plots will be shown in one column with the number of rows equal to the number of pollutants, for example. If the user requires 2 columns and two rows, layout should be set to layout
smooth
A smooth line is fitted to the data if TRUE; optionally with 95% confidence intervals shown.
spline
A smooth spline is fitted to the data if TRUE. This is particularly useful when there are fewer data points or when a connection line between a sequence of points is required.
linear
A linear model is fitted to the data if TRUE; optionally with 95% confidence intervals shown. The equation of the line and the R2 value are also shown.
ci
Should the confidence intervals for the smooth/linear fit be shown?
mod.line
If TRUE three lines are added to the scatter plot to help inform model evaluation. The 1:1 line is solid and the 1:0.5 and 1:2 lines are dashed. Together these lines help show how close a group of points are to a 1:1 relationship and al
cols
Colours to be used for plotting. Options include "default", "increment", "heat", "spectral", "hue", "brewer1" and user defined (see manual for more details). The same line colour can be set for all pollutants e.g. cols = "black"
main
The plot title; default is no title.
ylab
Name of y-axis variable. By default will use the name of y.
xlab
Name of x-axis variable. By default will use the name of x.
pch
The symbol type used for plotting. The default is to provide different symbol types for different pollutants. If a single symbol is required for all pollutants, then set pch = 1, for example.
lwd
Width of line if used e.g. if plot.type = "l" or plot.type = "b".
lty
Type of line if used e.g. if plot.type = "l" or plot.type = "b".
plot.type
lattice plot type. Can be "p" (points, the default), "l" (lines) or "b" (lines and points).
key
Should a key be drawn? The default is TRUE.
key.title
The title of the key (if used).
key.columns
Number of columns to be used in the key. With many pollutants a single column can make the key too wide. The user can thus choose to use several columns by setting key.columns to be less than the number of pollutants.
strip
Should a strip be drawn? The default is TRUE.
log.x
Should the x-axis appear on a log scale? The default is FALSE. If TRUE a well-formatted log10 scale is used. This can be useful for checking linearity once logged.
log.y
Should the y-axis appear on a log scale? The default is FALSE. If TRUE a well-formatted log10 scale is used. This can be useful for checking linearity once logged.
y.relation
This determines how the y-axis scale is plotted. "same" ensures all panels use the same scale and "free" will use panel-specific scales. The latter is a useful setting when plotting data with very different values.
x.relation
This determines how the x-axis scale is plotted. "same" ensures all panels use the same scale and "free" will use panel-specific scales. The latter is a useful setting when plotting data with very different values.
ref.x
Add a vertical dashed reference line at this value.
ref.y
Add a horizontal dashed reference line at this value.
nbin
Number of bins used for kernel density output using method "density".
continuous
When this option is TRUE a plot of x vs. y will be made, colour-coded by levels of group, provided group is a numeric variable. A continuous separate colour scale is shown. If continuous = FALSE an
trans
trans is used when continuous = TRUE. Often for a good colour scale with skewed data it is a good idea to "compress" the scale. If TRUE a square root transform is used, if FALSE a linear scale i
auto.text
Either TRUE (default) or FALSE. If TRUE titles and axis labels will automatically try to format pollutant names and units properly e.g. by subscripting the "2" in NO2.
...
Other graphical parameters passed on to lattice::xyplot and cutData. For example, in the case of cutData the option hemisphere = "southern".
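
As a rough illustration of how the aggregation arguments described above combine, the sketch below plots daily 95th percentiles with a linear fit instead of the default smooth. It assumes openair and its bundled mydata data set are installed; the 75 % data capture threshold is an arbitrary value chosen for the example, and the name daily95 is only a placeholder for the returned object.

```r
library(openair)
data(mydata)

# Aggregate the hourly data to daily values before plotting.
# statistic = "percentile" together with percentile = 95 requests the
# 95th percentile for each day; data.thresh = 75 is an illustrative
# data capture threshold (%), not a recommended setting.
daily95 <- scatterPlot(mydata, x = "nox", y = "no2",
                       avg.time = "day",
                       data.thresh = 75,
                       statistic = "percentile",
                       percentile = 95,
                       smooth = FALSE,   # switch off the default smooth fit
                       linear = TRUE)    # show a linear fit instead
```

Leaving avg.time at "default" skips the aggregation step, in which case percentile is not used.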

Value

As well as generating the plot itself, scatterPlot also returns an object of class "openair". The object includes three main components: call, the command used to generate the plot; data, the data frame of summarised information used to make the plot; and plot, the plot itself. If retained, e.g. using output <- scatterPlot(mydata, "nox", "no2"), this output can be used to recover the data, reproduce or rework the original plot or undertake further analysis. An openair output can be manipulated using a number of generic operations, including print, plot and summary. See openair.generics for further details.
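
A minimal sketch of working with the returned object, assuming openair and its bundled mydata data set are available. The component names call, data and plot are those listed above; exactly what each contains may differ between openair versions.

```r
library(openair)
data(mydata)

# Retain the output rather than only drawing the plot
output <- scatterPlot(mydata, x = "nox", y = "no2")

# Inspect the components described in this section
names(output)        # component names of the returned object
output$call          # the command used to generate the plot
head(output$data)    # first rows of the data used to make the plot

# Generic operations on the openair object
summary(output)      # summarise the stored data
plot(output)         # redraw the stored plot
```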

Details

scatterPlot is the basic function for drawing scatter plots in flexible ways in openair. It can handle many conditioning variables and takes care of fitting smooth or linear relationships to the data.

There are three main ways of plotting the relationship between two variables, which are set using the method option. The default "scatter" will plot a conventional scatter plot. In cases where there are lots of data and over-plotting becomes a problem, method = "hexbin" or method = "density" can be useful. The former requires the hexbin package to be installed.

By default a smooth fit is shown, as this can help show the overall form of the data, e.g. whether the relationship appears to be linear or not. A linear fit can also be shown using linear = TRUE. The user has fine control over the choice of colours and symbol type used.

Another way of reducing the number of points used in the plots, which can sometimes be useful, is to aggregate the data. For example, hourly data can be aggregated to daily data. See timePlot for examples.

By default plots are shown with a colour key at the bottom and, in the case of conditioning, strips at the top of each plot. Sometimes this may be overkill, and the user can opt to remove the key and/or the strip by setting key and/or strip to FALSE. One reason to do this is to maximise the plotting area and therefore the information shown.
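
Two of the options discussed here can be sketched as follows, assuming openair and its bundled mydata data set are installed. The first call swaps the individual points for a 2D kernel density estimate; the second conditions by season and removes both the key and the strips to free up plotting area. The object names dens and bare are placeholders used only in this example.

```r
library(openair)
data(mydata)

# Kernel density estimate: useful when over-plotting hides the structure
dens <- scatterPlot(mydata, x = "nox", y = "no2", method = "density")

# Conditioned by season, with the key and panel strips switched off
bare <- scatterPlot(mydata, x = "nox", y = "no2", type = "season",
                    key = FALSE, strip = FALSE)
```

With the strips removed the panels are no longer labelled, so this layout probably suits cases where the conditioning is explained elsewhere, for instance in a figure caption.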

See Also

linearRelation, timePlot and timeAverage for details on selecting averaging times and other statistics in a flexible way

Examples

# load openair and its example data if not loaded already
library(openair)
data(mydata)

# basic use, single pollutant
scatterPlot(mydata, x = "nox", y = "no2")

# scatterPlot by year
scatterPlot(mydata, x = "nox", y = "no2", type = "year")

# scatterPlot by day of the week, removing key at bottom
scatterPlot(mydata, x = "nox", y = "no2", type = "weekday",
            key = FALSE)

# example of the use of continuous where colour is used to show
# different levels of a third (numeric) variable
# plot daily averages and choose a filled plot symbol (pch = 16)
# select only 2004
dat2004 <- selectByDate(mydata, year = 2004)
scatterPlot(dat2004, x = "nox", y = "no2", group = "co",
            continuous = TRUE, avg.time = "day", pch = 16)

# show linear fit, by year
scatterPlot(mydata, x = "nox", y = "no2", type = "year",
            smooth = FALSE, linear = TRUE)

# do the same, but for daily means...
scatterPlot(mydata, x = "nox", y = "no2", type = "year",
            smooth = FALSE, linear = TRUE, avg.time = "day")

# log scales
scatterPlot(mydata, x = "nox", y = "no2", type = "year",
            smooth = FALSE, linear = TRUE, avg.time = "day",
            log.x = TRUE, log.y = TRUE)

# also works with the x-axis in date format (alternative to timePlot)
scatterPlot(mydata, x = "date", y = "no2", avg.time = "month",
key = FALSE)

# multiple types, a grouping variable and a continuous colour scale
scatterPlot(mydata, x = "nox", y = "no2", type = c("season", "weekend"),
            group = "o3", continuous = TRUE)

# use hexagonal binning
library(hexbin)
# basic use, single pollutant
scatterPlot(mydata, x = "nox", y = "no2", method = "hexbin")

# scatterPlot by year
scatterPlot(mydata, x = "nox", y = "no2", type = "year",
            method = "hexbin")
