timePlot: Plot time series

Description

Plot time series quickly, perhaps for multiple pollutants, grouped or in separate panels.

Usage

timePlot(mydata,
pollutant = "nox",
group = FALSE,
stack = FALSE,
normalise = NULL,
avg.time = "default",
data.thresh = 0,
statistic = "mean",
percentile = NA,
date.pad = FALSE,
type = "default",
layout = c(1, 1),
cols = "brewer1",
main = "",
ylab = pollutant,
plot.type = "l",
lty = 1:length(pollutant),
lwd = 1,
pch = NA,
key = TRUE,
strip = TRUE,
log = FALSE,
smooth = FALSE,
ci = TRUE,
ref.x = NULL,
ref.y = NULL,
key.columns = 1,
name.pol = pollutant,
date.breaks = 7,
auto.text = TRUE, ...)

Arguments

mydata

A data frame of time series. Must include a date field and at least one variable to plot.

pollutant

Name of variable to plot. Two or more pollutants can be plotted, in which case a form like pollutant = c("nox", "co") should be used.

group

If more than one pollutant is chosen, should they all be plotted on the same graph together? The default is FALSE, which means they are plotted in separate panels with their own scales. If TRUE then they are plotte

stack

If TRUE the time series will be stacked by year. This option can be useful if there are several years' worth of data, making it difficult to see much detail when plotted on a single plot.

normalise

Should variables be normalised? The default is not to normalise the data. normalise can take two values, either "mean" or a string representing a date in UK format e.g. "1/1/1998" (in the format dd/mm/YYYY). If
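The two forms of normalise can be pictured with a small base R sketch. This is not openair's implementation. It assumes that "mean" divides each series by its own mean and that a date rescales each series so that its value at that date is 100. The data frame toy and its values are made up for illustration.

```r
# Hypothetical annual values for two variables on very different scales
toy <- data.frame(
  date = as.Date(c("1998-01-01", "1999-01-01", "2000-01-01")),
  nox  = c(200, 180, 160),
  co   = c(2.0, 1.5, 1.0)
)

# Assumed behaviour of normalise = "mean": divide each series by its mean
by_mean <- sapply(toy[c("nox", "co")], function(x) x / mean(x, na.rm = TRUE))

# Assumed behaviour of normalise = "1/1/1998": value at that date becomes 100
ref <- as.Date("1/1/1998", format = "%d/%m/%Y")
by_date <- sapply(toy[c("nox", "co")], function(x) 100 * x / x[toy$date == ref])

by_date
```

Either way, both series end up on a comparable scale, which is what makes plotting them in one panel with group = TRUE readable.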

avg.time

This defines the time period to average to. Can be "sec", "min", "hour", "day", "DSTday", "week", "month", "quarter" or "year". For much increased flexibility a number can precede these options followed by a space. For example, a timeAverage of 2

data.thresh

The data capture threshold to use (%) when aggregating the data using avg.time. A value of zero means that all available data will be used in a particular period regardless of the number of values available. Conversely, a value of
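The idea of a data capture threshold can be sketched in base R. This is only an illustration, not how openair computes it. It assumes that a period whose data capture falls below the threshold is reported as missing. The hourly series o3 below is invented.

```r
# Hypothetical hourly series covering two days; most of day 2 is missing
dates <- seq(as.POSIXct("2003-08-06 00:00", tz = "GMT"),
             by = "hour", length.out = 48)
o3 <- c(rep(10, 24), rep(NA, 20), rep(30, 4))

# Label each hour with its calendar day
day <- format(dates, "%Y-%m-%d")

# Data capture per day: percentage of hours with a valid value
capture <- tapply(!is.na(o3), day, mean) * 100

# Daily means, ignoring missing hours
daily_mean <- tapply(o3, day, mean, na.rm = TRUE)

# Assumed effect of data.thresh = 75: days under 75% capture become NA
daily_mean[capture < 75] <- NA

daily_mean
```

With a threshold of zero, the second day would keep a mean based on only four hourly values, which is why a higher threshold is often preferred for summary statistics.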

statistic

The statistic to apply when aggregating the data; default is the mean. Can be one of "mean", "max", "min", "median", "frequency", "sd", "percentile". Note that "sd" is the standard deviation and "frequency" is the number (frequency) of valid reco

percentile

The percentile level in % used when statistic = "percentile" and when aggregating the data with avg.time. More than one percentile level is allowed for type = "default" e.g. percentile = c(50, 95

date.pad

Should missing data be padded-out? This is useful where a data frame consists of two or more "chunks" of data with time gaps between them. By setting date.pad = TRUE the time gaps between the chunks are shown properly, rather than wi

type

type determines how the data are split i.e. conditioned, and then plotted. The default will produce a single plot using the entire data. type can be one of the built-in types as detailed in cutData e.g. "season"

layout

Determines how the panels are laid out. By default, plots will be shown in one column with the number of rows equal to the number of pollutants, for example. If the user requires 2 columns and two rows, layout should be set to layout

cols

Colours to be used for plotting. Options include "default", "increment", "heat", "spectral", "hue", "brewer1" (default) and user defined (see manual for more details). The same line colour can be set for all pollutants e.g. cols = "bla

main

The plot title; default is no title.

ylab

Name of y-axis variable. By default, the name of the pollutant(s) is used.

plot.type

The lattice plot type, which is a line (plot.type = "l") by default. Another useful option is plot.type = "h", which draws vertical lines.

lty

The line type used for plotting. Default is to provide different line types for different pollutants. If one requires a continuous line for all pollutants, then set lty = 1, for example. See lty option for standard p

lwd

The line width used; default is 1. To set a wider line for all pollutants, choose, for example, lwd = 2. Alternatively, varying line widths can be chosen depending on the pollutant. For example, if pollutant = c("nox",

pch

The type of symbol to be plotted. The default is not to plot a symbol. It can be useful to do so in cases where there are not consecutive points in time and a line cannot be drawn between two points. Symbols can be plotted as a vector of types e.

key

Should a key be drawn? The default is TRUE.

strip

Should a strip be drawn? The default is TRUE.

log

Should the y-axis appear on a log scale? The default is FALSE. If TRUE a well-formatted log10 scale is used. This can be useful for plotting data for several different pollutants that exist on very different scales. I

smooth

Should a smooth line be applied to the data? The default is FALSE.

ci

If a smooth fit line is applied, then ci determines whether the 95% confidence intervals are shown.

ref.x

Add a vertical dashed reference line at this value.

ref.y

Add a horizontal dashed reference line at this value.

key.columns

Number of columns to be used in the key. With many pollutants a single column can make the key too wide. The user can thus choose to use several columns by setting key.columns to be less than the number of pollutants.

name.pol

This option can be used to give alternative names for the variables plotted. Instead of taking the column headings as names, the user can supply replacements. For example, if a column had the name "nox" and the user wanted a different description

date.breaks

Number of major x-axis intervals to use. The function will try and choose a sensible number of dates/times as well as formatting the date/time appropriately to the range being considered. This does not always work as desired automatically.

auto.text

Either TRUE (default) or FALSE. If TRUE, titles and axis labels will automatically try to format pollutant names and units properly, e.g. by subscripting the '2' in NO2.

...

Other graphical parameters passed on to lattice::xyplot and cutData. For example, in the case of cutData the option hemisphere = "southern" can be passed.
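As a hedged illustration of passing an option through to cutData, the sketch below conditions on season with the southern hemisphere definition. It assumes the openair package is installed and uses the mydata example data set that ships with it. The choice of year is arbitrary.

```r
library(openair)

# One year of data keeps the example quick to draw
one_year <- selectByDate(mydata, year = 2003)

# hemisphere is not a timePlot argument; it travels through ... to cutData
timePlot(one_year, pollutant = "nox", type = "season",
         hemisphere = "southern")

# The same option used directly with cutData, to inspect the season labels
seasons <- cutData(one_year, type = "season", hemisphere = "southern")
table(seasons$season)
```

Calling cutData directly is a convenient way to check how the data will be split before plotting.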

Value

As well as generating the plot itself, timePlot also returns an object of class "openair". The object includes three main components: call, the command used to generate the plot; data, the data frame of summarised information used to make the plot; and plot, the plot itself. If retained, e.g. using output <- timePlot(mydata, "nox"), this output can be used to recover the data, reproduce or rework the original plot or undertake further analysis. An openair output can be manipulated using a number of generic operations, including print, plot and summary. See openair.generics for further details.
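A short sketch of working with the returned object follows. It assumes the openair package is installed and uses its bundled mydata data set. The monthly averaging is chosen only to keep the summarised data small.

```r
library(openair)

# Keep the returned object instead of only drawing the plot
output <- timePlot(mydata, pollutant = "nox", avg.time = "month")

names(output)        # components of the returned object
head(output$data)    # summarised data behind the plot
print(output$plot)   # redraw the stored plot
```

Retaining the object is useful when the summarised values are needed for a table or for further analysis outside the plot.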

Details

timePlot is the basic time series plotting function in openair. Its purpose is to make it quick and easy to plot time series for pollutants and other variables, and to plot potentially many variables together in as compact a way as possible.

The function is flexible enough to plot more than one variable at once. If more than one variable is chosen, the plot can either show all variables in the same panel on the same scale, with different line types (group = TRUE), or show each variable in its own panel with its own scale (group = FALSE, the default). The general preference is not to plot two variables on the same graph with two different y-scales: it can be misleading to do so and is difficult with more than two variables. If there is an interest in plotting several variables together that have very different scales, then it can be useful to normalise the data first, which can be done by setting the normalise option.

The user has fine control over the choice of colours, line widths and line types used. This is useful, for example, to emphasise a particular variable with a specific line type/colour/width.

timePlot works very well with selectByDate, which is used for selecting particular date ranges quickly and easily. See the examples below.

By default plots are shown with a colour key at the bottom and, in the case of multiple pollutants or sites, strips on the left of each plot. Sometimes this may be overkill and the user can opt to remove the key and/or the strip by setting key and/or strip to FALSE. One reason to do this is to maximise the plotting area and therefore the information shown.
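Removing the key and strips can be sketched as follows, assuming the openair package is installed and using its bundled mydata data set. The year and pollutants are arbitrary choices.

```r
library(openair)

# Two pollutants in separate panels, with the key and panel strips removed
# so that more of the device is given over to the data
p <- timePlot(selectByDate(mydata, year = 2003),
              pollutant = c("nox", "no2"),
              key = FALSE, strip = FALSE)
```

Without a key or strips the panels are no longer labelled, so this layout suits cases where the variables are identified elsewhere, for example in a figure caption.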

Examples


# basic use, single pollutant
timePlot(mydata, pollutant = "nox")

# two pollutants in separate panels
timePlot(mydata, pollutant = c("nox", "no2"))

# two pollutants in the same panel with the same scale
timePlot(mydata, pollutant = c("nox", "no2"), group = TRUE)

# alternative by normalising concentrations and plotting on the same scale
timePlot(mydata, pollutant = c("nox", "co", "pm10", "so2"), group = TRUE,
  avg.time = "year", normalise = "1/1/1998", lwd = 3, lty = 1)

# examples of selecting by date

# plot for nox in 1999
timePlot(selectByDate(mydata, year = 1999), pollutant = "nox")

# select specific date range for two pollutants
timePlot(selectByDate(mydata, start = "6/8/2003", end = "13/8/2003"),
pollutant = c("no2", "o3"))

# choose different line styles etc
timePlot(mydata, pollutant = c("nox", "no2"), lty = 1)

# vary line widths and use a single colour (the argument is cols)
timePlot(selectByDate(mydata, year = 2004, month = 6),
  pollutant = c("nox", "no2"), lwd = c(1, 2), cols = "black")

# different averaging times

# daily mean O3
timePlot(mydata, pollutant = "o3", avg.time = "day")

# daily mean O3 ensuring each day has data capture of at least 75%
timePlot(mydata, pollutant = "o3", avg.time = "day", data.thresh = 75)

# 2-week average of O3 concentrations
timePlot(mydata, pollutant = "o3", avg.time = "2 week")
