GAM smoothing is carried out using the mgcv package.
polarPlot(mydata, pollutant = "nox", x = "ws", wd = "wd",
type = "default", statistic = "mean", resolution = "fine",
limits = NA, exclude.missing = TRUE, uncertainty = FALSE,
percentile = NA, cols = "default", weights = c(0.25, 0.5, 0.75),
min.bin = 1, mis.col = "grey", alpha = 1, upper = NA,
angle.scale = 315, units = x, force.positive = TRUE, k = 100,
normalise = FALSE, key.header = "", key.footer = pollutant,
key.position = "right", key = TRUE, auto.text = TRUE, ws_spread = 15,
wd_spread = 4, kernel = "gaussian", tau = 0.5, ...)
mydata: A data frame minimally containing wd, another
variable to plot in polar coordinates (the default is a column
“ws” --- wind speed) and a pollutant. Should also contain
date
if plots by time period are required.

pollutant: Mandatory. A pollutant name corresponding to a variable in
the data frame should be supplied e.g. pollutant = "nox"
. There can also be more than one pollutant specified
e.g. pollutant = c("nox", "no2")
. The main use of two or more
pollutants is for model evaluation where two species
would be expected to have similar concentrations. This saves the
user stacking the data and it is possible to work with columns of
data directly. A typical use would be pollutant = c("obs",
"mod")
to compare two columns “obs” (the observations)
and “mod” (modelled values). When pair-wise statistics
such as Pearson correlation and regression techniques are to be
plotted, pollutant
takes two elements too. For example,
pollutant = c("bc", "pm25")
where "bc"
is a
function of "pm25"
.

type: This
determines how the data are split i.e.
conditioned, and then plotted. The default, "default", produces a
single plot using the entire data. Type can be one of the
built-in types as detailed in cutData
e.g.
“season”, “year”, “weekday” and so on. For
example, type = "season"
will produce four plots --- one
for each season.

It is also possible to choose type
as another variable in
the data frame. If that variable is numeric, then the data will
be split into four quantiles (if possible) and labelled
accordingly. If type is an existing character or factor variable,
then those categories/levels will be used directly. This offers
great flexibility for understanding the variation of different
variables and how they depend on one another.
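The quantile splitting of a numeric type variable can be sketched in base R. This is an illustration of the behaviour described above, not openair's own cutData code, and the temp variable is hypothetical:

```r
# Split a numeric conditioning variable into four quantile bands,
# analogous to what happens when 'type' names a numeric column.
set.seed(3)
temp <- runif(200, -5, 30)  # hypothetical numeric variable, e.g. temperature
bands <- cut(temp,
             breaks = quantile(temp, probs = seq(0, 1, 0.25)),
             include.lowest = TRUE)
table(bands)  # four roughly equal groups, one plot panel each
```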
Type can be up to length two e.g. type = c("season",
"weekday")
will produce a 2x2 plot split by season and day of
the week. Note, when two types are provided the first forms the
columns and the second the rows.
statistic: The statistic that should be applied to each wind
speed/direction bin. Can be "mean" (default), "median", "max"
(maximum), "frequency", "stdev" (standard deviation),
"weighted.mean" or "cpf" (the conditional probability function). An
example is statistic = "weighted.mean"
where the bin mean is multiplied by the bin frequency and divided
by the total frequency. In many cases using polarFreq
will
be better. Setting statistic = "weighted.mean"
can be
useful because it provides an indication of the concentration *
frequency of occurrence and will highlight the wind
speed/direction conditions that dominate the overall mean.

When statistic = "cpf"
, the conditional probability
function (CPF) is plotted and a single (usually high) percentile
level is supplied. The CPF is defined as CPF = my/ny, where my is
the number of samples in the y bin (by default a wind direction,
wind speed interval) with mixing ratios greater than the
overall percentile concentration, and ny is the total
number of samples in the same wind sector (see Ashbaugh et al.,
1985). Note that percentile intervals can also be considered; see
percentile
for details.
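The CPF calculation can be sketched in base R for 10-degree wind direction sectors. This is an illustrative sketch under the definition above, not openair's internal code, and all variable names are hypothetical:

```r
# CPF = m/n per wind sector: m counts samples in the sector exceeding the
# overall 90th percentile; n counts all samples in the sector.
set.seed(42)
wd   <- runif(1000, 0, 360)        # synthetic wind directions (degrees)
conc <- rlnorm(1000, meanlog = 3)  # synthetic concentrations
threshold <- quantile(conc, 0.9)   # overall 90th percentile
sector <- floor(wd / 10)           # 0..35, one value per 10-degree bin
n <- tapply(conc, sector, length)            # total samples per sector
m <- tapply(conc >= threshold, sector, sum)  # high-concentration samples
cpf <- m / n                       # conditional probability per sector
```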
When statistic = "r"
, the Pearson correlation
coefficient is calculated for two pollutants. The
calculation involves a weighted Pearson correlation coefficient,
which is weighted by Gaussian kernels for wind direction and the
radial variable (by default wind speed). More weight is assigned
to values close to a wind speed-direction interval. Kernel
weighting is used to ensure that all data are used rather than
relying on the potentially small number of values in a wind
speed-direction interval.
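A kernel-weighted Pearson correlation at a single wind speed-direction point can be sketched in base R. This is a minimal illustration of the weighting idea, not openair's implementation; the pollutant names and the grid point are hypothetical, and 15 and 4 stand in for the ws_spread and wd_spread defaults:

```r
# Weighted Pearson correlation: weights cancel the 1/n terms, so equal
# weights reproduce the ordinary correlation coefficient.
weighted_r <- function(x, y, w) {
  w  <- w / sum(w)
  mx <- sum(w * x); my <- sum(w * y)
  cov_xy <- sum(w * (x - mx) * (y - my))
  sd_x <- sqrt(sum(w * (x - mx)^2))
  sd_y <- sqrt(sum(w * (y - my)^2))
  cov_xy / (sd_x * sd_y)
}
set.seed(1)
ws <- runif(500, 0, 20); wd <- runif(500, 0, 360)
bc <- rlnorm(500); pm25 <- 2 * bc + rlnorm(500, sdlog = 0.2)
# Gaussian kernel weights centred on the grid point ws = 5, wd = 180
wd_diff <- pmin(abs(wd - 180), 360 - abs(wd - 180))  # circular difference
w <- exp(-0.5 * ((ws - 5) / 15)^2) * exp(-0.5 * (wd_diff / 4)^2)
r_local <- weighted_r(bc, pm25, w)  # local correlation estimate
```

More weight goes to observations near the grid point, but every observation contributes, which is the point of the kernel approach.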
"robust.slope"
is another option for pair-wise
statistics, as is "quantile.slope"
, which uses quantile
regression to estimate the slope for a particular quantile level
(see also tau
for setting the quantile level).

limits: The scale limits, expressed in the form c(lower, upper)
, so limits = c(0, 100)
would force the plot limits to span 0-100.

exclude.missing: Setting this option to TRUE
(the
default) removes points from the plot that are too far from the
original data. The smoothing routines will produce predictions at
points where no data exist i.e. they predict. Removing the
points too far from the original data produces a plot where it is
clear where the original data lie. If set to FALSE
missing
data will be interpolated.

uncertainty: Should the uncertainty in the calculated surface be shown?
If TRUE, three plots are produced on the same
scale showing the predicted surface together with the estimated
lower and upper uncertainties at the 95% confidence interval.
Calculating the uncertainties is useful to understand whether
features are real or not. For example, at high wind speeds where
there are few data there is greater uncertainty over the
predicted values. The uncertainties are calculated using the GAM
and weighting is done by the frequency of measurements in each
wind speed-direction bin. Note that if uncertainties are
calculated then the type is set to "default".

percentile: If statistic = "percentile",
then
percentile
is used, expressed from 0 to 100. Note that the
percentile value is calculated in the wind speed, wind direction
‘bins’. For this reason it can also be useful to set
min.bin
to ensure there are a sufficient number of points
available to estimate a percentile. See quantile
for more
details of how percentiles are calculated.

percentile
is also used for the Conditional Probability
Function (CPF) plots. percentile
can be of length two, in
which case the percentile interval is considered for use
with CPF. For example, percentile = c(90, 100)
will plot
the CPF for concentrations between the 90 and 100th percentiles.
Percentile intervals can be useful for identifying specific
sources. In addition, percentile
can also be of length 3.
The third value is the ‘trim’ value to be applied. When
calculating percentile intervals many can cover very low values
where there is no useful information. The trim value ensures that
values greater than or equal to the trim * mean value are
considered before the percentile intervals are calculated.
The effect is to extract more detail from many source signatures.
See the manual for examples. Finally, if the trim value is less
than zero the percentile range is interpreted as absolute
concentration values and subsetting is carried out directly.
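The length-3 percentile behaviour with a trim value can be sketched in base R. This is an illustration of the description above, not openair's code; the data and the chosen interval are hypothetical:

```r
# percentile = c(lower, upper, trim): values below trim * mean(conc) are
# discarded before the percentile interval is computed.
set.seed(7)
conc <- rlnorm(1000)                 # synthetic concentrations
pct  <- c(80, 90, 0.75)              # 80-90th percentile interval, trim 0.75
keep <- conc >= pct[3] * mean(conc)  # apply the trim
bounds <- quantile(conc[keep], pct[1:2] / 100)
in_interval <- conc[keep] >= bounds[1] & conc[keep] <= bounds[2]
```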
cols: Colours to be used for plotting. Options include "default",
"increment", "heat", "jet" and
RColorBrewer
colours --- see the openair
openColours
function for more details. For user-defined colours,
the user can supply a list of colour names recognised by R (type
colours()
to see the full list). An example would be
cols = c("yellow", "green", "blue")
. cols
can also
take the values "viridis"
, "magma"
,
"inferno"
, or "plasma"
which are the viridis colour
maps ported from Python's Matplotlib library.

weights: A weighting applied to wind speed-direction bins with few data
points, to reduce their influence. For
example and by default if only a single data point exists then
the weighting factor is 0.25 and for two points 0.5. To not apply
any weighting and use the data as is, use weights = c(1, 1,
1)
. An alternative to down-weighting these points is to remove them
altogether using min.bin
.
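The default weighting scheme can be sketched as a small base R function. This is an illustration of the behaviour described above (1 point gets weight 0.25, 2 points 0.5, and by extension 3 points 0.75, with full weight beyond that), not openair's internal code:

```r
# Down-weight sparse wind speed-direction bins: with the default
# weights = c(0.25, 0.5, 0.75), bins holding 1, 2 or 3 points are
# down-weighted; larger bins receive full weight.
bin_weight <- function(n, weights = c(0.25, 0.5, 0.75)) {
  if (n > length(weights)) 1 else weights[n]
}
```

Passing weights = c(1, 1, 1) makes bin_weight() return 1 for every bin, i.e. no down-weighting, matching the "use the data as is" option.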
min.bin: The minimum number of points allowed in a wind speed/wind
direction bin; bins with fewer points are removed. The
polarFreq
function can be of use in such
circumstances.

mis.col: When min.bin
is > 1 it can be useful to show
where data are removed on the plots. This is done by shading the
missing data in mis.col
. To not highlight missing data
when min.bin
> 1 choose mis.col = "transparent"
.

alpha: The transparency of the plot; mainly useful when overlaying the
plot on a map with openairmaps.

angle.scale: The scale is by default shown at a 315 degree angle;
sometimes its placement interferes with an interesting feature, and the
user can set angle.scale
to another value (between 0 and 360 degrees)
to mitigate such problems. For example angle.scale = 45
will draw the scale heading in a NE direction.

force.positive: The default is TRUE. Sometimes if
smoothing data with steep gradients it is possible for predicted
values to be negative. force.positive = TRUE
ensures that
predictions remain positive. This is useful for several reasons.
First, with lots of missing data more interpolation is needed and
this can result in artifacts because the predictions are too far
from the original data. Second, if it is known beforehand that
the data are all positive, then this option carries that
assumption through to the prediction. The only likely case for
setting force.positive = FALSE
is when background
concentrations have first been subtracted, resulting in data that are
legitimately negative. For the vast majority of situations it is
expected that the user will not need to alter the default option.

k: The smoothing parameter supplied to the gam
function in package mgcv
. Typically, a value of around 100
(the default) seems to be suitable and will resolve important
features in the plot. The most appropriate choice of k
is
problem-dependent; but extensive testing of polar plots for many
different problems suggests a value of k
of about 100 is
suitable. Setting k
to higher values will not tend to
affect the surface predictions by much but will add to the
computation time. Lower values of k
will increase
smoothing. Sometimes with few data to plot polarPlot
will
fail. Under these circumstances it can be worth lowering the
value of k
.

normalise: If TRUE,
concentrations are normalised by
dividing by their mean value. This is done after fitting
the smooth surface. This option is particularly useful if one is
interested in the patterns of concentrations for several
pollutants on different scales e.g. NOx and CO. Often useful if
more than one pollutant
is chosen.

key.header, key.footer: Setting, for example, key.header = "header",
key.footer = "footer1"
adds additional text above and below the
scale key. These arguments are passed to drawOpenKey
via
quickText
, applying the auto.text
argument, to
handle formatting.

key.footer: See key.header.

key.position: Location where the scale key is to be plotted. Allowed
arguments currently include "top", "right",
"bottom" and "left".

key: Fine control of the scale key via drawOpenKey.
See drawOpenKey
for further details.

auto.text: Either TRUE
(default) or FALSE
. If
TRUE
titles and axis labels will automatically try and
format pollutant names and units properly e.g. by subscripting
the `2' in NO2.

ws_spread: The value of sigma used for Gaussian kernel weighting of
wind speed when correlation or regression techniques are used. Default
is 15.

wd_spread: The value of sigma used for Gaussian kernel weighting of
wind direction when correlation or regression techniques are used.
Default is 4.

kernel: Type of kernel used for the weighting procedure when
correlation or regression techniques are used. Only
"gaussian"
is supported but this may be enhanced in the
future.

tau: The quantile to be estimated when statistic
is
set to "quantile.slope"
. Default is 0.5
which is
equal to the median and will be ignored if
"quantile.slope"
is not used.

...: Other graphical parameters passed onto lattice::levelplot
and cutData
. For example,
polarPlot
passes the option hemisphere = "southern"
on to cutData
to provide southern (rather than default
northern) hemisphere handling of type = "season"
.
Similarly, common axis and title labelling options (such as
xlab
, ylab
, main
) are passed to
levelplot
via quickText
to handle routine
formatting.

As well as generating the plot itself, polarPlot
also returns an object of class "openair". The object includes
three main components: call
, the command used to generate
the plot; data
, the data frame of summarised information
used to make the plot; and plot
, the plot itself. If
retained, e.g. using output <- polarPlot(mydata, "nox")
,
this output can be used to recover the data, reproduce or rework
the original plot or undertake further analysis. An openair output can be manipulated using a number of generic
operations, including print
, plot
and
summary
. polarPlot
surface data can also be extracted directly
using the results
function, e.g. results(object)
for
output <- polarPlot(mydata, "nox")
. This returns a data
frame with four set columns: cond
, conditioning based on
type
; u
and v
, the translational vectors
based on ws
and wd
; and the local pollutant
estimate.

The smoothing is performed using the mgcv
package that uses Generalized Additive
Models. While methods do exist to find an optimum level of
smoothness, they are not necessarily useful. The principal aim of
polarPlot
is as a graphical analysis rather than for
quantitative purposes. In this respect the smoothing aims to strike
a balance between revealing interesting (real) features and overly
noisy data. The defaults used in polarPlot
are based on the
analysis of data from many different sources. More advanced users
may wish to modify the code and adopt other smoothing approaches.

Various statistics can be considered e.g. mean, maximum,
median. statistic = "max"
is often useful for revealing
sources. Pair-wise statistics between two pollutants can also be
calculated.

The function can also be used to compare two pollutant species
through a range of pair-wise statistics (see help on
statistic
) and Grange et al. (2016) (open-access publication
link below).

Wind direction is split up into 10 degree intervals and the other
variable (e.g. wind speed) into 30 intervals. These 2D bins are then
used to calculate the statistics.

These plots often show interesting features at higher wind speeds
(see references below). For these conditions there can be very few
measurements and therefore greater uncertainty in the calculation
of the surface. There are several ways in which this issue can be
tackled. First, it is possible to avoid smoothing altogether and
use polarFreq
in the package openair
. Second, the
effect of setting a minimum number of measurements in each wind
speed-direction bin can be examined through min.bin
. It is
possible that a single point at high wind speed conditions can
strongly affect the surface prediction. Therefore, setting
min.bin = 3
, for example, will remove all wind
speed-direction bins with fewer than 3 measurements before
fitting the surface. Third, consider setting uncertainty =
TRUE
. This option will show the predicted surface together with
upper and lower 95% confidence intervals, taking account of the
frequency of measurements.

Variants on polarPlot
include polarAnnulus
and
polarFreq
.
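As a sketch of the u and v columns that results() returns (described under the output above), the translational vectors can be reconstructed from ws and wd. This assumes the usual angle convention with wd in degrees; it is an illustration, not openair's internal code:

```r
# Convert wind speed/direction to the plotting vectors u and v.
ws <- c(5, 10)              # wind speeds
wd <- c(90, 180)            # wind directions in degrees
u <- ws * sin(wd * pi / 180)  # east-west component of the plot position
v <- ws * cos(wd * pi / 180)  # north-south component of the plot position
```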
# Use openair 'mydata'
# basic plot
polarPlot(openair::mydata, pollutant = "nox")
## Not run: ------------------------------------

# polarPlots by year on same scale
polarPlot(mydata, pollutant = "so2", type = "year", main = "polarPlot of so2")

# set minimum number of bins to be used to see if pattern remains similar
polarPlot(mydata, pollutant = "nox", min.bin = 3)

# plot by day of the week
polarPlot(mydata, pollutant = "pm10", type = "weekday")

# show the 95% confidence intervals in the surface fitting
polarPlot(mydata, pollutant = "so2", uncertainty = TRUE)

# Pair-wise statistics
# Pearson correlation
polarPlot(mydata, pollutant = c("pm25", "pm10"), statistic = "r")

# Robust regression slope, takes a bit of time
polarPlot(mydata, pollutant = c("pm25", "pm10"), statistic = "robust.slope")

# Least squares regression works too but it is not recommended; use robust
# regression
# polarPlot(mydata, pollutant = c("pm25", "pm10"), statistic = "slope")

## ---------------------------------------------