Function for identifying clusters in bivariate polar plots (polarPlot); identifying clusters in the original data for subsequent processing.
polarCluster(mydata, pollutant = "nox", x = "ws", wd = "wd",
  n.clusters = 6, cols = "Paired", angle.scale = 315, units = x,
  auto.text = TRUE, ...)
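mydata
A data frame minimally containing wd, another variable to plot in polar coordinates (the default is a column ws, as set by x = "ws") and a pollutant. It should also contain date if plots by time period are required.
pollutant
The pollutant to cluster, given as a column name, e.g. pollutant = "nox". Only one pollutant can be chosen.
n.clusters
Number of clusters to use. If n.clusters is more than length 1, then a lattice panel plot will be output showing the clusters identified for each one of n.clusters.
cols
Colours to be used for plotting. The default, "Paired", is one of the RColorBrewer colour schemes; see the openair openColours function for more details. Useful schemes include "Accent", "Dark2", "Paired", "Set1", "Set2" and "Set3".
angle.scale
The angle (in degrees) at which the scale is drawn, 315 by default. Set angle.scale to another value (between 0 and 360 degrees) to mitigate any clash between the scale and a feature of interest.
auto.text
Either TRUE (default) or FALSE. If TRUE, titles and axis labels will automatically try to format pollutant names and units properly, e.g. by subscripting the '2' in NO2.
...
Other graphical parameters passed on to polarPlot, lattice:levelplot and cutData. Common axis and title labelling options (such as xlab, ylab, main) are passed via quickText.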
polarCluster also returns an object of class "openair". The object includes three main components: call, the command used to generate the plot; data, the original data frame with a new field cluster identifying the cluster; and plot, the plot itself. Note that any rows where the value of pollutant is NA are ignored so that the returned data frame may have fewer rows than the original.
An openair output can be manipulated using a number of generic operations, including print, plot and summary. See openair.generics for further details.
Bivariate polar plots generated using the polarPlot function provide a very useful graphical technique for identifying and characterising different air pollution sources. While bivariate polar plots provide a useful graphical indication of potential sources, their location and wind-speed or other variable dependence, they do have several limitations. Often, a 'feature' will be detected in a plot but the subsequent analysis of data meeting particular wind speed/direction criteria will be based only on the judgement of the investigator concerning the wind speed-direction intervals of interest. Furthermore, the identification of a feature can depend on the choice of the colour scale used, making the process somewhat arbitrary.
polarCluster applies Partitioning Around Medoids (PAM) clustering techniques to polarPlot surfaces to help identify potentially interesting features for further analysis. Details of PAM can be found in the cluster package (a recommended R package that ships with standard R installations). PAM clustering is similar to k-means but has several advantages, e.g. it is more robust to outliers. The clustering assumes equal contributions from the u and v wind components and the associated concentration. The data are standardized before clustering takes place.
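The clustering step described above can be sketched roughly as follows. This is a simplified illustration, not the internals of polarCluster: it clusters synthetic raw observations rather than a smoothed polarPlot surface, and the data frame obs and its values are made up for the example. Only pam from the cluster package and base R functions are used.

```r
# Illustrative sketch only: synthetic observations, not a polarPlot surface.
library(cluster)  # provides pam()

set.seed(42)
n <- 300

# Hypothetical data: two wind regimes with different concentrations
obs <- data.frame(
  ws  = c(runif(n / 2, 1, 4), runif(n / 2, 6, 10)),
  wd  = c(runif(n / 2, 30, 90), runif(n / 2, 200, 260)),
  nox = c(rnorm(n / 2, 50, 5), rnorm(n / 2, 200, 20))
)

# Wind components from speed and direction (degrees converted to radians)
obs$u <- obs$ws * sin(pi * obs$wd / 180)
obs$v <- obs$ws * cos(pi * obs$wd / 180)

# Standardise so that u, v and concentration contribute equally
features <- scale(obs[, c("u", "v", "nox")])

# Partitioning Around Medoids with two clusters
fit <- pam(features, k = 2)
obs$cluster <- as.character(fit$clustering)

table(obs$cluster)
```

Standardising first matters because u and v are in m/s while the concentration is on a much larger numeric scale; without it the concentration would dominate the distances used by PAM.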
The function works best by first trying different numbers of clusters and plotting them. This is achieved by setting n.clusters to be of length more than 1. For example, if n.clusters = 2:10 then a plot will be output showing the 9 cluster levels 2 to 10.
Note that clustering is computationally intensive and the function can take a long time to run, particularly when the number of clusters is increased. For this reason it can be a good idea to run a few clusters first to get a feel for the run time, e.g. n.clusters = 2:5.
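One way to get a rough feel for how run time grows with the number of clusters is to time PAM alone on a small stand-in matrix. This is only a sketch: the matrix surface is random data invented for the example, polarCluster does more work than a single pam call, and the timings will differ from machine to machine.

```r
# Illustrative timing sketch; 'surface' is random stand-in data.
library(cluster)  # provides pam()

set.seed(1)
# Three standardised columns, standing in for u, v and concentration
surface <- scale(matrix(rnorm(500 * 3), ncol = 3))

ks <- 2:5
elapsed <- sapply(ks, function(k) {
  system.time(pam(surface, k = k))[["elapsed"]]
})

data.frame(n.clusters = ks, seconds = elapsed)
```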
Once the number of clusters has been decided, the user can then run polarCluster to return the original data frame together with a new column cluster, which gives the cluster number as a character (see example). Note that any rows where the value of pollutant is NA are ignored so that the returned data frame may have fewer rows than the original.
Note that there is no automatic way of determining the most appropriate number of clusters, as this is application dependent. However, there is often a priori information available on what different features in polar plots correspond to. Nevertheless, the appropriateness of different clusters is best determined by post-processing the data. The Carslaw and Beevers (2013) paper discusses these issues in more detail.
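As a minimal sketch of such post-processing, the clustered data can be summarised per cluster with base R. The data frame clustered below is a made-up stand-in for the data component returned by polarCluster (a character cluster column alongside the measurements); its values are synthetic.

```r
# Hypothetical stand-in for results$data; values are synthetic.
set.seed(7)
clustered <- data.frame(
  cluster = rep(c("1", "2", "3"), times = c(40, 30, 30)),
  ws      = c(runif(40, 1, 3), runif(30, 4, 7), runif(30, 8, 12)),
  nox     = c(rnorm(40, 60, 5), rnorm(30, 150, 10), rnorm(30, 300, 20))
)

# How many points fall in each cluster?
counts <- table(clustered$cluster)

# Mean wind speed and concentration per cluster
summary_by_cluster <- aggregate(cbind(ws, nox) ~ cluster,
                                data = clustered, FUN = mean)

counts
summary_by_cluster
```

Clusters that turn out to have very similar summaries may be candidates for merging, which in practice means re-running with a smaller n.clusters. Wind direction is left out of the averages here because it is a circular quantity and a plain mean can be misleading.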
Note that unlike most other openair functions only a single type, "default", is allowed.
Carslaw, D.C., & Beevers, S.D. (2013). Characterising and understanding emission sources using bivariate polar plots and k-means clustering. Environmental Modelling & Software, 40, 325-329. doi:10.1016/j.envsoft.2012.09.005
polarPlot
# load the package and its example data
library(openair)
data(mydata)
## plot 2-8 clusters. Warning! This can take several minutes...
if (FALSE) {  # not run by default; remove the guard to try it
  polarCluster(mydata, pollutant = "nox", n.clusters = 2:8)
}
# basic plot with 6 clusters
results <- polarCluster(mydata, pollutant = "nox", n.clusters = 6)
## get results, could read into a new data frame to make it easier to refer to
## e.g. results <- results$data...
head(results$data)
## how many points are there in each cluster?
table(results$data$cluster)
## plot clusters 3 and 4 as a timeVariation plot using SAME colours as in
## cluster plot
timeVariation(subset(results$data, cluster %in% c("3", "4")), pollutant = "nox",
              group = "cluster", cols = openColours("Paired", 6)[c(3, 4)])