summaryPlot can also provide summaries of a single pollutant across many sites.

summaryPlot(mydata, na.len = 24, clip = TRUE,
percentile = 0.99, type = "histogram",
pollutant = "nox", period = "years", breaks = NULL,
col.trend = "darkgoldenrod2", col.data = "lightblue",
col.mis = rgb(0.65, 0.04, 0.07),
col.hist = "forestgreen", cols = NULL, date.breaks = 7,
auto.text = TRUE, ...)
mydata: a data frame that must contain a date field and at least one other parameter.

na.len: missing data are only shown where there are at least na.len contiguous missing values. The purpose of setting na.len is clarity: with long time series it is difficult to see where individual missing hours are. For example, with hourly data, setting na.len = 96 shows only gaps of at least 4 days.

clip: setting clip = TRUE removes the top 1% of values (with the default percentile) to give a better display of the overall distribution of the data. The amount of clipping is set with percentile.

percentile: percentile = 0.99 (the default) removes the top 1 percentile of values, i.e. values greater than the 99th percentile are not used.

type: determines whether a histogram (the default) or a density plot is used to show the distribution of the data.

pollutant: used when there is a field site and there is more than one site in the data frame.

period: either "years" (the default) or "months". Statistics are calculated according to the period chosen.

Colour arguments such as col.trend and col.data accept R colour names; type colors() into R to see the full range of colour names. The cols argument can be set to "greyscale".

auto.text: either TRUE (default) or FALSE. If TRUE, titles and axis labels will automatically try to format pollutant names and units properly, e.g. by subscripting the 2 in NO2. Common labelling options passed through ... (such as xlab, ylab and main) are also passed to the plot via quickText to handle routine formatting.

summaryPlot
produces two panels of plots: one
showing the presence/absence of data and the other the
distributions. The left panel shows time series and codes
the presence or absence of data in different colours. By
stacking the plots one on top of another it is easy to
compare different pollutants/variables. Overall
statistics are given for each variable: mean, maximum,
minimum, missing hours (also expressed as a percentage),
median and the 95th percentile. For each year the data
capture rate (expressed as a percentage of hours in that
year) is also given.
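The statistics in the left panel are easy to reproduce by hand, which can help when checking a summaryPlot output. The following base R sketch is not summaryPlot's implementation: summarise_variable, capture_by_year and long_gaps are hypothetical helpers, and the data capture rate is computed as the percentage of rows (assumed to be one row per hour) that have a value in each year. long_gaps shows one way to locate the runs of at least na.len contiguous missing values that the na.len argument controls.

```r
# Hypothetical helper: overall statistics for one variable (not summaryPlot code).
summarise_variable <- function(x) {
  n_mis <- sum(is.na(x))
  c(mean        = mean(x, na.rm = TRUE),
    max         = max(x, na.rm = TRUE),
    min         = min(x, na.rm = TRUE),
    missing     = n_mis,
    missing_pct = 100 * n_mis / length(x),
    median      = median(x, na.rm = TRUE),
    p95         = unname(quantile(x, 0.95, na.rm = TRUE)))
}

# Hypothetical helper: percentage of rows (assumed hourly) with a value, per year.
capture_by_year <- function(date, x) {
  v <- tapply(!is.na(x), format(date, "%Y"), function(ok) 100 * mean(ok))
  setNames(as.vector(v), names(v))
}

# Hypothetical helper: start/end row indices of runs of at least na.len
# contiguous missing values, found with run-length encoding.
long_gaps <- function(x, na.len = 24) {
  r <- rle(is.na(x))
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  keep <- r$values & r$lengths >= na.len
  data.frame(start = starts[keep], end = ends[keep])
}

# Toy hourly series spanning a year boundary, with one isolated missing
# hour (row 5) and one 30-hour gap (rows 11 to 40).
date <- seq(as.POSIXct("2020-12-31 00:00", tz = "UTC"), by = "hour", length.out = 48)
x <- c(1:4, NA, 6:10, rep(NA, 30), 41:48)

stats <- summarise_variable(x)
capture <- capture_by_year(date, x)
gaps <- long_gaps(x, na.len = 24)
print(stats)
print(capture)
print(gaps)
```

With na.len = 24 only the 30-hour gap is reported; lowering na.len to 1 would also flag the isolated missing hour, which is the kind of clutter the na.len argument is meant to avoid on long time series.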
The right panel shows either a histogram or a density
plot depending on the choice of type. Density
plots avoid the issue of arbitrary bin sizes that can
sometimes provide a misleading view of the data
distribution. Density plots are often more appropriate,
but their effectiveness will depend on the data in
question.
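The sketch below, in base R, illustrates both the clipping controlled by clip and percentile and the bin-size issue. clip_upper is a hypothetical helper that drops values above the chosen percentile; hist() and density() from base R stand in for the panels summaryPlot draws, and the data are simulated rather than taken from any real data set.

```r
# Hypothetical helper: keep only values at or below the given percentile,
# mirroring the clip/percentile description above (not summaryPlot code).
clip_upper <- function(x, percentile = 0.99) {
  x <- x[!is.na(x)]
  x[x <= quantile(x, percentile)]
}

set.seed(1)
# Skewed simulated data with one extreme outlier appended.
x <- c(rexp(999, rate = 0.02), 5000)

clipped <- clip_upper(x, percentile = 0.99)

# Histogram shape depends on the (arbitrary) number of bins requested...
h10 <- hist(clipped, breaks = 10, plot = FALSE)
h50 <- hist(clipped, breaks = 50, plot = FALSE)
# ...whereas a kernel density estimate has no bins; it uses a smoothing bandwidth.
d <- density(clipped)

length(h10$counts)
length(h50$counts)
```

Note that density() is not free of tuning choices either: its appearance depends on the bandwidth (the bw argument), which is one reason the effectiveness of density plots depends on the data in question.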
summaryPlot
will only show data that are numeric
or integer type. This is useful for checking that data
have been imported properly. For example, if for some
reason a column representing wind speed erroneously had
one or more fields containing characters, the whole column
would be either character or factor type. The absence of
a wind speed variable in the summaryPlot plot
would therefore indicate a problem with the input data.
In this particular case, the user should go back to the
source data and remove the characters, or remove them
using R functions.
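A quick way to find the offending entries in R is to attempt a numeric conversion and look for values that become NA. This is a minimal sketch on a made-up data frame (raw and its columns are hypothetical); the numeric-only column filter is similar in spirit to the behaviour described above, not summaryPlot's own code.

```r
# Made-up imported data: one wind speed entry contains stray characters,
# so the whole ws column is character rather than numeric.
raw <- data.frame(
  date = seq(as.POSIXct("2020-01-01", tz = "UTC"), by = "hour", length.out = 5),
  nox  = c(120, 95, 110, 88, 102),
  ws   = c("3.2", "4.1", "n/a?", "2.8", "5.0"),
  stringsAsFactors = FALSE
)

# Columns a numeric-only filter would keep (POSIXct dates are not numeric).
numeric_cols <- names(raw)[vapply(raw, is.numeric, logical(1))]

# Row positions of non-missing ws entries that cannot be parsed as numbers.
bad <- which(is.na(suppressWarnings(as.numeric(raw$ws))) & !is.na(raw$ws))

# One possible repair in R: convert the column, turning bad entries into NA.
raw$ws <- suppressWarnings(as.numeric(raw$ws))
numeric_after <- names(raw)[vapply(raw, is.numeric, logical(1))]
```

Before the repair only nox is numeric, so ws would be absent from the plot; after conversion ws is numeric and the unparseable entry becomes a missing value. Whether to set such entries to NA or correct them in the source data is a judgement for the user.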
If there is a field site, which would generally
mean there is more than one site, summaryPlot will
provide information on a single pollutant across
all sites, rather than details on all pollutants
at a single site. In this case the user should
also provide the name of a pollutant, e.g. pollutant = "nox".
If a pollutant is not provided, the first numeric
field will automatically be chosen.
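The pollutant selection rule described above can be mimicked in a few lines of base R. pick_pollutant is a hypothetical helper, not part of summaryPlot, and the two-site data frame is invented for illustration; the per-site mean is just one example of a site-by-site summary.

```r
# Hypothetical helper: use the requested pollutant if given,
# otherwise fall back to the first numeric field.
pick_pollutant <- function(df, pollutant = NULL) {
  if (!is.null(pollutant)) {
    if (!pollutant %in% names(df)) stop("pollutant not found: ", pollutant)
    return(pollutant)
  }
  num <- names(df)[vapply(df, is.numeric, logical(1))]
  if (length(num) == 0) stop("no numeric fields")
  num[1]
}

# Invented data: three hours at each of two sites.
sites <- data.frame(
  date = as.POSIXct("2020-01-01", tz = "UTC") + rep(c(0, 3600, 7200), 2),
  site = rep(c("A", "B"), each = 3),
  no2  = c(40, 42, NA, 35, 33, 31),
  o3   = c(20, 18, 22, 25, 27, NA),
  stringsAsFactors = FALSE
)

# No pollutant given, so the first numeric field is used.
p <- pick_pollutant(sites)

# One summary value per site for the chosen pollutant.
by_site <- vapply(split(sites[[p]], sites$site),
                  function(v) mean(v, na.rm = TRUE), numeric(1))
print(by_site)
```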
It is strongly recommended that the summaryPlot
function is applied to all newly imported data sets
to ensure the data are imported as expected.

# load the package and its example data
library(openair)
data(mydata)
# do not clip the data shown in the distribution plot
summaryPlot(mydata, clip = FALSE)
# exclude the highest 5% of data
summaryPlot(mydata, percentile = 0.95)
# show missing data where there are at least 96 contiguous missing
# values (4 days)
summaryPlot(mydata, na.len = 96)
# show data in green
summaryPlot(mydata, col.data = "green")
# show missing data in yellow
summaryPlot(mydata, col.mis = "yellow")
# show the histogram in black
summaryPlot(mydata, col.hist = "black")