summaryPlot can also provide summaries of a single pollutant across many
sites.

summaryPlot(mydata, na.len = 24, clip = TRUE, percentile = 0.99,
  type = "histogram", pollutant = "nox", period = "years",
  breaks = NULL, col.trend = "darkgoldenrod2", col.data = "lightblue",
  col.mis = rgb(0.65, 0.04, 0.07), col.hist = "forestgreen", cols = NULL,
  date.breaks = 7, auto.text = TRUE, ...)

mydata
A data frame containing a date field and at least one other parameter.

na.len
Missing data are only shown where there are at least na.len contiguous
missing values. The purpose of setting na.len is for clarity: with long
time series it is difficult to see where individual missing hours are.
Furthermore, setting na.len = 96, for example, would show where there are
at least 4 days of contiguous missing data.

clip
Setting clip = TRUE will remove the top 1 % of data, which is often a
better display of the overall distribution of the data. The amount of
clipping is set with percentile.

percentile
percentile = 0.99 (the default) will remove the top 1 percentile of
values, i.e. values greater than the 99th percentile will not be used.

type
type is used to determine whether a histogram (the default) or a density
plot is used to show the distribution of the data.

pollutant
pollutant is used when there is a field site and there is more than one
site in the data frame.

period
period is either "years" (the default) or "months". Statistics are
calculated depending on the period chosen.

col.trend, col.data, col.mis, col.hist
Colours, which can be given as R colour names. Type colors() into R to see
the full range of colour names.

cols
Predefined colour scheme, currently only enabled for "greyscale".

auto.text
Either TRUE (default) or FALSE. If TRUE, titles and axis labels will
automatically try to format pollutant names and units properly, e.g. by
subscripting the '2' in NO2.

...
Other graphical parameters, including the axis and title labelling options
(such as xlab, ylab and main), which are all passed to the plot via
quickText to handle routine formatting.

summaryPlot
produces two panels of plots: one showing the
presence/absence of data and the other the distributions. The left panel
shows time series and codes the presence or absence of data in different
colours. By stacking the plots one on top of another it is easy to compare
different pollutants/variables. Overall statistics are given for each
variable: mean, maximum, minimum, missing hours (also expressed as a
percentage), median and the 95th percentile. For each year the data capture
rate (expressed as a percentage of hours in that year) is also given.

The right panel shows either a histogram or a density plot depending on the
choice of type. Density plots avoid the issue of arbitrary bin sizes
that can sometimes provide a misleading view of the data distribution.
Density plots are often more appropriate, but their effectiveness will
depend on the data in question.
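
The effect of bin choice and clipping can be explored outside summaryPlot
with base R alone. The sketch below is illustrative only: the data are
synthetic, the variable names are made up for this example, and the
clipping rule (keep values at or below the 99th percentile) is an
assumption about the idea behind clip and percentile, not a copy of the
function's internals.

```r
# Synthetic, right-skewed "concentrations"; not real monitoring data
set.seed(1)
x <- rlnorm(5000, meanlog = 3, sdlog = 0.8)

# Clip at the 99th percentile, in the spirit of clip = TRUE with
# percentile = 0.99 (assumed rule: keep values <= the 99th percentile)
upper <- quantile(x, probs = 0.99, na.rm = TRUE)
x_clip <- x[x <= upper]

# Same data, two suggested bin counts: hist() treats 'breaks' as a
# suggestion, but the apparent shape still depends on the choice
h_coarse <- hist(x_clip, breaks = 5, plot = FALSE)
h_fine <- hist(x_clip, breaks = 50, plot = FALSE)

# Kernel density estimate: no bins, although a bandwidth is chosen instead
d <- density(x_clip)

# Draw the three views side by side
op <- par(mfrow = c(1, 3))
plot(h_coarse, main = "About 5 bins", xlab = "x")
plot(h_fine, main = "About 50 bins", xlab = "x")
plot(d, main = "Density", xlab = "x")
par(op)
```

Comparing the coarse and fine histograms against the density curve gives a
feel for when type = "density" may be the clearer choice.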

summaryPlot will only show data that are numeric or integer type.
This is useful for checking that data have been imported properly. For
example, if a column representing wind speed erroneously had one or more
fields containing characters, the whole column would be either
character or factor type. The absence of a wind speed variable in the
summaryPlot plot would therefore indicate a problem with the input
data. In this case, the user should go back to the source data
and remove the characters, or remove them using R functions.
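
The same check can be made directly on the data frame before plotting. The
following is a minimal base R sketch; the data frame raw and its contents
are hypothetical and stand in for a freshly imported file in which one wind
speed entry contains stray characters.

```r
# Hypothetical import: the third wind speed entry is not a number
raw <- data.frame(
  date = as.POSIXct("2020-01-01 00:00", tz = "UTC") + 3600 * 0:4,
  ws = c("2.1", "3.4", "n/a", "1.8", "2.6"),
  nox = c(40, 55, 38, 61, 47),
  stringsAsFactors = FALSE
)

# Which columns are numeric? ws is character because of the stray entry
# (date is also reported as non-numeric, which is expected)
is_num <- vapply(raw, is.numeric, logical(1))
print(names(raw)[!is_num])

# Find the entries that cannot be parsed as numbers and inspect them
bad <- is.na(suppressWarnings(as.numeric(raw$ws)))
print(raw$ws[bad])

# Convert the column; unparseable entries become NA
raw$ws <- suppressWarnings(as.numeric(raw$ws))
str(raw)
```

Inspecting the offending entries before converting makes it easier to
decide whether they should become missing values or be corrected at source.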

If there is a field site, which would generally mean there is more
than one site, summaryPlot will provide information on a
single pollutant across all sites, rather than provide details on
all pollutants at a single site. In this case the user should also
provide the name of a pollutant, e.g. pollutant = "nox". If a pollutant
is not provided the first numeric field will automatically be chosen.
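
To make the multi-site layout concrete, the sketch below builds a small
synthetic data frame with date, site and nox fields and computes per-site
figures similar to those described above, using base R only. The data, the
site names and the object names multi and site_summary are assumptions for
illustration; the numbers are not produced by summaryPlot itself.

```r
# Synthetic hourly data for two hypothetical sites over 30 days
set.seed(42)
hours <- as.POSIXct("2021-01-01 00:00", tz = "UTC") + 3600 * 0:(24 * 30 - 1)
multi <- data.frame(
  date = rep(hours, times = 2),
  site = rep(c("Site A", "Site B"), each = length(hours)),
  nox = c(rgamma(length(hours), shape = 2, scale = 20),
          rgamma(length(hours), shape = 2, scale = 35))
)

# Knock out 50 hours at random to mimic missing data
multi$nox[sample(nrow(multi), 50)] <- NA

# One pollutant summarised across all sites
site_summary <- do.call(rbind, lapply(split(multi, multi$site), function(d) {
  data.frame(
    site = d$site[1],
    mean = mean(d$nox, na.rm = TRUE),
    median = median(d$nox, na.rm = TRUE),
    p95 = unname(quantile(d$nox, 0.95, na.rm = TRUE)),
    missing = sum(is.na(d$nox)),
    capture_pct = 100 * mean(!is.na(d$nox))
  )
}))
print(site_summary)
```

With openair loaded, a data frame of this shape would be passed as
summaryPlot(multi, pollutant = "nox").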

It is strongly recommended that the summaryPlot function is
applied to all newly imported data sets to ensure the data are imported as
expected.

# load openair and the example data from the package
library(openair)
data(mydata)
# do not clip density plot data
summaryPlot(mydata, clip = FALSE)
# exclude the highest 5 % of data
summaryPlot(mydata, percentile = 0.95)
# show missing data where there are at least 96 contiguous missing
# values (4 days)
summaryPlot(mydata, na.len = 96)
# show data in green
summaryPlot(mydata, col.data = "green")
# show missing data in yellow
summaryPlot(mydata, col.mis = "yellow")
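# show density plot in black (col.hist is the colour argument in the
# usage above; col.dens does not appear there)
summaryPlot(mydata, type = "density", col.hist = "black")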