summaryPlot can also provide summaries of a single pollutant across many sites.

summaryPlot(mydata, na.len = 24, clip = TRUE, percentile = 0.99,
type = "histogram", pollutant = "nox", period = "years",
avg.time = "day", print.datacap = TRUE, breaks = NULL,
col.trend = "darkgoldenrod2", col.data = "lightblue",
col.mis = rgb(0.65, 0.04, 0.07), col.hist = "forestgreen", cols = NULL,
date.breaks = 7, auto.text = TRUE, ...)
mydata
A data frame to summarise. It must contain a date field and at least one other parameter.

na.len
Missing data are shown only where there are at least na.len contiguous missing values. The purpose of setting na.len is clarity: with long time series it is difficult to see where individual missing hours are. Setting na.len = 96, for example, would show where there are at least 4 days of continuous missing data.

clip
Setting clip = TRUE will remove the top 1 % of values to yield what is often a better display of the overall distribution of the data. The amount of clipping can be set with percentile.

percentile
percentile = 0.99 (the default) will remove the top 1 percentile of values, i.e. values greater than the 99th percentile will not be used.

type
type determines whether a histogram (the default) or a density plot is used to show the distribution of the data.

pollutant
pollutant is used when there is a field site and there is more than one site in the data frame.

period
period is either years (the default) or months. Statistics are calculated depending on the period chosen.

avg.time
The averaging period applied to the time series, for example avg.time = "2 month". The default shown in the usage is "day".

col.trend, col.data, col.mis, col.hist
Colours used for the plot elements. Type colors() into R to see the full range of colour names.

cols
Predefined colour scheme; "greyscale" is a supported value.

date.breaks
The number of date intervals on the x-axis can be changed by adjusting the value of date.breaks up or down.

auto.text
Either TRUE (default) or FALSE. If TRUE, titles and axis labels will automatically try to format pollutant names and units properly, e.g. by subscripting the ‘2’ in NO2.

...
Other graphical parameters, including the axis and title labelling options (such as xlab, ylab and main), which are all passed to the plot via quickText to handle routine formatting. As summaryPlot has two components, the axis labels may be a vector. For example, the default case (type = "histogram") sets y labels equivalent to ylab = c("", "Percent of Total").

summaryPlot
produces two panels of plots: one showing the presence/absence of data and the other the distributions. The left panel shows time series and codes the presence or absence of data in different colours. By stacking the plots one on top of another it is easy to compare different pollutants/variables. Overall statistics are given for each variable: mean, maximum, minimum, missing hours (also expressed as a percentage), median and the 95th percentile. For each year the data capture rate (expressed as a percentage of hours in that year) is also given.

The right panel shows either a histogram or a density plot depending on the choice of type. Density plots avoid the issue of arbitrary bin sizes that can sometimes provide a misleading view of the data distribution. Density plots are often more appropriate, but their effectiveness will depend on the data in question.

summaryPlot will only show data that are numeric or integer type. This is useful for checking that data have been imported properly. For example, if for some reason a column representing wind speed erroneously had one or more fields with characters in, the whole column would be either character or factor type. The absence of a wind speed variable in the summaryPlot plot would therefore indicate a problem with the input data. In this particular case, the user should go back to the source data and remove the characters, or remove them using R functions.

If there is a field site, which would generally mean there is more than one site, summaryPlot will provide information on a single pollutant across all sites, rather than provide details on all pollutants at a single site. In this case the user should also provide the name of a pollutant, e.g. pollutant = "nox". If a pollutant is not provided, the first numeric field will automatically be chosen.

It is strongly recommended that the summaryPlot function is applied to all newly imported data sets to ensure the data are imported as expected.
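
The type check described above can also be done directly before plotting. The following is a minimal base R sketch, not part of summaryPlot itself. The data frame imported and its values are hypothetical stand-ins for a freshly imported file in which a stray text entry has turned the wind speed column (ws) into character.

```r
# Hypothetical stand-in for newly imported data: the "n/a" entry
# forces the whole ws column to be stored as character.
imported <- data.frame(
  date = as.POSIXct("2020-01-01 00:00", tz = "GMT") + 3600 * 0:3,
  ws   = c("2.1", "3.4", "n/a", "1.8"),
  nox  = c(40, 55, NA, 62),
  stringsAsFactors = FALSE
)

# Columns (other than date) that are neither numeric nor integer;
# these are the ones that would be missing from the summary plot.
is_num <- vapply(imported, is.numeric, logical(1))
not_numeric <- setdiff(names(imported)[!is_num], "date")
print(not_numeric)

# Coerce the offending column; entries that are not numbers become NA.
imported$ws <- suppressWarnings(as.numeric(imported$ws))
str(imported)
```

Coercing with as.numeric is only one option. Correcting the source file, as suggested above, keeps the original values traceable.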
# load the package and its example data
library(openair)
data(mydata)
# do not clip density plot data
summaryPlot(mydata, clip = FALSE)
# exclude highest 5 % of data etc.
summaryPlot(mydata, percentile = 0.95)
# show missing data where there are at least 96 contiguous missing
# values (4 days)
summaryPlot(mydata, na.len = 96)
# show data in green
summaryPlot(mydata, col.data = "green")
# show missing data in yellow
summaryPlot(mydata, col.mis = "yellow")
# show density plot in black (col.hist is the colour argument listed
# in the usage; col.dens is not)
summaryPlot(mydata, type = "density", col.hist = "black")
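
To picture what na.len and percentile do in the examples above, the same ideas can be sketched in base R. This is only an illustration under stated assumptions, not the code that summaryPlot runs internally. The vector x and the names na_len, long_gaps, cutoff and clipped are hypothetical.

```r
# Hypothetical hourly series: a 30-hour gap, a 5-hour gap and one outlier.
x <- c(1:10, rep(NA, 30), 11:20, rep(NA, 5), 21:30, 1000)

# na_len plays the role of na.len: keep only runs of missing values
# that are at least this long.
na_len <- 24
runs <- rle(is.na(x))
long_gaps <- runs$lengths[runs$values & runs$lengths >= na_len]
print(long_gaps)

# Clipping in the spirit of clip = TRUE, percentile = 0.99:
# values greater than the 99th percentile are not used.
cutoff <- quantile(x, probs = 0.99, na.rm = TRUE)
clipped <- x[!is.na(x) & x <= cutoff]
print(range(clipped))
```

With na_len = 24 only the 30-hour gap is reported, which mirrors why short gaps do not clutter a long time series. The single outlier is dropped by the clipping step, so the remaining values span a much narrower range.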