summaryPlot can also provide summaries of a single pollutant across many sites.

summaryPlot(mydata, na.len = 24, clip = TRUE, percentile = 0.99,
  type = "histogram", pollutant = "nox", period = "years",
  breaks = NULL, col.trend = "darkgoldenrod2", col.data = "lightblue",
  col.mis = rgb(0.65, 0.04, 0.07), col.hist = "forestgreen", cols = NULL,
  date.breaks = 7, auto.text = TRUE, ...)

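mydata
A data frame to be summarised. Must contain a date field
and at least one other parameter.

na.len
Missing data are only shown where there are at least na.len
contiguous missing values. The purpose of setting na.len is
for clarity: with long time series it is difficult to see
where individual missing hours are. Furthermore, setting
na.len = 96, for example, would show where there are at
least 4 days of continuous missing data.

clip
When the data contain outliers, a histogram or density plot
can fail to show the distribution of the main body of data.
Setting clip = TRUE will remove the top 1 % of data to give
a better display of the overall distribution of the data.
The amount of clipping can be set with percentile.

percentile
Used to clip the data. For example, percentile = 0.99 (the
default) will remove the top 1 percentile of values, i.e.
values greater than the 99th percentile will not be used.

type
type is used to determine whether a histogram (the default)
or a density plot is used to show the distribution of the
data.

pollutant
pollutant is used when there is a field site and there is
more than one site in the data frame.

period
period is either "years" (the default) or "months".
Statistics are calculated depending on the period chosen.

col.trend
Colour used to show the trend of the data. Type colors()
into R to see the full range of colour names.

col.data
Colour used to show the presence of data. Type colors()
into R to see the full range of colour names.

cols
Predefined colour scheme, currently only enabled for
"greyscale".

auto.text
Either TRUE (default) or FALSE. If TRUE, titles and axis
labels will automatically try to format pollutant names and
units properly, e.g. by subscripting the "2" in NO2.

...
Other graphical parameters, such as the axis and title
labelling options (xlab, ylab and main), which are all
passed to the plot via quickText to handle routine
formatting.

summaryPlot produces two panels of plots: one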
showing the presence/absence of data and the other the
distributions. The left panel shows time series and codes
the presence or absence of data in different colours. By
stacking the plots one on top of another it is easy to
compare different pollutants/variables. Overall statistics
are given for each variable: mean, maximum, minimum,
missing hours (also expressed as a percentage), median and
the 95th percentile. For each year the data capture rate
(expressed as a percentage of hours in that year) is also
given.

The right panel shows either a histogram or a density plot
depending on the choice of type. Density plots avoid
the issue of arbitrary bin sizes that can sometimes provide
a misleading view of the data distribution. Density plots
are often more appropriate, but their effectiveness will
depend on the data in question.
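
The effect of clipping, and the difference between the two ways of showing a distribution, can be illustrated with base R alone. This is a minimal sketch on simulated data and is not summaryPlot's internal code; the vector x and the helper clip_top are hypothetical names introduced only for illustration.

```r
# Simulated, right-skewed "concentrations" (illustrative data only)
set.seed(1)
x <- c(rlnorm(1000, meanlog = 3, sdlog = 0.5), 2000, 3000)  # two extreme outliers

# Hypothetical helper: drop values above a given percentile, ignoring NAs
clip_top <- function(v, percentile = 0.99) {
  cutoff <- quantile(v, probs = percentile, na.rm = TRUE)
  v[!is.na(v) & v <= cutoff]
}

x_clipped <- clip_top(x, percentile = 0.99)

# Histogram: the picture depends on the number of bins requested
h <- hist(x_clipped, breaks = 30, plot = FALSE)

# Kernel density estimate: no bins, although a bandwidth is chosen instead
d <- density(x_clipped)
```

Calling plot(h) and plot(d) draws the two views. Without clipping, the two outliers would stretch the x-axis and compress the main body of the data into a few bins.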

summaryPlot will only show data that are numeric or integer
type. This is useful for checking that data have been
imported properly. For example, if for some reason a column
representing wind speed erroneously had one or more fields
containing characters, the whole column would be either
character or factor type. The absence of a wind speed
variable in the summaryPlot output would therefore indicate
a problem with the input data. In this particular case, the
user should go back to the source data and remove the
characters, or remove them using R functions.
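
A quick way to see this kind of import problem is to check column types before plotting. The sketch below uses base R only; the data frame raw and its columns are made up for illustration, and the repair shown is just one possible approach.

```r
# Illustrative data frame: 'ws' holds a stray text entry, so R stores the
# whole column as character
raw <- data.frame(
  date = as.POSIXct("2020-01-01 00:00", tz = "GMT") + 3600 * 0:3,
  ws   = c("2.1", "3.4", "n/a", "1.8"),
  nox  = c(40, 55, NA, 61),
  stringsAsFactors = FALSE
)

# Which columns are numeric or integer? (is.numeric() is TRUE for both)
numeric_cols <- names(raw)[sapply(raw, is.numeric)]
numeric_cols

# One possible repair in R: coerce to numeric; entries that cannot be
# parsed become NA (the coercion warning is suppressed here on purpose)
raw$ws <- suppressWarnings(as.numeric(raw$ws))
```

Here ws is missing from numeric_cols before the repair, which mirrors a wind speed variable being absent from the summary plot.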

If there is a field site, which would generally mean there
is more than one site, summaryPlot will provide information
on a single pollutant across all sites, rather than provide
details on all pollutants at a single site. In this case
the user should also provide the name of a pollutant, e.g.
pollutant = "nox". If a pollutant is not provided, the
first numeric field will automatically be chosen.

It is strongly recommended that the summaryPlot function is
applied to all newly imported data sets to ensure the data
have been imported as expected.

# load the openair package and its example data
library(openair)
data(mydata)
# do not clip density plot data
summaryPlot(mydata, clip = FALSE)
# exclude highest 5 % of data etc.
summaryPlot(mydata, percentile = 0.95)
# show missing data where there are at least 96 contiguous missing
# values (4 days)
summaryPlot(mydata, na.len = 96)
# show data in green
summaryPlot(mydata, col.data = "green")
# show missing data in yellow
summaryPlot(mydata, col.mis = "yellow")
# show density plot line in black
summaryPlot(mydata, type = "density", col.hist = "black")
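
The idea behind na.len, showing only runs of contiguous missing values of at least a given length, can be sketched in base R with run-length encoding. This is an illustration of the concept under assumed data, not the code summaryPlot itself uses; the names x, na_len and flag are hypothetical.

```r
# Illustrative hourly series with one short gap (2 h) and one long gap (30 h)
x <- rep(10, 100)
x[5:6] <- NA
x[41:70] <- NA

na_len <- 24  # same meaning as the na.len argument: minimum run length to show

# Run-length encode the missing / not-missing indicator
runs <- rle(is.na(x))

# Keep only runs of missing values that are at least na_len long
long_gap <- runs$values & runs$lengths >= na_len

# Expand back to one flag per hour
flag <- rep(long_gap, times = runs$lengths)
which(flag)
```

With na_len = 24 only hours 41 to 70 are flagged; the two-hour gap is ignored, which is what keeps a long time series readable.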