summaryPlot can also provide
summaries of a single pollutant across many sites.

summaryPlot(mydata, na.len = 24, clip = TRUE, percentile = 0.99,
  type = "histogram", pollutant = "nox", period = "years",
  breaks = NULL, col.trend = "darkgoldenrod2", col.data = "lightblue",
  col.mis = rgb(0.65, 0.04, 0.07), col.hist = "forestgreen", cols = NULL,
  date.breaks = 7, auto.text = TRUE, ...)

mydata: A data frame, which must contain a date field and at least one other parameter.

na.len: Missing data are only shown where there are at least na.len contiguous missing values. The
purpose of setting na.len is for clarity: with
long time series it is difficult to see where individual
missing hours are. Furthermore, setting na.len = 96, for example, would
show where there are at least 4 days of contiguous missing data.

clip: Setting clip = TRUE will
remove the top 1 % of the data to give a better display of the overall distribution of the data.
The amount of clipping can be set with percentile.

percentile: Used to clip the data. For example, percentile = 0.99 (the default) will
remove the top 1 percentile of values, i.e. values greater
than the 99th percentile will not be used.

type: Used to determine whether a
histogram (the default) or a density plot is used to show
the distribution of the data.

pollutant: Used when there is a
field site and there is more than one site in the
data frame.

period: Either years (the
default) or months. Statistics are calculated
depending on the period chosen.

col.trend, col.data, col.mis, col.hist: Colours used for the trend, the
presence of data, missing data and the histogram or density plot.
Type colors() into R to see the full range of colour
names.

cols: A predefined colour scheme, e.g. "greyscale".

auto.text: Either TRUE (default) or
FALSE. If TRUE titles and axis labels will
automatically try and format pollutant names and units
properly, e.g. by subscripting the 2 in NO2.

...: Other graphical parameters, including the axis and title
labelling options (such as xlab, ylab and main), which
are all passed to the plot via quickText to handle
routine formatting.

summaryPlot produces two panels of plots: one
showing the presence/absence of data and the other the
distributions. The left panel shows time series and codes
the presence or absence of data in different colours. By
stacking the plots one on top of another it is easy to
compare different pollutants/variables. Overall statistics
are given for each variable: mean, maximum, minimum,
missing hours (also expressed as a percentage), median and
the 95th percentile. For each year the data capture rate
(expressed as a percentage of hours in that year) is also
given.

The right panel shows either a histogram or a density plot
depending on the choice of type. Density plots avoid
the issue of arbitrary bin sizes that can sometimes provide
a misleading view of the data distribution. Density plots
are often more appropriate, but their effectiveness will
depend on the data in question.
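
The bin-size issue and the effect of clipping can be illustrated with a small base R sketch. This is not the code summaryPlot uses internally; the data are simulated, the variable names are hypothetical, and the clipping rule (keep values at or below the chosen percentile) is an assumption based on the description of percentile.

```r
# Simulated, right-skewed "concentration" data with two extreme values.
# These numbers are made up for illustration only.
set.seed(1)
x <- c(rlnorm(1000, meanlog = 3, sdlog = 0.5), 2000, 3000)

# Clip at the 99th percentile: keep values at or below the threshold
threshold <- quantile(x, probs = 0.99, na.rm = TRUE)
x_clipped <- x[x <= threshold]

# The same data summarised with few and with many bins
# (hist() treats a single number for `breaks` as a suggestion)
h_coarse <- hist(x_clipped, breaks = 5, plot = FALSE)
h_fine   <- hist(x_clipped, breaks = 50, plot = FALSE)

# A kernel density estimate does not depend on bin edges,
# although it does depend on the bandwidth
d <- density(x_clipped)
```

Plotting h_coarse, h_fine and d side by side shows how the apparent shape of the distribution changes with the number of bins, while the density estimate gives a single smooth curve.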
summaryPlot will only show data that are numeric or
integer type. This is useful for checking that data have
been imported properly. For example, if for some reason a
column representing wind speed erroneously had one or more
fields containing characters, the whole column would be either
character or factor type. The absence of a wind speed
variable in the summaryPlot plot would therefore
indicate a problem with the input data. In this particular
case, the user should go back to the source data and remove
the characters or remove them using R functions.
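
A minimal sketch of such a check in base R is given below. The data frame dat and its values are hypothetical; the sketch only shows one way of finding which columns are numeric and of coercing a character column.

```r
# Hypothetical import where one wind speed entry contains text,
# so R reads the whole column as character
dat <- data.frame(
  date = as.POSIXct("2020-01-01 00:00", tz = "GMT") + 3600 * 0:4,
  ws   = c("2.1", "3.4", "n/a", "1.8", "2.9"),
  nox  = c(40, 55, 38, NA, 61),
  stringsAsFactors = FALSE
)

# Which columns are numeric? Here ws is missing from the result.
is_num <- sapply(dat, is.numeric)
numeric_before <- names(dat)[is_num]

# Coerce the offending column; entries that are not numbers become NA.
# (For a factor column, use as.numeric(as.character(...)) instead.)
dat$ws <- suppressWarnings(as.numeric(dat$ws))
```

After the coercion ws is numeric, with a missing value where the text entry was.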
If there is a field site, which would generally mean
there is more than one site, summaryPlot will
provide information on a single pollutant across all
sites, rather than provide details on all pollutants at a
single site. In this case the user should also
provide the name of a pollutant, e.g. pollutant =
"nox". If a pollutant is not provided the first numeric
field will automatically be chosen.
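
As a rough illustration, a multi-site data frame can be put together in base R as sketched below. The site names, values and the helper variable first_numeric are made up, and the lookup of the first numeric field is only an approximation of the behaviour described above, not the package's own code.

```r
# Made-up hourly data for one site
one_site <- data.frame(
  date = as.POSIXct("2020-01-01 00:00", tz = "GMT") + 3600 * 0:47,
  nox  = c(NA, seq(10, 56, by = 1)),
  no2  = seq(5, 52, by = 1)
)

# Two hypothetical sites, stacked into one data frame with a site field
site_a <- transform(one_site, site = "Site A")
site_b <- transform(one_site, nox = nox * 1.5, site = "Site B")
multi  <- rbind(site_a, site_b)

# The first numeric field, i.e. the one expected to be used
# when no pollutant is given
first_numeric <- names(multi)[sapply(multi, is.numeric)][1]

# With openair loaded, the pollutant would be named explicitly:
# summaryPlot(multi, pollutant = "nox")
```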
It is strongly recommended that the
summaryPlot function is applied to all new imported
data sets to ensure the data are imported as expected.
# load example data from package
data(mydata)
# do not clip density plot data
summaryPlot(mydata, clip = FALSE)
# exclude highest 5 % of data etc.
summaryPlot(mydata, percentile = 0.95)
# show missing data where there are at least 96 contiguous missing
# values (4 days)
summaryPlot(mydata, na.len = 96)
# show data in green
summaryPlot(mydata, col.data = "green")
# show missing data in yellow
summaryPlot(mydata, col.mis = "yellow")
# show histogram (or density plot) in black
summaryPlot(mydata, col.hist = "black")
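
The idea behind na.len can be sketched in base R with run-length encoding. This is an illustration only, not the code used inside summaryPlot; the series x and the names na_len, runs and flag are hypothetical.

```r
# Made-up hourly series with two gaps: one of 3 hours, one of 30 hours
x <- c(rep(5, 10), rep(NA, 3), rep(7, 20), rep(NA, 30), rep(6, 10))

na_len <- 24

# Run-length encode the missing-value indicator
runs <- rle(is.na(x))

# Keep only runs that are missing and at least na_len hours long
long_gap <- runs$values & runs$lengths >= na_len

# Expand back to one flag per hour
flag <- rep(long_gap, times = runs$lengths)
```

Here only the 30-hour gap is flagged; the 3-hour gap is shorter than na_len and would not be shown.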