This function extends conditionalQuantile by also considering
how other variables vary over the same
intervals. Conditional quantiles are very useful on their
own for model evaluation, but provide no direct information
on how other variables change at the same time. For
example, a conditional quantile plot of ozone
concentrations may show that low concentrations of ozone
tend to be under-predicted. However, the cause of the
under-prediction can be difficult to determine. By
considering how well the model predicts other variables
over the same intervals, more insight can be gained into
the underlying reasons why model performance is poor.

conditionalEval(mydata, obs = "obs", mod = "mod", var.obs = "var.obs",
var.mod = "var.mod", type = "default", bins = 31, statistic = "MB",
xlab = "predicted value", ylab = "statistic", col = brewer.pal(5,
"YlOrRd"), col.var = "Set1", var.names = NULL, auto.text = TRUE, ...)
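
mydata
A data frame containing the variables obs and mod
representing observed and modelled values.

obs
The name of the observations in mydata.

mod
The name of the predictions (modelled values) in mydata.

var.obs
Other variable observations for which statistics should be
calculated. Can be more than length one, e.g.
var.obs = c("nox.obs", "ws.obs").

var.mod
Other variable predictions for which statistics should be
calculated. Can be more than length one, e.g.
var.mod = c("nox.mod", "ws.mod").

type
type determines how the data are split, i.e. conditioned,
and then plotted. The default will produce a single plot
using the entire data set. type can be one of the built-in
types as detailed in cutData, e.g. "season" or "year".

bins
Number of bins used in conditionalQuantile.

statistic
Statistic(s) to be plotted, as described in modStats. The
default is "MB".

xlab
Label for the x-axis, by default "predicted value".

ylab
Label for the y-axis, by default "statistic" as shown in the
usage above.

col
Colours to be used for plotting; the default is
brewer.pal(5, "YlOrRd").

col.var
Colours for the additional variables to be compared. See
openColours for more details.

var.names
Variable names to be shown on the plot for var.obs and
var.mod.

auto.text
Either TRUE (default) or FALSE. If TRUE, titles and axis
labels etc. will automatically try to format pollutant names
and units properly, e.g. by subscripting the '2' in NO2.

...
Other graphical parameters passed on to conditionalQuantile
and cutData. For example, conditionalQuantile passes the
option hemisphere = "southern" on to cutData to provide
southern (rather than default northern) hemisphere handling
of type = "season".

The conditionalEval function provides information on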
how other variables vary across the same intervals as shown
on the conditional quantile plot. There are two types of
variable that can be considered by setting the value of
statistic. First, statistic can be another variable in the
data frame. In this case the plot will show the different
proportions of statistic across the range of predictions.
For example, statistic = "season" will show for each interval
of mod the proportion of predictions that were spring,
summer, autumn or winter. This is useful because if model
performance is worse, for example at high concentrations of
mod, then knowing that these tend to occur during a
particular season can be very helpful when trying to
understand why a model fails. See cutData for more details
on the types of variable that can be supplied as statistic.
Another example would be statistic = "ws" (if wind speed
were available in the data frame), which would then split
wind speed into four quantiles and plot the proportions of
each.

Second, conditionalEval can simultaneously plot the model
performance of other observed/predicted variable pairs
according to different model evaluation statistics. These
statistics derive from the modStats function, and more than
one can be requested, e.g. statistic = c("NMB", "COE").
Bootstrap samples are taken from the corresponding values of
the other variables to be plotted and their statistics are
calculated with 95% confidence intervals. In this case, the
model performance of other variables is shown across the
same intervals of mod, rather than just the values of single
variables. In this second case the model would need to
provide observed/predicted pairs of other variables.
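The first case above, where statistic is another variable in the data frame, amounts to tabulating proportions within each interval of mod. The following is a minimal base R sketch of that idea only, not openair's implementation: the data frame dat, its values and the five equal-width intervals are all invented for illustration.

```r
# Synthetic data: modelled values and a season label (invented for illustration)
set.seed(1)
seasons <- c("spring", "summer", "autumn", "winter")
dat <- data.frame(
  mod    = runif(400, min = 0, max = 100),
  season = factor(sample(seasons, 400, replace = TRUE), levels = seasons)
)

# Split the range of predictions into equal-width intervals
# (the interval width is an assumption of this sketch)
dat$interval <- cut(dat$mod, breaks = seq(0, 100, by = 20), include.lowest = TRUE)

# Proportion of each season within each interval of mod (each row sums to 1)
season_prop <- prop.table(table(dat$interval, dat$season), margin = 1)
print(round(season_prop, 2))

# A numeric variable such as wind speed can be treated the same way after
# splitting it into four quantile classes
dat$ws <- rexp(400, rate = 0.25)
dat$ws_class <- cut(dat$ws,
                    breaks = quantile(dat$ws, probs = seq(0, 1, by = 0.25)),
                    include.lowest = TRUE)
ws_prop <- prop.table(table(dat$interval, dat$ws_class), margin = 1)
print(round(ws_prop, 2))
```

Each row of the resulting tables corresponds to one interval of mod, which is the information the proportion plot is described as showing.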

For example, a model may provide predictions of NOx and wind
speed (for which there are also observations available). The
conditionalEval function will show how well these other
variables are predicted for the same intervals of the main
variable assessed in the conditional quantile plot, e.g.
ozone. In this case, values are supplied to var.obs
(observed values for other variables) and var.mod (modelled
values for other variables). For example, to consider how
well the model predicts NOx and wind speed,
var.obs = c("nox.obs", "ws.obs") and
var.mod = c("nox.mod", "ws.mod") would be supplied (assuming
nox.obs, nox.mod, ws.obs and ws.mod are present in the data
frame). The analysis could show, for example, that when ozone
concentrations are under-predicted, the model also
over-predicts concentrations of NOx at the same time, or
under-predicts wind speeds. Such information can thus help
identify the underlying causes of poor model performance.
For example, an under-prediction in wind speed could result
in higher surface NOx concentrations and lower ozone
concentrations. Similarly, if wind speed predictions were
good and NOx was over-predicted, it might suggest an
over-estimate of NOx emissions. One or more additional
variables can be plotted.
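The second case, evaluating another observed/predicted pair over the intervals of the main prediction, can be sketched in base R as follows. This is an illustration under stated assumptions rather than openair's code: the data are synthetic, mean bias is taken here as mean(modelled - observed), and the interval width, the percentile bootstrap and the number of resamples are arbitrary choices.

```r
set.seed(42)
n <- 500
dat <- data.frame(
  mod     = runif(n, 0, 100),                     # main modelled variable, e.g. ozone
  nox.obs = rlnorm(n, meanlog = 3, sdlog = 0.5)   # observed NOx (synthetic)
)
# Modelled NOx with a deliberate positive bias of about 5 units
dat$nox.mod <- dat$nox.obs + rnorm(n, mean = 5, sd = 4)

# Intervals of the main prediction
dat$interval <- cut(dat$mod, breaks = seq(0, 100, by = 25), include.lowest = TRUE)

# Mean bias with a percentile bootstrap 95% interval for one group of pairs
boot_mb <- function(obs, mod, n_boot = 500) {
  bias <- mod - obs
  reps <- replicate(n_boot, mean(sample(bias, replace = TRUE)))
  c(MB    = mean(bias),
    lower = unname(quantile(reps, 0.025)),
    upper = unname(quantile(reps, 0.975)))
}

# Apply the statistic to the NOx pair within each interval of mod
res <- t(sapply(split(dat, dat$interval),
                function(d) boot_mb(d$nox.obs, d$nox.mod)))
print(round(res, 2))
```

In this made-up data set every interval shows a positive mean bias for NOx, which is the kind of pattern that would point towards over-predicted NOx alongside the main variable.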

A special case is statistic = "cluster". In this case a data
frame is provided that contains the cluster calculated by
trajCluster and importTraj. Alternatively, users could supply
their own pre-calculated clusters. These calculations can be
very useful in showing whether certain back trajectory
clusters are associated with poor (or good) model
performance. Note that in the case of statistic = "cluster"
there will be fewer data points used in the analysis compared
with the ordinary statistics above because the trajectories
are only available every three hours. Also note that
statistic = "cluster" cannot be used together with the
ordinary model evaluation statistics such as MB. The output
will be a bar chart showing the proportion of each interval
of mod by cluster number.
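The reduction in data points for the cluster case can be seen in a small base R sketch. It assumes hourly model output and hypothetical cluster labels at three-hourly resolution; the column names, the join and the two intervals are illustrative choices, not a description of how openair combines the data.

```r
set.seed(7)
start <- as.POSIXct("2020-01-01 00:00", tz = "UTC")

# Hourly modelled values over four days (synthetic)
hourly <- data.frame(
  date = seq(start, by = "hour", length.out = 96),
  mod  = runif(96, 0, 100)
)

# Hypothetical cluster labels, available only every three hours
three_hourly <- data.frame(
  date    = seq(start, by = "3 hours", length.out = 32),
  cluster = factor(sample(1:3, 32, replace = TRUE))
)

# Only the hours that have a cluster label survive the join
joined <- merge(hourly, three_hourly, by = "date")
cat("hourly rows:", nrow(hourly), " rows with a cluster:", nrow(joined), "\n")

# Proportion of each cluster within each interval of mod
joined$interval <- cut(joined$mod, breaks = c(0, 50, 100), include.lowest = TRUE)
cluster_prop <- prop.table(table(joined$interval, joined$cluster), margin = 1)
print(round(cluster_prop, 2))
```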

Far more insight can be gained into model performance through
conditioning using type. For example, type = "season" will
plot conditional quantiles and the associated model
performance statistics of other variables by each season.
type can also be a factor or character field, e.g.
representing different models used.
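Conditioning on a character field can be pictured as repeating the per-interval calculation within each level of that field. The base R sketch below uses a hypothetical model column and synthetic wind speed data; it illustrates the grouping only and is not openair's implementation.

```r
set.seed(3)
n <- 600
dat <- data.frame(
  model  = rep(c("model A", "model B"), each = n / 2),  # hypothetical conditioning field
  mod    = runif(n, 0, 100),                            # main modelled variable
  ws.obs = rgamma(n, shape = 2, rate = 0.5)             # observed wind speed (synthetic)
)
# "model B" is given a deliberate under-prediction of wind speed of about 1 unit
dat$ws.mod <- dat$ws.obs + ifelse(dat$model == "model B", -1, 0) + rnorm(n, sd = 0.5)

dat$interval <- cut(dat$mod, breaks = seq(0, 100, by = 25), include.lowest = TRUE)
dat$bias <- dat$ws.mod - dat$ws.obs

# Mean bias of wind speed for each interval of mod, separately for each model
mb_by_type <- aggregate(bias ~ interval + model, data = dat, FUN = mean)
print(mb_by_type)
```

Laying the results out by level in this way makes it straightforward to see that, in this invented example, only one of the two models under-predicts wind speed.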
See Wilks (2005) for more details of conditional quantile plots.

See conditionalQuantile for information on conditional
quantiles, modStats for model evaluation statistics and the
package verification for comprehensive functions for
forecast verification.

## Examples to follow, or will be shown in the openair manual