The conditionalEval function extends conditionalQuantile by also
considering how other variables vary over the same
intervals. Conditional quantiles are very useful on their own for
model evaluation, but provide no direct information on how other
variables change at the same time. For example, a conditional
quantile plot of ozone concentrations may show that low
concentrations of ozone tend to be under-predicted. However, the
cause of the under-prediction can be difficult to
determine. By considering how well the model predicts other
variables over the same intervals, however, more insight can be
gained into the underlying reasons why model performance is poor.

conditionalEval(mydata, obs = "obs", mod = "mod", var.obs = "var.obs",
  var.mod = "var.mod", type = "default", bins = 31, statistic = "MB",
  xlab = "predicted value", ylab = "statistic",
  col = brewer.pal(5, "YlOrRd"), col.var = "Set1", var.names = NULL,
  auto.text = TRUE, ...)
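
mydata: A data frame containing the fields obs and mod representing observed and modelled values.
obs: The name of the observations in mydata.
mod: The name of the predictions (modelled values) in mydata.
var.obs: Observations of other variables for which statistics should be calculated. Can be of length greater than one, e.g. var.obs = c("nox.obs", "ws.obs").
var.mod: Predictions of other variables for which statistics should be calculated. Can be of length greater than one, e.g. var.mod = c("nox.mod", "ws.mod").
type: Determines how the data are split, i.e. conditioned, and then plotted. The default will produce a single plot using the entire data. type can be one of the built-in types as detailed in cutData, e.g. "season", "year", "weekday" and so on.
bins: Number of bins used in calculating the different quantile levels (default 31).
statistic: The statistic(s) to be plotted (default "MB"). These can be model evaluation statistics as described in modStats, or a variable that can be supplied to cutData; both cases are explained below.
xlab: Label for the x-axis, by default "predicted value".
ylab: Label for the y-axis, by default "statistic".
col: Colours used for plotting the conditional quantiles, by default brewer.pal(5, "YlOrRd").
col.var: Colours used for the additional variables. See openColours for more details.
var.names: Variable names to be shown on the plot for var.obs and var.mod.
auto.text: Either TRUE (default) or FALSE. If TRUE, titles and axis labels etc. will automatically try to format pollutant names and units properly, e.g. by subscripting the 2 in NO2.
...: Other arguments passed on to conditionalQuantile and cutData. For example, conditionalQuantile passes the option hemisphere = "southern" on to cutData to provide southern (rather than northern) hemisphere handling of seasons.

The conditionalEval function provides information on how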
other variables vary across the same intervals as shown on the
conditional quantile plot. There are two types of variable that
can be considered by setting the value of statistic. First,
statistic can be another variable in the data frame. In this case
the plot will show the different proportions of statistic across
the range of predictions. For example, statistic = "season" will
show, for each interval of mod, the proportion of predictions that
were made in spring, summer, autumn or winter. This is useful
because if model performance is worse at, for example, high
concentrations of mod, knowing that these tend to occur during a
particular season can be very helpful when trying to understand
why the model fails. See cutData for more details on the types of
variable that can be used as statistic. Another example would be
statistic = "ws" (if wind speed were available in the data frame),
which would then split wind speed into four
quantiles and plot the proportions of each.

Second, conditionalEval can simultaneously plot the model
performance of other observed/predicted variable pairs according
to different model evaluation statistics. These statistics are
derived from the modStats function, and more than one can be
requested, e.g. statistic = c("NMB", "COE"). Bootstrap samples are
drawn from the corresponding values of the other variables, and
each statistic is calculated with a 95% confidence interval. In
this case, the model performance of other variables is shown
across the same intervals of mod, rather than just the values of
single variables. This second case requires the model to provide
observed/predicted pairs of other variables.
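The bootstrap step described above can be sketched in base R for a single observed/predicted pair. The mb, nmb and coe functions use the commonly quoted definitions of MB, NMB and COE, which are assumed (not verified line by line) to match modStats. The percentile bootstrap and n_boot = 200 are illustrative choices, not necessarily what conditionalEval does internally, and the NOx data are synthetic.

```r
# Commonly used definitions of three model evaluation statistics
# (assumed to correspond to the MB, NMB and COE options; see modStats)
mb  <- function(obs, mod) mean(mod - obs)
nmb <- function(obs, mod) sum(mod - obs) / sum(obs)
coe <- function(obs, mod) 1 - sum(abs(mod - obs)) / sum(abs(obs - mean(obs)))

# Percentile bootstrap: resample obs/mod pairs together, recompute the
# statistic, and take the 2.5% and 97.5% quantiles as an approximate 95% interval
boot_stat <- function(obs, mod, stat = mb, n_boot = 200) {
  n <- length(obs)
  reps <- replicate(n_boot, {
    idx <- sample.int(n, n, replace = TRUE)
    stat(obs[idx], mod[idx])
  })
  c(estimate = stat(obs, mod),
    lower = unname(quantile(reps, 0.025)),
    upper = unname(quantile(reps, 0.975)))
}

set.seed(42)
nox_obs <- rlnorm(300, meanlog = 3)
# Hypothetical model that over-predicts NOx by roughly 20%
nox_mod <- 1.2 * nox_obs * exp(rnorm(300, sd = 0.1))
res <- boot_stat(nox_obs, nox_mod, stat = nmb)
res
```

Resampling the pairs, rather than obs and mod independently, keeps the correspondence between each observation and its prediction, which is what the statistic is meant to assess.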
For example, a model may provide predictions of NOx and wind speed
(for which there are also observations available). The
conditionalEval function will show how well these other variables
are predicted for the same intervals of the main variable assessed
in the conditional quantile plot, e.g. ozone. In this case, values
are supplied to var.obs (observed values for other variables) and
var.mod (modelled values for other variables). For example, to
consider how well the model predicts NOx and wind speed,
var.obs = c("nox.obs", "ws.obs") and
var.mod = c("nox.mod", "ws.mod") would be supplied (assuming
nox.obs, nox.mod, ws.obs and ws.mod are present in the data
frame). The analysis could show, for example, that when ozone
concentrations are under-predicted, the model also over-predicts
NOx concentrations or under-predicts wind speeds. Such information
can help identify the underlying causes of poor model performance.
For example, an under-prediction of wind speed could result in
higher surface NOx concentrations and lower ozone concentrations.
Similarly, if wind speed predictions were good and NOx was
over-predicted, it might suggest an over-estimate of NOx emissions.
One or more additional variables can be plotted.
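Evaluating other variable pairs over the same intervals of the main prediction can be illustrated with a small base-R sketch. The helper eval_other_vars is hypothetical, the column names follow the NOx and wind speed example above, and the data are synthetic: wind speed is deliberately under-predicted, which pushes modelled NOx up. Equal-width intervals are an assumption, and only the mean bias is computed, without bootstrap intervals.

```r
# Mean bias of other obs/mod pairs within each interval of the main prediction
eval_other_vars <- function(df, mod, var.obs, var.mod, bins = 5) {
  stopifnot(length(var.obs) == length(var.mod))
  x <- df[[mod]]
  # Equal-width intervals spanning the range of the main prediction
  breaks <- seq(min(x), max(x), length.out = bins + 1)
  interval <- cut(x, breaks = breaks, include.lowest = TRUE)
  out <- lapply(seq_along(var.obs), function(i) {
    bias <- df[[var.mod[i]]] - df[[var.obs[i]]]
    mb <- tapply(bias, interval, mean)   # NA for any empty interval
    data.frame(interval = names(mb), variable = var.obs[i], MB = as.vector(mb))
  })
  do.call(rbind, out)
}

set.seed(7)
n <- 1000
ws_obs <- runif(n, 1, 10)
ws_mod <- 0.8 * ws_obs                      # wind speed under-predicted
nox_obs <- 50 / ws_obs + rnorm(n, sd = 2)
nox_mod <- 50 / ws_mod + rnorm(n, sd = 2)   # so modelled NOx is too high
o3_mod <- 20 + 3 * ws_mod + rnorm(n, sd = 3)
df <- data.frame(o3.mod = o3_mod, nox.obs = nox_obs, nox.mod = nox_mod,
                 ws.obs = ws_obs, ws.mod = ws_mod)
res <- eval_other_vars(df, "o3.mod",
                       var.obs = c("nox.obs", "ws.obs"),
                       var.mod = c("nox.mod", "ws.mod"))
res
```

In this synthetic case, every interval of ozone predictions shows a negative wind speed bias alongside a positive NOx bias. This is the kind of pattern the text suggests looking for.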
A special case is statistic = "cluster". In this case a data frame
is provided that contains the clusters calculated by trajCluster
from back trajectories imported with importTraj. Alternatively,
users could supply their own pre-calculated clusters. These
calculations can be very useful in showing whether certain back
trajectory clusters are associated with poor (or good) model
performance. Note that with statistic = "cluster" fewer data points
will be used in the analysis than with the ordinary statistics
above, because the trajectories are only available every three
hours. Also note that statistic = "cluster" cannot be used together
with the ordinary model evaluation statistics such as MB. The
output will be a bar chart showing the proportion of each interval
of mod by cluster number.
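Both the statistic = "season" (or "ws") case and the cluster case described earlier come down to tabulating the proportion of each category within each interval of mod. The following base-R sketch uses synthetic data. The cluster labels are random stand-ins for trajCluster output, and the equal-width intervals and bins = 5 are illustrative assumptions, not necessarily how conditionalEval forms its intervals.

```r
# Proportion of each category within each interval of the predictions `mod`
category_proportions <- function(mod, category, bins = 5) {
  breaks <- seq(min(mod), max(mod), length.out = bins + 1)
  interval <- cut(mod, breaks = breaks, include.lowest = TRUE)
  prop.table(table(interval, category), margin = 1)  # each non-empty row sums to 1
}

# A numeric variable (e.g. wind speed) is first split into four quantile groups
quartile_groups <- function(x) {
  cut(x, breaks = quantile(x, probs = seq(0, 1, 0.25)),
      include.lowest = TRUE, labels = c("Q1", "Q2", "Q3", "Q4"))
}

set.seed(3)
mod <- runif(400, 0, 80)
season <- sample(c("spring", "summer", "autumn", "winter"), 400, replace = TRUE)
cluster <- sample(paste0("C", 1:6), 400, replace = TRUE)  # stand-in cluster labels
ws <- rexp(400, rate = 0.25)

p_season <- category_proportions(mod, season)
p_cluster <- category_proportions(mod, cluster)
p_ws <- category_proportions(mod, quartile_groups(ws))
round(p_season, 2)
```

Each row of the result corresponds to one bar in a stacked bar chart of proportions by interval of mod.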
Far more insight can be gained into model performance through
conditioning using type. For example, type = "season" will plot
conditional quantiles and the associated model performance
statistics of other variables for each season. type can also be a
factor or character field, e.g. one representing the different
models used.
See Wilks (2005) for more details of conditional quantile plots.
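The conditional quantile calculation that conditionalEval builds on can be sketched in base R, together with conditioning by a grouping variable in the spirit of type. This is not openair's implementation. The equal-width bins, the 10/25/50/75/90th percentile levels, the helper name conditional_quantiles, and the synthetic "summer"/"winter" data are all assumptions made for illustration.

```r
# Quantiles of the observations within each interval of the predictions
conditional_quantiles <- function(obs, mod, bins = 31,
                                  probs = c(0.10, 0.25, 0.50, 0.75, 0.90)) {
  # Divide the range of predictions into `bins` equal-width intervals
  breaks <- seq(min(mod), max(mod), length.out = bins + 1)
  interval <- cut(mod, breaks = breaks, include.lowest = TRUE)
  # One row per interval; NA if no predictions fall in that interval
  q <- t(sapply(split(obs, interval), function(x) {
    if (length(x) == 0) rep(NA_real_, length(probs))
    else quantile(x, probs = probs, names = FALSE)
  }))
  colnames(q) <- paste0("q", probs * 100)
  data.frame(mid = (head(breaks, -1) + tail(breaks, -1)) / 2, q,
             row.names = NULL)
}

set.seed(1)
df <- data.frame(season = rep(c("summer", "winter"), each = 500),
                 mod = runif(1000, 0, 100))
# Hypothetical model: unbiased in summer, under-predicting by ~30% in winter
df$obs <- ifelse(df$season == "summer", 1, 1.3) * df$mod + rnorm(1000, sd = 5)

# Conditioning: repeat the analysis within each level of the grouping variable
by_type <- lapply(split(df, df$season),
                  function(d) conditional_quantiles(d$obs, d$mod, bins = 10))
by_type$winter
```

Plotting q50 against mid, with the other percentiles as bands, gives the usual conditional quantile plot. In the winter panel the median sits above the 1:1 line, which indicates under-prediction.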
See conditionalQuantile for information on conditional quantiles,
modStats for model evaluation statistics and the package
verification for comprehensive functions for forecast verification.

## Examples to follow, or will be shown in the openair manual