Learn R Programming

openair (version 1.0)

conditionalEval: Conditional quantile estimates with additional variables for model evaluation

Description

This function enhances conditionalQuantile by also considering how other variables vary over the same intervals. Conditional quantiles are very useful on their own for model evaluation, but provide no direct information on how other variables change at the same time. For example, a conditional quantile plot of ozone concentrations may show that low concentrations of ozone tend to be under-predicted. However, the cause of the under-prediction can be difficult to determine. However, by considering how well the model predicts other variables over the same intervals, more insight can be gained into the underlying reasons why model performance is poor.

Usage

conditionalEval(mydata, obs = "obs", mod = "mod", var.obs = "var.obs",
  var.mod = "var.mod", type = "default", bins = 31, statistic = "MB",
  xlab = "predicted value", ylab = "statistic", col = brewer.pal(5,
  "YlOrRd"), col.var = "Set1", var.names = NULL, auto.text = TRUE, ...)

Arguments

mydata
A data frame containing the field obs and mod representing observed and modelled values.
obs
The name of the observations in mydata.
mod
The name of the predictions (modelled values) in mydata.
var.obs
Other variable observations for which statistics should be calculated. Can be more than length one e.g. var.obs = c("nox.obs", "ws.obs").
var.mod
Other variable predictions for which statistics should be calculated. Can be more than length one e.g. var.obs = c("nox.obs", "ws.obs").
type
type determines how the data are split i.e. conditioned, and then plotted. The default is will produce a single plot using the entire data. Type can be one of the built-in types as detailed in cutData e.g. "season", "year", "week
bins
Number of bins used in conditionalQuantile.
statistic
Statistic(s) to be plotted. Can be MB, NMB, r, COE, MGE, NMGE, RMSE and FAC2, as described in modStats. Wh
xlab
label for the x-axis, by default "predicted value".
ylab
label for the y-axis, by default "observed value".
col
Colours to be used for plotting the uncertainty bands and median line. Must be of length 5 or more.
col.var
Colours for the additional variables to be compared. See openColours for more details.
var.names
Variable names to be shown on plot for plotting var.obs and var.mod.
auto.text
Either TRUE (default) or FALSE. If TRUE titles and axis labels etc. will automatically try and format pollutant names and units properly e.g. by subscripting the `2' in NO2.
...
Other graphical parameters passed onto conditionalQuantile and cutData. For example, conditionalQuantile passes the option hemisphere = "southern" on to cutData to provide southern (rather t

Details

The conditionalEval function provides information on how other variables vary across the same intervals as shown on the conditional quantile plot. There are two types of variable that can be considered by setting the value of statistic. First, statistic can be another variable in the data frame. In this case the plot will show the different proportions of statistic across the range of predictions. For example statistic = "season" will show for each interval of mod the proportion of predictions that were spring, summer, autumn or winter. This is useful because if model performance is worse for example at high concentrations of mod then knowing that these tend to occur during a particular season etc. can be very helpful when trying to understand why a model fails. See cutData for more details on the types of variable that can be statistic. Another example would be statistic = "ws" (if wind speed were available in the data frame), which would then split wind speed into four quantiles and plot the proportions of each.

Second, conditionalEval can simultaneously plot the model performance of other observed/predicted variable pairs according to different model evaluation statistics. These statistics derive from the modStats function and include MB, NMB, r, COE, MGE, NMGE, RMSE and FAC2. More than one statistic can be supplied e.g. statistic = c("NMB", "COE"). Bootstrap samples are taken from the corresponding values of other variables to be plotted and their statistics with 95% confidence intervals calculated. In this case, the model performance of other variables is shown across the same intervals of mod, rather than just the values of single variables. In this second case the model would need to provide observed/predicted pairs of other variables.

For example, a model may provide predictions of NOx and wind speed (for which there are also observations available). The conditionalEval function will show how well these other variables are predicted for the same intervals of the main variables assessed in the conditional quantile e.g. ozone. In this case, values are supplied to var.obs (observed values for other variables) and var.mod (modelled values for other variables). For example, to consider how well the model predicts NOx and wind speed var.obs = c("nox.obs", "ws.obs") and var.mod = c("nox.mod", "ws.mod") would be supplied (assuming nox.obs, nox.mod, ws.obs, ws.mod are present in the data frame). The analysis could show for example, when ozone concentrations are under-predicted, the model may also be shown to over-predict concentrations of NOx at the same time, or under-predict wind speeds. Such information can thus help identify the underlying causes of poor model performance. For example, an under-prediction in wind speed could result in higher surface NOx concentrations and lower ozone concentrations. Similarly if wind speed predictions were good and NOx was over predicted it might suggest an over-estimate of NOx emissions. One or more additional variables can be plotted.

A special case is statistic = "cluster". In this case a data frame is provided that contains the cluster calculated by trajCluster and importTraj. Alternatively users could supply their own pre-calculated clusters. These calculations can be very useful in showing whether certain back trajectory clusters are associated with poor (or good) model performance. Note that in the case of statistic = "cluster" there will be fewer data points used in the analysis compared with the ordinary statistics above because the trajectories are available for every three hours. Also note that statistic = "cluster" cannot be used together with the ordinary model evaluation statistics such as MB. The output will be a bar chart showing the proportion of each interval of mod by cluster number.

Far more insight can be gained into model performance through conditioning using type. For example, type = "season" will plot conditional quantiles and the associated model performance statistics of other variables by each season. type can also be a factor or character field e.g. representing different models used.

See Wilks (2005) for more details of conditional quantile plots.

References

Wilks, D. S., 2005. Statistical Methods in the Atmospheric Sciences, Volume 91, Second Edition (International Geophysics), 2nd Edition. Academic Press.

See Also

See conditionalQuantile for information on conditional quantiles, modStats for model evaluation statistics and the package verification for comprehensive functions for forecast verification.

Examples

Run this code
## Examples to follow, or will be shown in the openair manual

Run the code above in your browser using DataLab