georob (version 0.3-19)

validate.predictions: Summary Statistics of (Cross-)Validation Prediction Errors

Description

Functions to compute and plot summary statistics of prediction errors to (cross-)validate fitted spatial linear models by the criteria proposed by Gneiting et al. (2007) for assessing probabilistic forecasts.

Usage

validate.predictions(data, pred, se.pred,
    statistic = c("crps", "pit", "mc", "bs", "st"), ncutoff = NULL)

# S3 method for cv.georob
plot(x, type = c("sc", "lgn.sc", "ta", "qq", "hist.pit", "ecdf.pit", "mc", "bs"),
    smooth = TRUE, span = 2/3, ncutoff = NULL, add = FALSE,
    col, pch, lty, main, xlab, ylab, ...)

# S3 method for cv.georob
print(x, digits = max(3, getOption("digits") - 3), ...)

# S3 method for cv.georob
summary(object, se = FALSE, ...)

Value

Depending on the argument statistic, the function validate.predictions returns the following (see also the illustrative sketch after this list):

  • a numeric vector of PIT values if statistic is equal to "pit",

  • a named numeric vector with summary statistics of the (standardized) prediction errors if statistic is equal to "st". The following statistics are computed:

    me      mean prediction error
    mede    median prediction error
    rmse    root mean squared prediction error
    made    median absolute prediction error
    qne     Qn dispersion measure of prediction errors (see Qn)
    msse    mean squared standardized prediction error
    medsse  median squared standardized prediction error
  • a data frame if statistic is equal to "mc" or "bs" with the components (see Details):

    z     the sorted unique (cross-)validation observations (or a subset of size ncutoff of them)
    ghat  the empirical CDF of the (cross-)validation observations \(\widehat{G}_n(y)\)
    fbar  the average predictive distribution \(\bar{F}_n(y)\)
    bs    the Brier score \(\mathrm{BS}(y)\)
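A minimal illustrative sketch of these return values, using simulated data and made-up (hypothetical) predictions and prediction standard errors rather than output of georob:

library(georob)

## simulate hypothetical predictions, prediction standard errors and observations
set.seed(1)
n <- 200
pred    <- rnorm(n)
se.pred <- runif(n, min = 0.5, max = 1.5)
data    <- rnorm(n, mean = pred, sd = se.pred)

## named vector with error statistics me, mede, rmse, made, qne, msse, medsse
validate.predictions(data, pred, se.pred, statistic = "st")

## numeric vector of PIT values
str(validate.predictions(data, pred, se.pred, statistic = "pit"))

## data frames used for the marginal calibration plot and the Brier score
str(validate.predictions(data, pred, se.pred, statistic = "mc"))
str(validate.predictions(data, pred, se.pred, statistic = "bs"))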

The method print.cv.georob invisibly returns the object unchanged.

The method summary.cv.georob returns an object of class summary.cv.georob, which is a list with 3 components (see also the sketch after this list):

  • st a numeric vector with summary statistics of the (standardized) prediction errors of the possibly log-transformed response, see output of function validate.predictions for argument statistic = "st" above.

  • crps the value of the continuous ranked probability score.

  • st.lgn a numeric vector with summary statistics of the (standardized) prediction errors of the back-transformed response if argument lgn = TRUE and NULL otherwise.
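For illustration, a minimal sketch of accessing these components, assuming r.logzn.cv.1 is the cv.georob object created in the Examples section below:

## sketch (not run): extract components of the summary of a cv.georob object
r.sum <- summary(r.logzn.cv.1, se = TRUE)
r.sum$st       # error statistics of the (standardized) prediction errors
r.sum$crps     # averaged continuous ranked probability score
r.sum$st.lgn   # statistics for the back-transformed response (lgn = TRUE), else NULL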

There is a print method for objects of class summary.cv.georob which invisibly returns the object unchanged.

The method plot.cv.georob is called for its side effects and invisibly returns NULL.

Arguments

data

a numeric vector with the observations of the response (mandatory argument).

pred

a numeric vector with predictions for the response (mandatory argument).

se.pred

a numeric vector with prediction standard errors (mandatory argument).

statistic

a character keyword defining what statistic of the prediction errors should be computed. Possible values are (see Details):

  • "crps": continuous ranked probability score (default),

  • "pit": probability integral transform,

  • "mc": average predictive distribution (marginal calibration),

  • "bs": Brier score,

  • "st": mean and dispersion statistics of (standardized) prediction errors.

ncutoff

a positive integer (\(N\)) giving the number of quantiles for which CDFs are evaluated (type = "mc") or the number of thresholds for which the Brier score is computed (type = "bs"), see Details (default: min(500, length(data))).

x, object

objects of class cv.georob.

digits

a positive integer indicating the number of decimal digits to print.

type

a character keyword defining what type of plot is created by plot.cv.georob. Possible values are:

  • "sc": a scatter-plot of the (possibly log-transformed) response vs. the respective predictions (default).

  • "lgn.sc": a scatter-plot of the untransformed response against back-
    transformed predictions of the log-transformed response.

  • "ta": Tukey-Anscombe plot (plot of standardized prediction errors vs. predictions).

  • "qq": normal QQ plot of standardized prediction errors.

  • "hist.pit": histogram of probability integral transform, see Details.

  • "ecdf.pit": empirical CDF of probability integral transform, see Details.

  • "mc": a marginal calibration plot, see Details,

  • "bs": a plot of Brier score vs. threshold, see Details.

smooth

a logical scalar controlling whether scatter plots of data vs. predictions should be smoothed by loess.smooth.

span

a numeric scalar with the smoothness parameter for loess (see loess.smooth).

add

a logical scalar controlling whether the plot is added to the existing graphics without clearing the frame first (default: FALSE).

main, xlab, ylab

title and axes labels of plot.

col, pch, lty

color, symbol and line type.

se

a logical scalar controlling whether the standard errors of the averaged continuous ranked probability score and of the mean and dispersion statistics of the prediction errors (see Details) are computed from the respective values of the \(K\) cross-validation subsets (default: FALSE).

...

additional arguments passed to the methods.

Author

Andreas Papritz papritz@retired.ethz.ch.

Details

validate.predictions computes the items required to evaluate (and plot) the diagnostic criteria proposed by Gneiting et al. (2007) for assessing the calibration and the sharpness of probabilistic predictions of (cross-)validation data. To this end, validate.predictions uses the assumption that the prediction errors \(Y(\boldsymbol{s})-\widehat{Y}(\boldsymbol{s})\) follow normal distributions with zero mean and standard deviations equal to the Kriging standard errors. This assumption is an approximation if the errors \(\varepsilon\) come from a long-tailed distribution. Furthermore, for the time being, the Kriging variance of the response \(Y\) is approximated by adding the estimated nugget \(\widehat{\tau}^2\) to the Kriging variance of the signal \(Z\). This approximation likely underestimates the mean squared prediction error of the response if the errors come from a long-tailed distribution. Hence, for robust Kriging, the standard errors of the (cross-)validation errors are likely too small.

Notwithstanding these difficulties and imperfections, validate.predictions computes the following quantities (illustrated by the sketch after this list):

  • the probability integral transform (PIT),

    $$\mathrm{PIT}_i = F_i(y_i),$$

    where \(F_i(y_i)\) denotes the (plug-in) predictive CDF evaluated at \(y_i\), the value of the \(i\)th (cross-)validation datum,

  • the average predictive CDF (plug-in)

    $$\bar{F}_n(y)=1/n \sum_{i=1}^n F_i(y),$$

    where \(n\) is the number of (cross-)validation observations and the \(F_i\) are evaluated at \(N\) quantiles equal to the set of distinct \(y_i\) (or a subset of size \(N\) of them),

  • the Brier Score (plug-in)

    $$\mathrm{BS}(y) = 1/n \sum_{i=1}^n \left(F_i(y) - I(y_i \leq y) \right)^2,$$

    where \(I(x)\) is the indicator function for the event \(x\), and the Brier score is again evaluated at the unique values of the (cross-)validation observations (or a subset of size \(N\) of them),

  • the averaged continuous ranked probability score, CRPS, a strictly proper scoring criterion to rank predictions, which is related to the Brier score by

    $$\mathrm{CRPS} = \int_{-\infty}^\infty \mathrm{BS}(y) \,dy.$$
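The following minimal sketch (simulated data and hypothetical predictions, not code from the package) illustrates how these quantities can be computed "by hand" under the Gaussian plug-in assumption described above:

## hypothetical predictions, kriging standard errors and (cross-)validation data
set.seed(42)
n  <- 100
mu <- rnorm(n)
se <- runif(n, min = 0.5, max = 1.5)
y  <- rnorm(n, mean = mu, sd = se)

## probability integral transform PIT_i = F_i(y_i)
pit <- pnorm(y, mean = mu, sd = se)

## average predictive CDF Fbar(z) and empirical CDF Ghat(z) at the distinct y_i
z    <- sort(unique(y))
fbar <- sapply(z, function(t) mean(pnorm(t, mean = mu, sd = se)))
ghat <- ecdf(y)(z)

## Brier score BS(z) at the same thresholds
bs <- sapply(z, function(t) mean((pnorm(t, mean = mu, sd = se) - (y <= t))^2))

## crude CRPS approximation: trapezoidal integration of BS(y) over a wide grid
grid    <- seq(min(y) - 5, max(y) + 5, length.out = 2000)
bs.grid <- sapply(grid, function(t) mean((pnorm(t, mean = mu, sd = se) - (y <= t))^2))
crps    <- sum(diff(grid) * (head(bs.grid, -1) + tail(bs.grid, -1)) / 2)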

Gneiting et al. (2007) proposed the following plots to validate probabilistic predictions (see also the sketch after this list):

  • A histogram (or a plot of the empirical CDF) of the PIT values. For ideal predictions, with observed coverages of prediction intervals matching nominal coverages, the PIT values have a uniform distribution.

  • Plots of \(\bar{F}_n(y)\) and of the empirical CDF of the data, say \(\widehat{G}_n(y)\), and of their difference, \(\bar{F}_n(y)-\widehat{G}_n(y)\) vs \(y\). The forecasts are said to be marginally calibrated if \(\bar{F}_n(y)\) and \(\widehat{G}_n(y)\) match.

  • A plot of \(\mathrm{BS}(y)\) vs. \(y\). Probabilistic predictions are said to be sharp if the area under this curve, which equals CRPS, is minimized.
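For illustration, these diagnostics can also be drawn directly from the output of validate.predictions, here reusing the hypothetical objects y, mu and se of the sketch above (column names as documented in the Value section):

## PIT histogram (calibration)
hist(validate.predictions(y, mu, se, statistic = "pit"),
  breaks = 20, main = "PIT histogram", xlab = "PIT")

## marginal calibration: difference between average predictive and empirical CDF
r.mc <- validate.predictions(y, mu, se, statistic = "mc")
plot(fbar - ghat ~ z, data = r.mc, type = "l",
  main = "marginal calibration", xlab = "y", ylab = "Fbar - Ghat")

## Brier score vs. threshold (sharpness); the area under the curve approximates CRPS
r.bs <- validate.predictions(y, mu, se, statistic = "bs")
plot(bs ~ z, data = r.bs, type = "l",
  main = "Brier score", xlab = "y", ylab = "BS(y)")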

The plot method for class cv.georob allows one to create these plots, along with scatter-plots of observations vs. predictions, Tukey-Anscombe plots and normal QQ plots of the standardized prediction errors.

summary.cv.georob computes the mean and dispersion statistics of the (standardized) prediction errors (by a call to validate.predictions with argument statistic = "st", see Value) and the averaged continuous ranked probability score (crps). If present in the cv.georob object, the error statistics are also computed for the errors of the unbiasedly back-transformed predictions of a log-transformed response. If se is TRUE then these statistics are evaluated separately for the \(K\) cross-validation subsets and the standard errors of the means of these statistics are returned as well.

The print method for class cv.georob returns the mean and dispersion statistics of the (standardized) prediction errors.

References

Gneiting, T., Balabdaoui, F. and Raftery, A. E. (2007) Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society Series B 69, 243--268, doi:10.1111/j.1467-9868.2007.00587.x.

See Also

georob for (robust) fitting of spatial linear models;

cv.georob for assessing the goodness of a fit by georob.

Examples

## define number of cores for parallel computations
if(interactive()) ncpu <- 10L else ncpu <- 1L

data(meuse)

r.logzn <- georob(log(zinc) ~ sqrt(dist), data = meuse, locations = ~ x + y,
  variogram.model = "RMexp",
  param = c(variance = 0.15, nugget = 0.05, scale = 200),
  tuning.psi = 1000)

if(interactive()){
  ## example is run only in an interactive session because its CPU time exceeds 5 s
  r.logzn.cv.1 <- cv(r.logzn, seed = 1, lgn = TRUE, ncores = 1, verbose = 1)

  r.logzn.cv.2 <- cv(r.logzn, formula = .~. + ffreq, seed = 1, lgn = TRUE,
      ncores = ncpu)

  summary(r.logzn.cv.1, se = TRUE)
  summary(r.logzn.cv.2, se = TRUE)

  op <- par(mfrow = c(2,2))
  plot(r.logzn.cv.1, type = "lgn.sc")
  plot(r.logzn.cv.2, type = "lgn.sc", add = TRUE, col = "red")
  abline(0, 1, lty = "dotted")
  plot(r.logzn.cv.1, type = "ta")
  plot(r.logzn.cv.2, type = "ta", add = TRUE, col = "red")
  abline(h = 0, lty = "dotted")
  plot(r.logzn.cv.2, type = "mc", col = "red")
  plot(r.logzn.cv.1, type = "bs")
  plot(r.logzn.cv.2, type = "bs", add = TRUE, col = "red")
  legend("topright", lty = 1, col = c("black", "red"), bty = "n",
      legend = c("log(Zn) ~ sqrt(dist)", "log(Zn) ~ sqrt(dist) + ffreq"))
  par(op)
}
