checkGMU: Evaluation of geostatistical models of uncertainty

Description

Evaluate the local quality of a geostatistical model of uncertainty (GMU) using summary measures and graphical displays.

Usage

checkGMU(observed, simulated, pi = seq(0.01, 0.99, 0.01), symmetric = TRUE,
  plotit = TRUE)

Arguments

observed

Vector of observed values at the validation points. See Details for more information.

simulated

Data frame or matrix with simulated values (columns) for each validation point (rows). See Details for more information.

Vector defining the width of the series of probability intervals. Defaults to pi = seq(0.01, 0.99, 0.01). See Details for more information.

symmetric

Logical for choosing the type of probability interval. Defaults to symmetric = TRUE. See Details for more information.

plotit

Logical for plotting the results. Defaults to plotit = TRUE.

Details

There is no standard way of evaluating the local quality of a GMU. The collection of summary measures and graphical displays presented here is far from being comprehensive. A few definitions are given bellow.

Error statistics{ Error statistics measure how well the GMU predicts the measured values at the validation points. Four error statistics are presented:

[object Object],[object Object],[object Object],[object Object] } Coverage probabilities{ The coverage probability of an interval is given by the number of times that that interval contains its parameter over several replications of an experiment. For example, consider the interquartile range $IQR = Q3 - Q1$ of a Gaussian distributed variable with mean equal to zero and variance equal to one. The nominal coverage probability of the IQR is 0.5, i.e. two quarters of the data fall within the IQR. Suppose we generate a Gaussian distributed random variable with the same mean and variance and count the number of values that fall within the IQR defined above: about 0.5 of its values will fall within the IQR. If we continue generating Gaussian distributed random variables with the same mean and variance, on average, 0.5 of the values will fall in that interval.

Coverage probabilities are very useful to evaluate the local quality of a GMU: the closer the observed coverage probabilities of a sequence of probability intervals (PI) are to the nominal coverage probabilities of those PIs, the better the modelling of the local uncertainty.

Two types of PIs can be used here: symmetric, median-centred PIs, and left-bounded PIs. Papritz & Dubois (1999) recommend using left-bounded PIs because they are better at evidencing deviations for both large and small PIs. The authors also point that the coverage probabilities of the symmetric, median-centred PIs can be read from the coverage probability plots produced using left-bounded PIs.

In both cases, the PIs are computed at each validation location using the quantiles of the conditional cumulative distribution function (ccdf) defined by the set of realizations at that validation location. For a sequence of PIs of increasing width, we check which of them contains the observed value at all validation locations. We then average the results over all validation locations to compute the proportion of PIs (with the same width) that contains the observed value: this gives the coverage probability of the PIs.

Deutsch (1997) proposed three summary measures of the coverage probabilities to assess the local goodness of a GMU: accuracy ($A$), precision ($P$), and goodness ($G$). According to Deutsch (1997), a GMU can be considered good if it is both accurate and precise. Although easy to compute, these measures seem not to have been explored by many geostatisticians, except for the studies developed by Pierre Goovaerts and his later software implementation (Goovaerts, 2009). Richmond (2001) suggests that they should not be used as the only measures of the local quality of a GMU.

[object Object],[object Object],[object Object] It is worth noting that the coverage probability and PI-width plots are relevant mainly to GMU created using conditional simulations, that is, simulations that are locally conditioned to the data observed at the validation locations. Conditioning the simulations locally serves the purposes of honouring the available data and reducing the variance of the output realizations. This is why one would like to find the points falling above the 1:1 line in both coverage probability and PI-width plots. For unconditional simulations, that is, simulations that are only globally conditioned to the histogram (and variogram) of the data observed at the validation locations, one would expect to find that, over a large number of simulations, the whole set of possible values (i.e. the global histogram) can be generated at any node of the simulation grid. In other words, it is expected to find all points on the 1:1 line in both coverage probability and PI-width plots. Deviations from the 1:1 line could then be used as evidence of problems in the simulation. }

References

Deutsch, C. Direct assessment of local accuracy and precision. Baafi, E. Y. & Schofield, N. A. (Eds.) Geostatistics Wollongong '96. Dordrecht: Kinwer Academic Publishers, v. I, p. 115-125, 1997.

Papritz, A. & Dubois, J. R. Mapping heavy metals in soil by (non-)linear kriging: an empirical validation. Gómez-Hernández, J.; Soares, A. & Froidevaux, R. (Eds.) geoENV II -- Geostatistics for Environmental Applications. Springer, p. 429-440, 1999.

Goovaerts, P. Geostatistical modelling of uncertainty in soil science. Geoderma. v. 103, p. 3 - 26, 2001.

Goovaerts, P. AUTO-IK: a 2D indicator kriging program for the automated non-parametric modeling of local uncertainty in earth sciences. Computers & Geosciences. v. 35, p. 1255-1270, 2009.

Richmond, A. J. Maximum profitability with minimum risk and effort. Xie, H.; Wang, Y. & Jiang, Y. (Eds.) Proceedings 29th APCOM. Lisse: A. A. Balkema, p. 45-50, 2001.

Ripley, B. D. Stochastic simulation. New York: John Wiley & Sons, p. 237, 1987.

Examples

Run this code

set.seed(2001)
observed <- round(rnorm(100), 3)
simulated <- t(
  sapply(1:length(observed), function (i) round(rnorm(100), 3)))
resa <- checkGMU(observed, simulated, symmetric = T)
resb <- checkGMU(observed, simulated, symmetric = F)
resa$error;resb$error
resa$goodness;resb$goodness

Run the code above in your browser using DataLab