gmGeostats (version 0.10-6)

accuracy: Compute accuracy and precision

Description

Computes goodness-of-fit measures (accuracy, precision and joint goodness) adapted or extended from the definition of Deutsch (1997).

Usage

accuracy(x, ...)

# S3 method for data.frame
accuracy(
  x,
  observed = x$observed,
  prob = seq(from = 0, to = 1, by = 0.05),
  method = "kriging",
  outMahalanobis = FALSE,
  ...
)

# S3 method for DataFrameStack
accuracy(
  x,
  observed,
  vars = intersect(colnames(observed), dimnames(x)[[noStackDim(x)]]),
  prob = seq(from = 0, to = 1, by = 0.05),
  method = ifelse(length(vars) == 1, "simulation", "Mahalanobis"),
  outMahalanobis = FALSE,
  ...
)

Arguments

x

data container for the predictions (plus cokriging error variances/covariances) or simulations (and possibly also the true values, in univariate problems)

...

generic functionality, currently ignored

observed

either a vector- or matrix-like object of the true values

prob

sequence of cutoff probabilities to use for the calculations

method

method used for generating the predictions/simulations; one of c("kriging", "cokriging", "simulation") for x of class "data.frame", or one of c("simulation", "mahalanobis", "flow") for x of class DataFrameStack().

outMahalanobis

if TRUE, skip the final accuracy calculations and return the Mahalanobis norms of the residuals; if FALSE, do the accuracy calculations

vars

in multivariate cases, a vector of names of the variables to analyse

Value

If outMahalanobis=FALSE (the primary use), this function returns a two-column dataset of class c("accuracy", "data.frame"): the first column gives the nominal probability cutoffs used, and the second column the actual coverage of the intervals for each of these probabilities. If outMahalanobis=TRUE, the output is a vector (for predictions) or matrix (for simulations) of Mahalanobis error norms.

Methods (by class)

  • data.frame: Compute accuracy and precision

  • DataFrameStack: Compute accuracy and precision

Details

For method "kriging", x must contain columns whose names include the string "pred" for the predictions and "var" for the kriging variances; the observed values can either be included as an extra column named "observed", or provided separately via the argument observed. For method "cokriging", the columns of x must contain the predictions, cokriging variances and cokriging covariances in columns whose names include the strings "pred", "var" and "cov" respectively; observed values can only be provided via the observed argument. Note that these are the natural formats produced by gstat::predict.gstat() and other (co)kriging functions of that package.
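As a minimal, hedged illustration of the expected column layout (all data here are synthetic; only the column-naming convention comes from the documentation above):

```r
# Toy univariate kriging output in the layout accuracy() expects:
# column names containing "pred", "var" and (optionally) "observed".
set.seed(42)
truth <- rnorm(100)                         # hypothetical true values
krig <- data.frame(
  pred     = truth + rnorm(100, sd = 0.5),  # predictions with some error
  var      = rep(0.25, 100),                # kriging variances
  observed = truth                          # true values as an extra column
)
stopifnot(all(c("pred", "var", "observed") %in% colnames(krig)))
# With gmGeostats loaded, one would then call something like:
# acc <- accuracy(krig, method = "kriging")
```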

For univariate and multivariate cokriging results (methods "kriging" and "cokriging"), the coverage values are computed from the Mahalanobis square error, i.e. the (square) distance between prediction and true value, taking the cokriging variance-covariance matrix as the positive definite bilinear form of the distance. The rationale is that, under the assumption that the random field is Gaussian, this Mahalanobis square error is known to follow a \(\chi^2(\nu)\) distribution with degrees of freedom \(\nu\) equal to the number of variables. This reference distribution allows us to compute confidence intervals for the Mahalanobis square error, and then to count how many of the actually observed errors fall within each of these intervals (the coverage). For a perfect fit to the distribution, the plot of coverage vs. nominal confidence (see plot.accuracy) should fall on the \(y=x\) line. NOTE: the original definition of Deutsch (1997) for the univariate case did not use the \(\chi^2(1)\) distribution, but instead derived the desired (symmetric!) intervals from the standard normal distribution that appears when normalizing the residual with the kriging standard deviation; the result is the same.
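The coverage computation just described can be sketched in base R for the univariate case (\(\nu = 1\)); this is only an illustration of the principle, not the package's internal code, and all variable names are ours:

```r
# Univariate sketch: Mahalanobis square errors vs. chi^2(1) intervals.
set.seed(1)
n     <- 2000
truth <- rnorm(n)
pred  <- truth + rnorm(n, sd = 0.5)  # predictions
kvar  <- rep(0.25, n)                # kriging variances (here, correct ones)
m2    <- (pred - truth)^2 / kvar     # Mahalanobis square errors ~ chi^2(1)
prob  <- seq(from = 0, to = 1, by = 0.05)  # nominal probability cutoffs
# coverage: share of errors inside the chi^2(1) interval of probability p
coverage <- sapply(prob, function(p) mean(m2 <= qchisq(p, df = 1)))
# For a well-calibrated model, coverage should track prob closely,
# i.e. the points (prob, coverage) should lie near the y = x line.
```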

For method "simulation" with x a data.frame, the names of the variables containing the realisations must include the string "sim", and observed must be a vector with as many elements as x has rows. If x is a DataFrameStack(), the stacking dimension is assumed to run through the realisations; the true values must still be given in observed. In both cases the method is based on ranks: these are used to calculate the frequency with which the simulations are more extreme than the observed value. This calculation considers bilateral intervals around the median of (realisations, observed value) for each location separately.
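The rank-based idea can be sketched for a single location as follows; this is a hedged illustration of the principle under our own naming, not the package's implementation:

```r
# One location: where does the observed value sit, in terms of
# two-sided extremeness around the median, within the realisations?
set.seed(7)
sims <- rnorm(99)     # realisations at one location
obs  <- 0.3           # observed value at that location
pool <- c(sims, obs)  # realisations plus the observed value
med  <- median(pool)
# fraction of the pool no more extreme (w.r.t. the median) than obs:
p_obs <- mean(abs(pool - med) <= abs(obs - med))
```

Repeating this over all locations yields, for each nominal probability, the frequency with which the observed value falls inside the corresponding bilateral interval.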

Method "mahalanobis" ("Mahalanobis" also works) is the multivariate analogue for simulations. It only works for x of class DataFrameStack(), and requires the stacking dimension to run through the realisations and the other two dimensions to coincide with the dimensions of observed, i.e. locations by rows and variables by columns. In this case, a covariance matrix is computed and used to obtain the Mahalanobis square error defined above for method "cokriging": this Mahalanobis square error is computed for each simulation and for the true value. The simulated Mahalanobis square errors are then used to build the reference distribution from which confidence intervals are derived.
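A base-R sketch of this idea at one location, using the standard stats::mahalanobis() function (again, an illustration under our own assumptions, not the package's code):

```r
# Compare the Mahalanobis square error of the true value against the
# distribution of the same statistic over the realisations.
set.seed(3)
nsim <- 200; nvar <- 3
sims  <- matrix(rnorm(nsim * nvar), nrow = nsim)  # realisations x variables
truth <- rnorm(nvar)                              # true values, one location
ctr <- colMeans(sims)
S   <- cov(sims)                                  # covariance from realisations
m2      <- mahalanobis(sims, center = ctr, cov = S)   # one value per realisation
m2_true <- mahalanobis(rbind(truth), center = ctr, cov = S)
# position of the true value within the simulated reference distribution:
p_true <- mean(m2 <= m2_true)
```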

Finally, the highly experimental method "flow" requires the input to be in the same shape as for method "mahalanobis". The procedure is mostly the same, except that before the Mahalanobis square errors are computed, a location-wise flow anamorphosis (ana()) is applied to transform the realisations (including the true value as one of them) to joint normality. The rest of the calculations proceed as with method "mahalanobis".

See Also

Other accuracy functions: mean.accuracy(), plot.accuracy(), precision(), validate(), xvErrorMeasures()