georob
This function assesses the goodness-of-fit of a spatial linear model by
K-fold cross-validation. In more detail, the model is re-fitted
K times by robust (or Gaussian) (RE)ML, excluding each time
1/Kth of the data. The re-fitted models are used to compute robust
(or customary) external Kriging predictions for the omitted observations.
If the response variable is log-transformed then the Kriging predictions
can be optionally transformed back to the original scale of the
measurements. S3methods for evaluating and plotting diagnostic summaries
of the cross-validation errors are described for the function
validate.predictions
.
# S3 method for georob
cv(object, formula = NULL, subset = NULL,
method = c("block", "random"), nset = 10L, seed = NULL,
sets = NULL, duplicates.in.same.set = TRUE, re.estimate = TRUE,
param = object[["variogram.object"]][[1]][["param"]],
fit.param = object[["variogram.object"]][[1]][["fit.param"]],
aniso = object[["variogram.object"]][[1]][["aniso"]],
fit.aniso = object[["variogram.object"]][[1]][["fit.aniso"]],
variogram.object = NULL,
use.fitted.param = TRUE, return.fit = FALSE,
reduced.output = TRUE, lgn = FALSE,
mfl.action = c("offset", "stop"),
ncores = min(nset, parallel::detectCores()), verbose = 0, ...)
The method cv.georob
returns an object of class cv.georob
,
which is a list with the two
components pred
and fit
.
pred
is a data frame with the coordinates and the
cross-validation prediction results with the following variables:
an integer vector defining to which of the \(K\) subsets an observation was assigned.
the values of the (possibly log-transformed) response.
the Kriging predictions.
the Kriging standard errors.
If lgn = TRUE
then pred
has the additional variables:
the untransformed response.
the unbiased back-transformed predictions of a log-transformed response.
the Kriging standard errors of the back-transformed
predictions of a
log-transformed response.
The second component fit
contains either the full outputs of
georob
, fitted for the \(K\) reduced data sets
(reduced.output = FALSE
), or \(K\) lists with the components
tuning.psi
, converged
,
convergence.code
,
gradient
, variogram.object
, coefficients
along with
the standard errors of
\(\widehat{\boldsymbol{\beta}}\), see
georobObject
.
an object of class of "georob"
, see
georobObject
.
an optional formula for the regression model passed by
update
to georob
.
an optional vector specifying a subset of observations to be used in the fitting process.
a character keyword, controlling whether subsets are formed
by partitioning data set into contiguous spatial block
s by
kmeans
(default) or random
ly. Ignored if
sets
is non-NULL
.
a positive integer defining the number K of subsets into
which the data set is partitioned (default: nset = 10
). Ignored
if sets
is non-NULL
.
an optional integer seed to initialize random number generation,
see set.seed
. Ignored if sets
is non-NULL
.
an optional vector of the same length as the response vector
of the fitted model and with positive integers taking values in
\((1,2,\ldots,K)\), defining in this way the \(K\) subsets into which
the data set is split. If sets = NULL
(default) the partition is
randomly generated by kmeans
or
runif
(using possibly seed
).
a logical scalar controlling whether
replicated observations at a given location are assigned to the same
subset when partitioning the data (default TRUE
).
a logical scalar controlling whether the model is
re-fitted to the reduced data sets before computing the Kriging
predictions (TRUE
, default) or whether the model passed in
object
is used to compute the predictions for the omitted
observations, see Details.
a named numeric vector or a matrix or data frame with
initial values of variogram parameters passed by
update
to georob
. If param
is
a matrix (or a data frame) then it must have nset
rows and
length(object[["variogram.object"]][[1]][["param"]])
columns with
initial values of variogram parameters for the nset
cross-validation sets, and colnames(param)
must match
names(object[["variogram.object"]][[1]][["param"]])
.
a named logical vector or a matrix or data frame
defining which variogram parameters should be adjusted by
update
. If fit.param
is a matrix (or a data
frame) then it must have nset
rows and
length(object[["variogram.object"]][[1]][["fit.param"]])
columns
with variogram parameter fitting flags for the nset
cross-validation sets, and colnames(param)
must match
names(object[["variogram.object"]][[1]][["fit.param"]])
.
a named numeric vector or a matrix or data frame with
initial values of anisotropy parameters passed by
update
to georob
. If aniso
is
a matrix (or a data frame) then it must have nset
rows and
length(object[["variogram.object"]][[1]][["aniso"]])
columns with
initial values of anisotropy parameters for the nset
cross-validation sets, and colnames(aniso)
must match
names(object[["variogram.object"]][[1]][["aniso"]])
.
a named logical vector or a matrix or data frame
defining which anisotropy parameters should be adjusted by
update
. If fit.aniso
is a matrix (or a data
frame) then it must have nset
rows and
length(object[["variogram.object"]][[1]][["fit.aniso"]])
columns
with anisotropy parameter fitting flags for the nset
cross-validation sets, and colnames(param)
must match
names(object[["variogram.object"]][[1]][["fit.aniso"]])
.
an optional list that gives initial values for fitting a possibly nested variogram model for the cross-validation sets. Each component is a list with the following components:
param
: an optional named numeric vector or a matrix or
data frame with initial values of variogram parameters passed by
update
to georob
. If param
is a matrix (or a data frame) then it must have nset
rows
and
length(object[["variogram.object"]][[i]][["param"]])
columns with initial values of variogram parameters for the
nset
cross-validation sets (i is the ith variogram
structure), and colnames(param)
must match
names(object[["variogram.object"]][[i]][["param"]])
.
fit.param
: an optional named logical vector or a matrix
or data frame defining which variogram parameters should be adjusted
by update
. If fit.param
is a matrix (or
a data frame) then it must have nset
rows and
length(object[["variogram.object"]][[i]][["fit.param"]])
columns with variogram parameter fitting flags for the nset
cross-validation sets (i is the ith variogram structure),
and colnames(param)
must match
names(object[["variogram.object"]][[i]][["fit.param"]])
.
aniso
: an optional named numeric vector or a matrix or
data frame with initial values of anisotropy parameters passed by
update
to georob
. If aniso
is a matrix (or a data frame) then it must have nset
rows
and
length(object[["variogram.object"]][[i]][["aniso"]])
columns with initial values of anisotropy parameters for the
nset
cross-validation sets (i is the ith variogram
structure), and colnames(aniso)
must match
names(object[["variogram.object"]][[i]][["aniso"]])
.
fit.aniso
: an optional named logical vector or a matrix
or data frame defining which anisotropy parameters should be adjusted
by update
. If fit.aniso
is a matrix (or
a data frame) then it must have nset
rows and
length(object[["variogram.object"]][[i]][["fit.aniso"]])
columns with anisotropy parameter fitting flags for the nset
cross-validation sets(i is the ith variogram structure),
and colnames(param)
must match
names(object[["variogram.object"]][[i]][["fit.aniso"]])
.
a logical scalar controlling whether fitted values
of param
(and aniso
are used as initial values when
variogram parameters are fitted for the cross-validation sets (default
TRUE
).
a logical scalar controlling whether information about the fit
should be returned when re-estimating the model with the reduced data
sets (default FALSE
).
a logical scalar controlling whether the complete fitted
model objects, fitted to the reduced data sets, are returned
(FALSE
) or only some components (TRUE
, default, see
Value). Ignored if return.fit = FALSE
.
a logical scalar controlling whether Kriging predictions of a
log-transformed response should be transformed back to the original scale
of the measurements (default FALSE
).
a character keyword controlling what is done when some
levels of factor(s) are not present in any of the subsets used to fit the
model. The function either stops ("stop"
) or treats the
respective factors as model offset ("offset"
, default).
a positive integer controlling how many cores are used for parallelized computations, see Details.
a positive integer controlling logging of diagnostic
messages to the console during model fitting. Passed by
update
to georob
.
additional arguments passed by update
to georob
, see Details.
Andreas Papritz papritz@retired.ethz.ch.
Note that the data frame passed as data
argument to georob
must exist in the user workspace
when calling cv.georob
.
cv.georob
uses the packages parallel and snowfall for
parallelized computations. By default, the function uses \(K\) CPUs
but not more than are physically available (as returned by
detectCores
).
cv.georob
uses the function update
to
re-estimated the model with the reduced data sets. Therefore, any
argument accepted by georob
except data
can be
changed when re-fitting the model. Some of them (e.g. formula
,
subset
, etc.) are explicit arguments of cv.georob
, but
also the remaining ones can be passed by ...
to the function.
Practitioners in geostatistics commonly cross-validate a fitted model
without re-estimating the model parameters with the reduced data sets.
This is clearly an unsound practice (see Hastie et al., 2009, sec.
7.10). Therefore, the argument re.estimate
should always be set
to TRUE
. The alternative is provided only for historic reasons.
Hastie, T., Tibshirani, R. and Friedman, J. (2009) The Elements of Statistical Learning; Data Mining, Inference and Prediction, Springer, New York, tools:::Rd_expr_doi("10.1007/978-0-387-84858-7")
georobPackage
for a description of the model and a brief summary of the algorithms;
georob
for (robust) fitting of spatial linear models;
georobObject
for a description of the class georob
;
profilelogLik
for computing profiles of Gaussian likelihoods;
plot.georob
for display of RE(ML) variogram estimates;
control.georob
for controlling the behaviour of georob
;
georobModelBuilding
for stepwise building models of class georob
;
georobMethods
for further methods for the class georob
;
predict.georob
for computing robust Kriging predictions;
validate.predictions
for validating Kriging predictions;
lgnpp
for unbiased back-transformation of Kriging prediction
of log-transformed data;
georobSimulation
for simulating realizations of a Gaussian process
from model fitted by georob
; and finally
sample.variogram
and fit.variogram.model
for robust estimation and modelling of sample variograms.
## define number of cores for parallel computations
if(interactive()) ncpu <- 10L else ncpu <- 1L
data(meuse)
r.logzn <- georob(log(zinc) ~ sqrt(dist), data = meuse, locations = ~ x + y,
variogram.model = "RMexp",
param = c(variance = 0.15, nugget = 0.05, scale = 200),
tuning.psi = 1000)
if(interactive()){
## example is run only in interactive session because cpu times exceeds 5 s
r.logzn.cv.1 <- cv(r.logzn, seed = 1, lgn = TRUE, ncores = 1, verbose = 1)
r.logzn.cv.2 <- cv(r.logzn, formula = .~. + ffreq, seed = 1, lgn = TRUE,
ncores = ncpu)
plot(r.logzn.cv.1, type = "bs")
plot(r.logzn.cv.2, type = "bs", add = TRUE, col = "red")
legend("topright", lty = 1, col = c("black", "red"), bty = "n",
legend = c("log(Zn) ~ sqrt(dist)", "log(Zn) ~ sqrt(dist) + ffreq"))
}
Run the code above in your browser using DataLab