The function georob
fits a linear model with spatially correlated
errors to geostatistical data that are possibly contaminated by
independent outliers. The regression coefficients and the parameters of
the variogram model are estimated by robust or Gaussian restricted
maximum likelihood (REML) or by Gaussian maximum likelihood (ML).
georob(formula, data, subset, weights, na.action, model = TRUE,
x = FALSE, y = FALSE, contrasts = NULL, offset, locations,
variogram.model = c("RMexp", "RMaskey", "RMbessel", "RMcauchy",
"RMcircular", "RMcubic", "RMdagum", "RMdampedcos", "RMdewijsian",
"RMfbm", "RMgauss", "RMgencauchy", "RMgenfbm", "RMgengneiting",
"RMgneiting", "RMlgd", "RMmatern", "RMpenta", "RMqexp",
"RMspheric", "RMstable", "RMwave", "RMwhittle"),
param, fit.param = default.fit.param()[names(param)],
aniso = default.aniso(), fit.aniso = default.fit.aniso(),
variogram.object = NULL,
tuning.psi = 2, control = control.georob(),
verbose = 0, ...)
an optional data frame, a
SpatialPointsDataFrame
, list or environment (or another
object coercible by as.data.frame
to a data frame)
containing the variables in the model and the coordinates where the data
was recorded. If not found in data
, the variables are taken from
environment(formula)
, typically the environment from which
georob
is called.
an optional vector specifying a subset of observations to be used in the fitting process.
an optional vector of weights to be used in the fitting process, currently ignored.
a function which indicates what should happen when the
data contain NA
s. The default is set by the na.action
argument of options
, and is na.fail
if that is
unset. The “factory-fresh” default is na.omit
.
Another possible value is NULL
, no action. Value
na.exclude
can be useful.
logicals. If TRUE
the corresponding components
of the fit (the model frame, the model matrix, the response) are
returned. The model frame is augmented by the coordinates.
an optional list. See the contrasts.arg
of
model.matrix.default
.
this optional argument can be used to specify an a
priori known component to be included in the linear predictor during
fitting. An offset
term can be included in the formula
instead or as well, and if both are specified their sum is used.
a one-sided formula defining the variables that are used as coordinates of the locations were the data was recorded (mandatory argument).
a character keyword defining the variogram model
to be fitted. Currently, most basic variogram models
provided by the package RandomFields can be fitted (see Details
and RMmodel
).
a named numeric vector with initial values of the
variogram parameters (mandatory argument). The names of param
are matched against the following names (see Details and
georobIntro
for information about the parametrization of
variogram models):
variance
: variance (sill \(\sigma^2\)) of the
auto-correlated component of the Gaussian random field
\(B(\mbox{\boldmath$s$\unboldmath})\).
snugget
: variance
(spatial nugget \(\sigma^2_{\mathrm{n}}\))
of the seemingly spatially uncorrelated component of
\(B(\mbox{\boldmath$s$\unboldmath})\)
(micro-scale spatial variation; default value snugget = 0
).
nugget
: variance (nugget \(\tau^2\)) of the
independent errors
\(\varepsilon(\mbox{\boldmath$s$\unboldmath})\).
scale
: range parameter (\(\alpha\)) of the variogram.
names of additional variogram parameters such as the smoothness
parameter \(\nu\) of the Whittle-Mat<U+008E>rn model (see
RMmodel
and
param.names
).
a named logical vector (or a function such as
default.fit.param
that creates this vector) with the same
names as used for param
, defining which parameters are adjusted
(TRUE
) and which are kept fixed at their initial values
(FALSE
) when fitting the model.
a named numeric vector with initial values (or a function such as
default.aniso
that creates this vector) for fitting
geometrically anisotropic variogram models. The names of aniso
are matched against the following names (see Details and
georobIntro
for information about the parametrization of
variogram models):
f1
: ratio \(f_1\) of lengths of second and first
semi-principal axes of an ellipsoidal surface with constant
semi-variance in \(\mathrm{I}\!\mathrm{R}^3\) (default f1 = 1
).
f2
: ratio \(f_2\) of lengths of third and first
semi-principal axes of the semi-variance ellipsoid (default f2 = 1
).
omega
: azimuth in degrees of the first semi-principal axis
of the semi-variance ellipsoid (default omega = 90
).
phi
: 90 degrees minus altitude of the first semi-principal axis
of the semi-variance ellipsoid (default phi = 90
).
zeta
: angle in degrees between the second semi-principal
axis and the direction of the line defined by the intersection
between the \(x\)-\(y\)-plane and the plane orthogonal to the
first semi-principal axis of the semi-variance ellipsoid through the
origin (default zeta = 0
).
a named logical vector (or a function such as
default.fit.aniso
that creates this vector) with the same
names as used for aniso
, defining which parameters are adjusted
(TRUE
) and which are kept fixed at their initial values
(FALSE
) when fitting the model.
an optional list that defines a possibly nested variogram model. Each component is itself a list with the following components:
variogram.model: a mandatory character keyword defining the variogram model, see respective argument above.
param: a mandatory named numeric vector with initial values of the variogram parameters, see respective argument above.
fit.param: an optional named logical vector defining which parameters are adjusted, see respective argument above.
aniso: an optional named numeric vector with initial values for fitting geometrically anisotropic variogram models, see respective argument above.
fit.param: an optional named logical vector defining which anisotropy parameters are adjusted, see respective argument above.
Note that the arguments variogram.model
, param
,
fit.param
, aniso
and fit.aniso
are ignored when
variogram.object
is passed to georob
.
positive numeric. The tuning constant \(c\) of the \(\psi_c\)-function of the robust REML algorithm.
a list specifying parameters that control the behaviour of
georob
. Use the function control.georob
and see its
help page for the components of control
.
positive integer controlling logging of diagnostic
messages to the console during model fitting. verbose = 0
largely
suppresses such messages and verbose = 4
asks for most verbose
output (see control
arguments of nleqslv
,
nlminb
and optim
and
control.georob
for information how to fine tuning
diagnostic output generated by nleqslv
, nlminb
and
optim
).
further arguments passed to function (e.g. object.
used internally for updating georob
objects).
An object of class georob
representing a robust (or Gaussian) (RE)ML
fit of a spatial linear model. See
georobObject
for the components of the fit.
georob
fits a spatial linear model by robust or Gaussian RE(ML)
(K<U+009F>nsch et al., 2011, K<U+009F>nsch
et al., in preparation). georobIntro
describes the
employed model and briefly sketches the robust REML estimation and the
robust external drift Kriging method. Here, we describe further details
of georob
.
Currently, most basic variogram models provided by the
package RandomFields can be fitted by
georob
(see argument variogram.model
for a list of
implemented models). Some of these models have in addition to
variance
, snugget
, nugget
and scale
further
parameters. Initial values of these parameters (param
) and
fitting flags (fit.param
) must be passed to georob
by the
same names as used by the functions RM...
of the package
RandomFields (see RMmodel
). Use the
function param.names
to list additional parameters of a
given variogram.model.
The arguments fit.param
and fit.aniso
are used to control
what variogram and anisotropy parameters are estimated and which are
kept at the constant initial values. The functions
default.fit.param
and default.fit.aniso
set
reasonable default values for these arguments. Note, as an aside, that
the function default.aniso
sets (default) values of the
anisotropy parameters for an isotropic variogram.
The intrinsic variogram model RMfbm
is over-parametrized when
both the variance
(plus possibly snugget
) and the
scale
are estimated. Therefore, to estimate the parameters of
this model, scale
must be kept fixed at an arbitrary value by
using fit.param["scale"] = FALSE
.
The subsection Model of georobIntro
describes
how such models are parametrized and gives definitions the various
elements of aniso
. Some additional remarks might be helpful:
The first semi-principal axis points into the direction with
the farthest reaching auto-correlation, which is described by the range
parameter scale
(\(\alpha\)).
The ranges in the direction of the second and third semi-principal axes are given by \(f_1\alpha\) and \(f_2 \alpha\), with \(0 < f_2 \leq f_1 \leq 1\).
The default values for aniso
(\(f_1=1\), \(f_2=1\))
define an isotropic variogram model.
Valid ranges for the angles characterizing the orientation of the semi-variance ellipsoid are (in degrees): \(\omega\) [0, 180], \(\phi\) [0, 180], \(\zeta\) [-90, 90].
Simultaneous estimation of the variance of the micro-scale variation
(snugget
, \(\sigma_\mathrm{n}^2\)), appears seemingly
as spatially uncorrelated with a given sampling design, and of the
variance (nugget
, \(\tau^2\)) of the independent errors
requires that for some locations
\(\mbox{\boldmath$s$\unboldmath}_i\) replicated observations are
available. Locations less or equal than zero.dist
apart are
thereby considered as being coincident (see
control.georob
).
Parameters of variogram models can vary only within certain bounds (see
param.bounds
and RMmodel
for allowed ranges). georob
uses three mechanisms to constrain
parameter estimates to permissible ranges:
Parameter transformations: By default, all variance
(variance
, snugget
, nugget
), the range
scale
, the anisotropy parameters f1
and f2
and
many of the additional parameters are log-transformed before
solving the estimating equations or maximizing the restricted
log-likelihood and this warrants that the estimates are always
positive (see control.georob
for detailed explanations
how to control parameter transformations).
Checking permissible ranges: The additional parameters
of the variogram models such as the smoothness parameter \(\nu\)
of the Whittle-Mat<U+008E>rn model are forced to stay in the permissible
ranges by signalling an error to nleqslv
, nlminb
or
optim
if the current trial values are invalid. These
functions then graciously update the trial values of the parameters
and carry their task on. However, it is clear that such a
procedure likely gets stuck at a point on the boundary of the
parameter space and is therefore just a workaround for avoiding
runtime errors due to invalid parameter values.
Exploiting the functionality of nlminb
and
optim
: If a spatial model is fitted non-robustly, then the
arguments lower
, upper
(and method
of
optim
) can be used to constrain the parameters
(see control.optim
how to pass them to optim
).
For optim
one has to use the arguments method =
"L-BFGS-B"
, lower = l
, upper = u
, where
l and u are numeric vectors with the lower and upper
bounds of the transformed parameters in the order as they
appear in c(variance, snugget, nugget, scale,
…)[fit.param], aniso[fit.aniso])
, where …
are
additional parameters of isotropic variogram models (use
param.names(variogram.model)
to display the names and the
order of the additional parameters for variogram.model
).
To solve the robustified estimating equations for \(\mbox{\boldmath$B$\unboldmath}\) and \(\mbox{\boldmath$\beta$\unboldmath}\) the following initial estimates are used:
\( \widehat{\mbox{\boldmath$B$\unboldmath}}=
\mbox{\boldmath$0$\unboldmath},\) if this turns out to be
infeasible, initial values can be passed to georob
by the
argument bhat
of control.georob
.
\(\widehat{\mbox{\boldmath$\beta$\unboldmath}}\) is
either estimated robustly by the function
lmrob
, rq
or
non-robustly by lm
(see argument
initial.fixef
of control.georob
).
Finding the roots of the robustified estimating equations of the
variogram and anisotropy parameters is more sensitive to a good choice
of initial values than maximizing the Gaussian (restricted)
log-likelihood with respect to the same parameters. If the initial
values for param
and aniso
are not sufficiently close to
the roots of the system of nonlinear equations, then
nleqslv
may fail to find them.
Setting
initial.param = TRUE
allows one to find initial values that are
often sufficiently close to the roots so that
nleqslv
converges. This is achieved by:
Initial values of the regression parameters are computed by
lmrob
irrespective of the choice for
initial.fixef
(see control.georob
).
Observations with “robustness weights” of the
lmrob
fit, satisfying
\(\psi_c(\widehat{\varepsilon}_i/\widehat{\tau})/(\widehat{\varepsilon}_i/\widehat{\tau})
\leq \mbox{\code{min.rweight}}\), are discarded (see
control.georob
).
The model is fit to the pruned data set by Gaussian REML using
optim
.
The resulting estimates of the variogram parameters
(param
, aniso
) are used as initial estimates for the
subsequent robust fit of the model by nleqslv
.
Note that for step 3 above, initial values of param
and
aniso
must be provided to georob
.
Unlike robust REML, where robustified estimating equations are solved
for the variance parameters nugget
(\(\tau^2\)),
variance
(\(\sigma^2\)), and possibly snugget
(\(\sigma_{\mathrm{n}}^2\)), for Gaussian (RE)ML the
variances can be re-parametrized to
the signal variance \(\sigma_B^2 = \sigma^2 + \sigma_{\mathrm{n}}^2\),
the inverse relative nugget \(\eta = \sigma_B^2 / \tau^2\) and
the relative auto-correlated signal variance \(\xi = \sigma^2/\sigma_B^2\).
georob
maximizes then a (restricted) profile
log-likelihood that depends only on \(\eta\), \(\xi\),
\(\alpha\), …, and \(\sigma_B^2\) is estimated by an explicit
expression that depends on these parameters (e.g. Diggle and Ribeiro,
2006, p. 113). This is usually more efficient than maximizing the
(restricted) log-likelihood with respect to the original variance
parameters \(\tau^2\), \(\sigma_{\mathrm{n}}^2\) and
\(\sigma^2\). georob
chooses the parametrization
automatically, but the user can control it by the argument
reparam
of the function control.georob
.
Diggle, P. J. and Ribeiro, P. J. R. (2006) Model-based Geostatistics. Springer.
K<U+009F>nsch, H. R., Papritz, A., Schwierz, C. and Stahel, W. A. (in preparation) Robust Geostatistics.
K<U+009F>nsch, H. R., Papritz, A., Schwierz, C. and Stahel, W. A. (2011) Robust estimation of the external drift and the variogram of spatial data. Proceedings of the ISI 58th World Statistics Congress of the International Statistical Institute. http://e-collection.library.ethz.ch/eserv/eth:7080/eth-7080-01.pdf
georobIntro
for a description of the model and a brief summary of the algorithms;
georobObject
for a description of the class georob
;
plot.georob
for display of RE(ML) variogram estimates;
control.georob
for controlling the behaviour of georob
;
cv.georob
for assessing the goodness of a fit by georob
;
predict.georob
for computing robust Kriging predictions; and finally
georobModelBuilding
for stepwise building models of class georob
;
georobMethods
for further methods for the class georob
.
# NOT RUN {
################
## meuse data ##
################
data(meuse)
## Gaussian REML fit
r.logzn.reml <- georob(log(zinc) ~ sqrt(dist), data = meuse, locations = ~ x + y,
variogram.model = "RMexp",
param = c(variance = 0.15, nugget = 0.05, scale = 200),
tuning.psi = 1000)
summary(r.logzn.reml, correlation = TRUE)
## robust REML fit
r.logzn.rob <- update(r.logzn.reml, tuning.psi = 1)
summary(r.logzn.rob, correlation = TRUE)
plot(r.logzn.reml, lag.dist.def = seq(0, 2000, by = 100))
lines(r.logzn.rob, col = "red")
###################
## wolfcamp data ##
###################
data(wolfcamp, package = "geoR")
d.wolfcamp <- data.frame(x = wolfcamp[[1]][,1], y = wolfcamp[[1]][,2],
pressure = wolfcamp[[2]])
## fitting isotropic IRF(0) model
r.irf0.iso <- georob(pressure ~ 1, data = d.wolfcamp, locations = ~ x + y,
variogram.model = "RMfbm",
param = c(variance = 10, nugget = 1500, scale = 1, alpha = 1.5),
fit.param = default.fit.param(scale = FALSE, alpha = TRUE),
tuning.psi = 1000)
summary(r.irf0.iso)
## fitting anisotropic IRF(0) model
r.irf0.aniso <- georob(pressure ~ 1, data = d.wolfcamp, locations = ~ x + y,
variogram.model = "RMfbm",
param = c(variance = 5.9, nugget = 1450, scale = 1, alpha = 1),
fit.param = default.fit.param(scale = FALSE, alpha = TRUE),
aniso = default.aniso(f1 = 0.51, omega = 148.),
fit.aniso = default.fit.aniso(f1 = TRUE, omega = TRUE),
tuning.psi = 1000)
summary(r.irf0.aniso)
plot(r.irf0.iso, lag.dist.def = seq(0, 200, by = 7.5))
plot(r.irf0.aniso, lag.dist.def = seq(0, 200, by = 7.5),
xy.angle.def = c(0, 22.5, 67.5, 112.5, 157.5, 180.),
add = TRUE, col = 2:5)
pchisq(2*(r.irf0.aniso[["loglik"]] - r.irf0.iso[["loglik"]]), 2, lower = FALSE)
# }
Run the code above in your browser using DataLab