georob: Robust Fitting of Spatial Linear Models

Description

The function georob fits a linear model with spatially correlated errors to geostatistical data that are possibly contaminated by independent outliers. The regression coefficients and the parameters of the variogram model are estimated by robust or Gaussian restricted maximum likelihood (REML) or by Gaussian maximum likelihood (ML).

Usage

georob(formula, data, subset, weights, na.action, model = TRUE, 
    x = FALSE, y = FALSE, contrasts = NULL, offset, locations, 
    variogram.model = c("RMexp", "RMaskey", "RMbessel", "RMcauchy", 
        "RMcircular", "RMcubic", "RMdagum", "RMdampedcos", "RMdewijsian", 
        "RMfbm", "RMgauss", "RMgencauchy", "RMgenfbm", "RMgengneiting", 
        "RMgneiting", "RMlgd", "RMmatern", "RMpenta", "RMqexp", 
        "RMspheric", "RMstable", "RMwave", "RMwhittle"), 
    param, fit.param = default.fit.param()[names(param)],
	  aniso = default.aniso(), fit.aniso = default.fit.aniso(),
    variogram.object = NULL,
    tuning.psi = 2, control = control.georob(),
    verbose = 0, ...)

Arguments

formula

a symbolic description of the regression model for the external drift to be fit (mandatory argument). See lm and formula for more details.

data

an optional data frame, a SpatialPointsDataFrame, list or environment (or another object coercible by as.data.frame to a data frame) containing the variables in the model and the coordinates where the data was recorded. If not found in data, the variables are taken from environment(formula), typically the environment from which georob is called.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

weights

an optional vector of weights to be used in the fitting process, currently ignored.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action argument of options, and is na.fail if that is unset. The “factory-fresh” default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

model, x, y

logicals. If TRUE the corresponding components of the fit (the model frame, the model matrix, the response) are returned. The model frame is augmented by the coordinates.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

offset

this optional argument can be used to specify an a priori known component to be included in the linear predictor during fitting. An offset term can be included in the formula instead or as well, and if both are specified their sum is used.

locations

a one-sided formula defining the variables that are used as coordinates of the locations were the data was recorded (mandatory argument).

variogram.model

a character keyword defining the variogram model to be fitted. Currently, most basic variogram models provided by the package RandomFields can be fitted (see Details and RMmodel).

param

a named numeric vector with initial values of the variogram parameters (mandatory argument). The names of param are matched against the following names (see Details and georobIntro for information about the parametrization of variogram models):

variance: variance (sill $\sigma^2$) of the auto-correlated component of the Gaussian random field $B(\mbox{\boldmath$s$\unboldmath})$.
snugget: variance (spatial nugget $\sigma^2_{\mathrm{n}}$) of the seemingly spatially uncorrelated component of $B(\mbox{\boldmath$s$\unboldmath})$ (micro-scale spatial variation; default value snugget = 0).
nugget: variance (nugget $\tau^2$) of the independent errors $\varepsilon(\mbox{\boldmath$s$\unboldmath})$.
scale: range parameter ($\alpha$) of the variogram.
names of additional variogram parameters such as the smoothness parameter $\nu$ of the Whittle-Mat<U+008E>rn model (see RMmodel and param.names).

fit.param

a named logical vector (or a function such as default.fit.param that creates this vector) with the same names as used for param, defining which parameters are adjusted (TRUE) and which are kept fixed at their initial values (FALSE) when fitting the model.

aniso

a named numeric vector with initial values (or a function such as default.aniso that creates this vector) for fitting geometrically anisotropic variogram models. The names of aniso are matched against the following names (see Details and georobIntro for information about the parametrization of variogram models):

f1: ratio $f_1$ of lengths of second and first semi-principal axes of an ellipsoidal surface with constant semi-variance in $\mathrm{I}\!\mathrm{R}^3$ (default f1 = 1).
f2: ratio $f_2$ of lengths of third and first semi-principal axes of the semi-variance ellipsoid (default f2 = 1).
omega: azimuth in degrees of the first semi-principal axis of the semi-variance ellipsoid (default omega = 90).
phi: 90 degrees minus altitude of the first semi-principal axis of the semi-variance ellipsoid (default phi = 90).
zeta: angle in degrees between the second semi-principal axis and the direction of the line defined by the intersection between the $x$-$y$-plane and the plane orthogonal to the first semi-principal axis of the semi-variance ellipsoid through the origin (default zeta = 0).

fit.aniso

a named logical vector (or a function such as default.fit.aniso that creates this vector) with the same names as used for aniso, defining which parameters are adjusted (TRUE) and which are kept fixed at their initial values (FALSE) when fitting the model.

variogram.object

an optional list that defines a possibly nested variogram model. Each component is itself a list with the following components:

variogram.model: a mandatory character keyword defining the variogram model, see respective argument above.
param: a mandatory named numeric vector with initial values of the variogram parameters, see respective argument above.
fit.param: an optional named logical vector defining which parameters are adjusted, see respective argument above.
aniso: an optional named numeric vector with initial values for fitting geometrically anisotropic variogram models, see respective argument above.
fit.param: an optional named logical vector defining which anisotropy parameters are adjusted, see respective argument above.

Note that the arguments variogram.model, param, fit.param, aniso and fit.aniso are ignored when variogram.object is passed to georob.

tuning.psi

positive numeric. The tuning constant $c$ of the $\psi_c$-function of the robust REML algorithm.

control

a list specifying parameters that control the behaviour of georob. Use the function control.georob and see its help page for the components of control.

verbose

positive integer controlling logging of diagnostic messages to the console during model fitting. verbose = 0 largely suppresses such messages and verbose = 4 asks for most verbose output (see control arguments of nleqslv, nlminb and optim and control.georob for information how to fine tuning diagnostic output generated by nleqslv, nlminb and optim).

…

further arguments passed to function (e.g. object. used internally for updating georob objects).

Value

An object of class georob representing a robust (or Gaussian) (RE)ML fit of a spatial linear model. See georobObject for the components of the fit.

Details

georob fits a spatial linear model by robust or Gaussian RE(ML) (K<U+009F>nsch et al., 2011, K<U+009F>nsch et al., in preparation). georobIntro describes the employed model and briefly sketches the robust REML estimation and the robust external drift Kriging method. Here, we describe further details of georob.

Implemented variograms

Currently, most basic variogram models provided by the package RandomFields can be fitted by georob (see argument variogram.model for a list of implemented models). Some of these models have in addition to variance, snugget, nugget and scale further parameters. Initial values of these parameters (param) and fitting flags (fit.param) must be passed to georob by the same names as used by the functions RM... of the package RandomFields (see RMmodel). Use the function param.names to list additional parameters of a given variogram.model.

The arguments fit.param and fit.aniso are used to control what variogram and anisotropy parameters are estimated and which are kept at the constant initial values. The functions default.fit.param and default.fit.aniso set reasonable default values for these arguments. Note, as an aside, that the function default.aniso sets (default) values of the anisotropy parameters for an isotropic variogram.

Estimating parameters of power function variogram

The intrinsic variogram model RMfbm is over-parametrized when both the variance (plus possibly snugget) and the scale are estimated. Therefore, to estimate the parameters of this model, scale must be kept fixed at an arbitrary value by using fit.param["scale"] = FALSE.

Estimating parameters of geometrically anisotropic variograms

The subsection Model of georobIntro describes how such models are parametrized and gives definitions the various elements of aniso. Some additional remarks might be helpful:

The first semi-principal axis points into the direction with the farthest reaching auto-correlation, which is described by the range parameter scale ($\alpha$).
The ranges in the direction of the second and third semi-principal axes are given by $f_1\alpha$ and $f_2 \alpha$, with $0 < f_2 \leq f_1 \leq 1$.
The default values for aniso ($f_1=1$, $f_2=1$) define an isotropic variogram model.
Valid ranges for the angles characterizing the orientation of the semi-variance ellipsoid are (in degrees): $\omega$ [0, 180], $\phi$ [0, 180], $\zeta$ [-90, 90].

Estimating variance of micro-scale variation

Simultaneous estimation of the variance of the micro-scale variation (snugget, $\sigma_\mathrm{n}^2$), appears seemingly as spatially uncorrelated with a given sampling design, and of the variance (nugget, $\tau^2$) of the independent errors requires that for some locations $\mbox{\boldmath$s$\unboldmath}_i$ replicated observations are available. Locations less or equal than zero.dist apart are thereby considered as being coincident (see control.georob).

Constraining estimates of variogram parameters

Parameters of variogram models can vary only within certain bounds (see param.bounds and RMmodel for allowed ranges). georob uses three mechanisms to constrain parameter estimates to permissible ranges:

Parameter transformations: By default, all variance (variance, snugget, nugget), the range scale, the anisotropy parameters f1 and f2 and many of the additional parameters are log-transformed before solving the estimating equations or maximizing the restricted log-likelihood and this warrants that the estimates are always positive (see control.georob for detailed explanations how to control parameter transformations).
Checking permissible ranges: The additional parameters of the variogram models such as the smoothness parameter $\nu$ of the Whittle-Mat<U+008E>rn model are forced to stay in the permissible ranges by signalling an error to nleqslv, nlminb or optim if the current trial values are invalid. These functions then graciously update the trial values of the parameters and carry their task on. However, it is clear that such a procedure likely gets stuck at a point on the boundary of the parameter space and is therefore just a workaround for avoiding runtime errors due to invalid parameter values.
Exploiting the functionality of nlminb and optim: If a spatial model is fitted non-robustly, then the arguments lower, upper (and method of optim) can be used to constrain the parameters (see control.optim how to pass them to optim). For optim one has to use the arguments method = "L-BFGS-B", lower = l, upper = u, where l and u are numeric vectors with the lower and upper bounds of the transformed parameters in the order as they appear in c(variance, snugget, nugget, scale, …)[fit.param], aniso[fit.aniso]), where … are additional parameters of isotropic variogram models (use param.names(variogram.model) to display the names and the order of the additional parameters for variogram.model).

Computing robust initial estimates of parameters for robust REML

To solve the robustified estimating equations for $\mbox{\boldmath$B$\unboldmath}$ and $\mbox{\boldmath$\beta$\unboldmath}$ the following initial estimates are used:

$ \widehat{\mbox{\boldmath$B$\unboldmath}}= \mbox{\boldmath$0$\unboldmath},$ if this turns out to be infeasible, initial values can be passed to georob by the argument bhat of control.georob.
$\widehat{\mbox{\boldmath$\beta$\unboldmath}}$ is either estimated robustly by the function lmrob, rq or non-robustly by lm (see argument initial.fixef of control.georob).

Finding the roots of the robustified estimating equations of the variogram and anisotropy parameters is more sensitive to a good choice of initial values than maximizing the Gaussian (restricted) log-likelihood with respect to the same parameters. If the initial values for param and aniso are not sufficiently close to the roots of the system of nonlinear equations, then nleqslv may fail to find them. Setting initial.param = TRUE allows one to find initial values that are often sufficiently close to the roots so that nleqslv converges. This is achieved by:

Initial values of the regression parameters are computed by lmrob irrespective of the choice for initial.fixef (see control.georob).
Observations with “robustness weights” of the lmrob fit, satisfying $\psi_c(\widehat{\varepsilon}_i/\widehat{\tau})/(\widehat{\varepsilon}_i/\widehat{\tau}) \leq \mbox{\code{min.rweight}}$, are discarded (see control.georob).
The model is fit to the pruned data set by Gaussian REML using optim.
The resulting estimates of the variogram parameters (param, aniso) are used as initial estimates for the subsequent robust fit of the model by nleqslv.

Note that for step 3 above, initial values of param and aniso must be provided to georob.

Estimating variance parameters by Gaussian (RE)ML

Unlike robust REML, where robustified estimating equations are solved for the variance parameters nugget ($\tau^2$), variance ($\sigma^2$), and possibly snugget ($\sigma_{\mathrm{n}}^2$), for Gaussian (RE)ML the variances can be re-parametrized to

the signal variance $\sigma_B^2 = \sigma^2 + \sigma_{\mathrm{n}}^2$,
the inverse relative nugget $\eta = \sigma_B^2 / \tau^2$ and
the relative auto-correlated signal variance $\xi = \sigma^2/\sigma_B^2$.

georob maximizes then a (restricted) profile log-likelihood that depends only on $\eta$, $\xi$, $\alpha$, …, and $\sigma_B^2$ is estimated by an explicit expression that depends on these parameters (e.g. Diggle and Ribeiro, 2006, p. 113). This is usually more efficient than maximizing the (restricted) log-likelihood with respect to the original variance parameters $\tau^2$, $\sigma_{\mathrm{n}}^2$ and $\sigma^2$. georob chooses the parametrization automatically, but the user can control it by the argument reparam of the function control.georob.

References

Diggle, P. J. and Ribeiro, P. J. R. (2006) Model-based Geostatistics. Springer.

K<U+009F>nsch, H. R., Papritz, A., Schwierz, C. and Stahel, W. A. (in preparation) Robust Geostatistics.

K<U+009F>nsch, H. R., Papritz, A., Schwierz, C. and Stahel, W. A. (2011) Robust estimation of the external drift and the variogram of spatial data. Proceedings of the ISI 58th World Statistics Congress of the International Statistical Institute. http://e-collection.library.ethz.ch/eserv/eth:7080/eth-7080-01.pdf

Examples

Run this code

# NOT RUN {
################
## meuse data ##
################
data(meuse)

## Gaussian REML fit
r.logzn.reml <- georob(log(zinc) ~ sqrt(dist), data = meuse, locations = ~ x + y,
    variogram.model = "RMexp",
    param = c(variance = 0.15, nugget = 0.05, scale = 200),
    tuning.psi = 1000)
summary(r.logzn.reml, correlation = TRUE)

## robust REML fit 
r.logzn.rob <- update(r.logzn.reml, tuning.psi = 1)
    
summary(r.logzn.rob, correlation = TRUE)

plot(r.logzn.reml, lag.dist.def = seq(0, 2000, by = 100))
lines(r.logzn.rob, col = "red")


###################
## wolfcamp data ##
###################
data(wolfcamp, package = "geoR")
d.wolfcamp <- data.frame(x = wolfcamp[[1]][,1], y = wolfcamp[[1]][,2],
    pressure = wolfcamp[[2]])

## fitting isotropic IRF(0) model
  
r.irf0.iso <- georob(pressure ~ 1, data = d.wolfcamp, locations = ~ x + y, 
    variogram.model = "RMfbm",
    param = c(variance = 10, nugget = 1500, scale = 1, alpha = 1.5),
    fit.param = default.fit.param(scale = FALSE, alpha = TRUE),
    tuning.psi = 1000)
  
summary(r.irf0.iso)

## fitting anisotropic IRF(0) model
  
r.irf0.aniso <- georob(pressure ~ 1, data = d.wolfcamp, locations = ~ x + y, 
    variogram.model = "RMfbm",
    param = c(variance = 5.9, nugget = 1450, scale = 1, alpha = 1),
    fit.param = default.fit.param(scale = FALSE, alpha = TRUE),
    aniso = default.aniso(f1 = 0.51, omega = 148.),
    fit.aniso = default.fit.aniso(f1 = TRUE, omega = TRUE), 
    tuning.psi = 1000)
summary(r.irf0.aniso)

plot(r.irf0.iso, lag.dist.def = seq(0, 200, by = 7.5))
plot(r.irf0.aniso, lag.dist.def = seq(0, 200, by = 7.5), 
    xy.angle.def = c(0, 22.5, 67.5, 112.5, 157.5, 180.), 
    add = TRUE, col = 2:5)
    
pchisq(2*(r.irf0.aniso[["loglik"]] - r.irf0.iso[["loglik"]]), 2, lower = FALSE)
# }

Run the code above in your browser using DataLab