georob is a package for robust analyses of geostatistical data. Such data, say $y(s_i)$, are recorded at a set of locations, $s_i$, $i=1,2, \ldots, n$, in a domain $G in R^d$, $d in (1,2,3)$, along with covariate information $x_j(s_i)$, $j=1,2, \ldots, p$.
Model We use the following model for the data $y_i=y(s_i)$: $$Y(\mbox{\boldmath$s$\unboldmath}_i) = Z(\mbox{\boldmath$s$\unboldmath}_i) + \varepsilon = \mbox{\boldmath $x$\unboldmath}(\mbox{\boldmath$s$\unboldmath}_i)^\mathrm{T} \mbox{\boldmath$\beta$\unboldmath} + B(\mbox{\boldmath$s$\unboldmath}_i) + \varepsilon_i,$$ where $Z(s_i) = x(s_i)^T \beta + B(s_i)$ is the so-called signal, $x(s_i)^T\beta$ is the external drift, ${B(s)}$ is an unobserved stationary or intrinsic spatial Gaussian random field with zero mean, and $\epsilon_i$ is an i.i.d error from a possibly long-tailed distribution with scale parameter $\tau$ ($\tau^2$ is usually called nugget effect). In vector form the model is written as $$\mbox{\boldmath $Y = X \beta + B + \varepsilon$\unboldmath},$$ where $X$ is the model matrix with the rows $ x^T(s_i)$. The (generalized) covariance matrix of the vector of spatial Gaussian random effects $B$ is denoted by $$\mathrm{E}[ \mbox{\boldmath$B$\unboldmath}\, \mbox{\boldmath$B$\unboldmath}^\mathrm{T}] = \mbox{\boldmath$\Gamma$\unboldmath}_\theta = \sigma_{\mathrm{n}}^2\mbox{\boldmath$I$\unboldmath} + \sigma^2\mbox{\boldmath$V$\unboldmath}_\alpha = \sigma^2_Z \, \mbox{\boldmath$V$\unboldmath}_{\alpha,\xi} = \sigma^2_Z \, (\xi \, \mbox{\boldmath$I$\unboldmath} + (1-\xi)\, \mbox{\boldmath$V$\unboldmath}_\alpha ) ,$$ where $\sigma_n^2$ is the variance of seemingly uncorrelated micro-scale variation in $B(s)$ that cannot be resolved with the chosen sampling design, $\sigma^2$ is the variance of the captured auto-correlated variation in $B(s)$, $\sigma_Z^2=\sigma_n^2+\sigma^2$ is the signal variance, and $\xi=\sigma_n^2/\sigma^_Z^2$. To estimate both $\sigma_n^2$ and $\tau^2$ (and not only their sum), one needs replicated measurements for some of the $s_i$. We define $V_{\alpha}$ to be the matrix with elements $$(\mbox{\boldmath$V$\unboldmath}_{\alpha})_{ij} = \gamma_0 - \gamma(|\mbox{\boldmath$A$\unboldmath}\;( \mbox{\boldmath$s$\unboldmath}_i-\mbox{\boldmath$s$\unboldmath}_j)|),$$ where the constant $\gamma_0$ is chosen large enough so that $V_{\alpha}$ is positive definite, $v()$ is a valid stationary or intrinsic variogram, and $ A = A(\alpha, f_1, f_2; \omega, \phi, \zeta)$ is a matrix that is used to model geometrically anisotropic auto-correlation. In more detail, $A$ maps an arbitrary point on an ellipsoidal surface with constant semivariance in $R^d$, centred on the origin, and having lengths of semi-principal axes, $p_1$, $p_2$, $p_3$, equal to $|p_1|=\alpha$, $|p_2|=f_1 \alpha$ and $|p_3|=f_2 \alpha$, $0 < f_2 <= f_1="" <="1$," respectively,="" onto="" the="" surface="" of="" unit="" ball="" centred="" on="" origin.="" orientation="" ellipsoid="" is="" defined="" by="" three="" angles="" $\omega$,="" $\phi$="" and="" $\zeta$:=""
The transformation matrix is given by $$\mbox{\boldmath$A$\unboldmath}= \left(\begin{array}{ccc} 1/\alpha & 0 & 0\\ 0 & 1/(f_1\,\alpha) & 0\\ 0 & 0 & 1/(f_2\,\alpha) \\ \end{array}\right) ( \mbox{\boldmath$C$\unboldmath}_1, \mbox{\boldmath$C$\unboldmath}_2, \mbox{\boldmath$C$\unboldmath}_3, ) $$ where $$\mbox{\boldmath$C$\unboldmath}_1^\mathrm{T} = (\sin\phi \sin\omega, -\cos\zeta \cos\omega + \sin\zeta \cos\phi \sin\omega, -\cos\omega \sin\zeta - \cos\zeta \cos\phi \sin\omega) $$ $$\mbox{\boldmath$C$\unboldmath}_2^\mathrm{T} = (-\sin\phi \cos\omega, \cos\omega \sin\zeta \cos\phi + \cos\zeta \sin\omega, -\cos\zeta \cos\omega \cos\phi + \sin\zeta \sin\omega) $$ $$\mbox{\boldmath$C$\unboldmath}_3^\mathrm{T} = (\cos\phi, -\sin\phi \sin\zeta, \cos\zeta \sin\phi) $$ To model geometrically anisotropic variograms in $R^2$ one has to set $\phi=90$ and $f_2 = 1$, and for $f_1 = f_2 = 1$ one obtains the model for isotropic auto-correlation with range parameter $\alpha$. Note that for isotropic auto-correlation the software processes data for which $d$ may exceed 3. Two remarks are in order:
Estimation The unobserved spatial random effects $B$ at the data locations $s_i$ and the model parameters $\beta$ and $\theta^T = (\sigma^2, \sigma^2_n, \tau^2, \alpha, f_1, f_2, \omega, \phi, \zeta, \ldots) $ are unknown and are estimated in georob either by Gaussian or robust restricted maximum likelihood (REML) or Gaussian maximum likelihood (ML). Here ... denote further parameters of the variogram such as the smoothness parameter of the Whittle-Matérn model. In brief, the robust REML method is based on the insight that for given $\theta$ the kriging predictions (= BLUP) of $B$ and the generalized least squares (GLS = ML) estimates of $\beta$ can be obtained simultaneously by maximizing $$ - \sum_i \left( \frac{ y_i - \mbox{\boldmath $x$\unboldmath}(\mbox{\boldmath$s$\unboldmath}_i)^\mathrm{T} \mbox{\boldmath$\beta$\unboldmath} - B(\mbox{\boldmath$s$\unboldmath}_i) }{\tau} \right)^2 - \mbox{\boldmath$B$\unboldmath}^{\mathrm{T}} \mbox{\boldmath$\Gamma$\unboldmath}^{-1}_\theta \mbox{\boldmath$B$\unboldmath} $$ with respect to $B$ and $\beta$ e.g. Harville (1977). Hence, the BLUP of $B$ and ML estimates of $\beta$ and $\theta$ are obtained by maximizing $$ - \log(\det( \tau^2 \mbox{\boldmath$I$\unboldmath} + \mbox{\boldmath$\Gamma$\unboldmath}_\theta )) - \sum_i \left( \frac{ y_i - \mbox{\boldmath $x$\unboldmath}(\mbox{\boldmath$s$\unboldmath}_i)^\mathrm{T} \mbox{\boldmath$\beta$\unboldmath} - B(\mbox{\boldmath$s$\unboldmath}_i) }{\tau} \right)^2 - \mbox{\boldmath$B$\unboldmath}^{\mathrm{T}} \mbox{\boldmath$\Gamma$\unboldmath}^{-1}_\theta \mbox{\boldmath$B$\unboldmath} $$ jointly with respect to $B$, $\beta$, $\theta$. The respective estimating equations can then by robustified by
see Künsch et al. (2011) for details.
The robustified estimating equations
are solved numerically by a combination of iterated re-weighted least squares
(IRWLS) to estimate $B$ and
$\beta$ for given
$\theta$
and nonlinear root finding by the function
nleqslv of the R package nleqslv
to get $\theta$.
The robustness of the procedure is controlled by the tuning parameter $c$
of the $\psi_c$-function. For $c>=1000$ the algorithm computes
Gaussian (RE)ML estimates and customary plug-in kriging predictions.
Instead of solving the Gaussian (RE)ML estimating equations, our software then
maximizes the Gaussian (restricted) loglikelihood using nlminb or
optim.
georob uses variogram models implemented in the R package
RandomFields (see RMmodel).
Currently, estimation of the parameters of the following models is
implemented:
"RMaskey", "RMbessel", "RMcauchy",
"RMcircular", "RMcubic", "RMdagum",
"RMdampedcos", "RMdewijsian", "RMexp" (default),
"RMfbm", "RMgauss", "RMgencauchy",
"RMgenfbm", "RMgengneiting", "RMgneiting",
"RMlgd", "RMmatern", "RMpenta", "RMqexp",
"RMspheric", "RMstable", "RMwave",
"RMwhittle".
For most variogram parameters, closed-form
expressions of $d\gamma/d\theta_i$ are used in the computations. However,
for the parameter $\nu$ of the models "RMbessel",
"RMmatern" and "RMwhittle" $d\gamma/d\nu$ is evaluated numerically by the function
numericDeriv, and this results in an increase in
computing time when $\nu$ is estimated.
Prediction
Robust plug-in external-drift point kriging predictions
can be computed for an unsampled location
$s_0$
from the covariates
$
x(s_0)$,
the estimated parameters
$hat\beta$,
$hat\theta$
and the predicted random effects
$hatB$
by
$$
\widehat{Y}(\mbox{\boldmath$s$\unboldmath}_0) = \widehat{Z}(\mbox{\boldmath$s$\unboldmath}_0) =
\mbox{\boldmath$x$\unboldmath}(\mbox{\boldmath$s$\unboldmath}_0)^\mathrm{T}
\widehat{\mbox{\boldmath$\beta$\unboldmath}} +
\mbox{\boldmath$\gamma$\unboldmath}^\mathrm{T}_{\widehat{\theta}}(\mbox{\boldmath$s$\unboldmath}_0)
\mbox{\boldmath$\Gamma$\unboldmath}^{-1}_{\widehat{\theta}}
\widehat{\mbox{\boldmath$B$\unboldmath}},
$$
where
$\Gamma_hat\theta$
is the estimated (generalized) covariance matrix of
$B$ and
$
\gamma^T_hat\theta(s_0)$
is the vector with the estimated (generalized) covariances between
$B$ and
$B(s_0)$.
Kriging variances can be computed as well, based on approximated covariances of
$hatB$ and
$hat\beta$
(see Künsch et al., 2011, and appendices of
Nussbaum et al., 2012, for details).
The package georob provides in addition software for computing
robust external-drift block kriging predictions. The required
integrals of the generalized covariance function are computed by
functions of the R package constrainedKriging.
Functionality For the time being, the functionality of georob is limited to robust geostatistical analyses of single response variables. No software is currently available for robust multivariate geostatistical analyses. georob offers functions for:
georob --- which also
fits such models efficiently by Gaussian (RE)ML --- profilelogLik and
control.georob).
residuals.georob,
rstandard.georob, rstudent.georob,
ranef.georob).
sample.variogram and fit.variogram.model).
waldtest.georob, step.georob,
add1.georob, drop1.georob,
extractAIC.georob, logLik.georob,
deviance.georob).
For a robust fit, the loglikelihood is not defined. The function then
computes the (restricted) loglikelihood of an equivalent Gaussian model with
heteroscedastic nugget (see deviance.georob for details).
cv.georob and
validate.predictions).
predict.georob, control.predict.georob).
lgnpp).
Künsch, H. R., Papritz, A., Schwierz, C. and Stahel, W. A. (in preparation) Robust Geostatistics.
Künsch, H. R., Papritz, A., Schwierz, C. and Stahel, W. A. (2011) Robust estimation of the external drift and the variogram of spatial data. Proceedings of the ISI 58th World Statistics Congress of the International Statistical Institute. http://e-collection.library.ethz.ch/eserv/eth:7080/eth-7080-01.pdf Maronna, R. A., Martin, R. D. and Yohai, V. J. (2006) Robust Statistics Theory and Methods, John Wiley \& Sons.
georob for (robust) fitting of spatial linear models;
georobObject for a description of the class georob;
plot.georob for display of RE(ML) variogram estimates;
control.georob for controlling the behaviour of georob;
cv.georob for assessing the goodness of a fit by georob;
predict.georob for computing robust kriging predictions; and finally
georobModelBuilding for stepwise building models of class georob;
georobMethods for further methods for the class georob,
sample.variogram and fit.variogram.model
for robust estimation and modelling of sample variograms.