This page documents the function condsim
that
simulates (un)conditional realizations of Gaussian processes from the
parameters of a spatial linear model estimated by the function
georob
.
condsim(object, newdata, nsim, seed, type = c("response", "signal"),
locations, trend.coef = NULL,
variogram.model = NULL, param = NULL, aniso = NULL, variogram.object = NULL,
control = control.condsim(), verbose = 0) control.condsim(use.grid = FALSE, grid.refinement = 2.,
condsim = TRUE, ce.method = c( "standard", "approximate" ),
ce.grid.expansion = 1., include.data.sites = FALSE,
means = FALSE, trend.covariates = FALSE, covariances = FALSE,
ncores = 1, mmax = 10000, pcmp = control.pcmp())
The output generated by condsim
is an object of a ``similar''
class as newdata
(data frame,
The data frame or the
data
slot of the Spatial...DataFrame
objects
have the following components:
the coordinates of the prediction points (only present if
newdata
is a data frame).
expct
: optionally the (un)conditional means.
optionally the covariates required for the trend model.
sim.1
, sim.2
, ...: the (un)conditionally
simulated realizations.
The function control.condsim
returns a list with parameters to
steer condsim
, see arguments above.
an object of class georob
(mandatory argument), see
georobObject
.
a mandatory data frame,
SpatialPointsDataFrame
,
SpatialPixelsDataFrame
,
SpatialGridDataFrame
,
SpatialPoints
,
SpatialPixels
or
SpatialGrid
object,
with the coordinates of points for which simulations are computed
and in which to look for variables required for computing fitted values
or Kriging predictions, see predict.georob
.
a positive interger with the number of (condititional) realizations to compute (mandatory argument).
an integer seed to initialize random number generation,
see set.seed
(mandatory argument).
a character keyword defining what target quantity should be simulated. Possible values are
"signal"
: the “signal”
\(Z(\boldsymbol{s}) =
\boldsymbol{x}(\boldsymbol{s})^\mathrm{T}
\boldsymbol{\beta} +
B(\boldsymbol{s})\) of
the process,
"response"
: the observations
\(Y(\boldsymbol{s}) =
Z(\boldsymbol{s}) +
\varepsilon(\boldsymbol{s}),\) (default),
see georobPackage
for details on the model specification.
an optional one-sided formula specifying what variables
of newdata
are the coordinates of the points for which simulations
are computed (default
object[["locations.objects"]][["locations"]]
).
an optional numeric vector with the coefficients of the trend model to be used for computing the (conditional) mean function of the random process see Details.
an optional character keyword defining the
variogram model to be used for the simulations, see georob
and Details.
an optional named numeric vector with values of the
variogram parameters used for the simulations, see georob
and Details.
an optional named numeric vector with values of anisotropy
parameters of a variogram used for the simulations, see
georob
and Details.
an optional list that defines a possibly nested
variogram model used for the simulations, see georob
and
Details.
a list with the components use.grid
,
grid.refinement
, condsim
,
ce.method
, ce.grid.expansion
,
include.data.sites
,
means
, trend.covariates
,
covariances
,
ncores
, mmax
and pcmp
or a function such as
control.condsim
that generates such a list, see
arguments of control.condsim
for details.
a positive integer controlling logging of diagnostic
messages to the console.
verbose = 0
(default) suppresses
such messages.
a logical scalar (default FALSE
) to control whether
(conditional) realizations are computed for a rectangular grid instead of
the coordinates of points contained in newdata
, see Details.
a numeric that defines a factor by which the
minimum differences of the coordinates between any pair of points in
newdata
are divided to setup the simulation grid, should be > 1
(default 2), see Details.
a logical scalar to control whether conditional
(TRUE
default) or unconditional simulations (FALSE
) are
computed.
a character keyword to select the method to simulate realizations by the circulant embedding algorithm, see Details.
a numeric with the factor by which the dimensions of the simulation grid is expanded in the circulant embedding algorithm. Should be \(\ge 1\) (default 1).
a logical scalar, to control whether
(conditionally) simulated values are computed also for the points of the
original data set used to estimate the model parameters and contained in
object
.
a logical scalar, to control whether the (un)conditional means are included in the output.
a logical scalar, to control whether the covariates required for the trend model are included in the output.
a logical scalar, to control whether the covariances
between the points of the original data set used to estimate the model
parameters (attr gcvmat.d.d
,
compress
ed matrix) and
the covariances between the simulation and the original data points
(attr gcvmat.s.d
, matrix) are returned as attributes of the
output. Note that these covariances are only returned if
condsim = TRUE
.
a positive integer controlling how many cores are used for parallelized computations, defaults to 1.
a positive integer equal to the maximum number (default
10000
) of prediction items, computed in sub-tasks executed in
parallel, see section Details of predict.georob
.
a list of arguments, passed e.g. to pmm
or a
function such as control.pcmp
that generates such a list
(see control.pcmp
for allowed arguments).
Andreas Papritz papritz@retired.ethz.ch.
condsim
(conditionally) simulates from a Gaussian process that
has a linear mean function with parameters
\(\boldsymbol{\beta}\) and an auto-correlation structure
characterized by a parametric variogram model and variogram parameters
\(\tau^2\) and \(\boldsymbol{\theta}\) (see
georobPackage
for the employed parametrization of the
spatial linear model). The parameters of the mean and auto-correlation
function are either taken from the spatial linear model estimated by
georob
and passed by the argument
object
to condsim
or from the optional arguments
trend.coef
(\(\boldsymbol{\beta}\))
and variogram.model
, param
, aniso
or
variogram.object
(\(\tau^2\),
\(\boldsymbol{\theta}\)).
Simulated values are computed for the points in newdata
and
optionally also for the data points in object
if
include.data.sites = TRUE
. Both unconditional and conditional
simulations can be computed. In the latter cases, the simulated values
are always conditioned to the response data used to fit the spatial
linear model by georob
and contained in object
.
Unconditional realizations are either computed for the exact locations of
the points in newdata
(use.grid = FALSE
), irrespective of
the fact whether these are arranged on a regular grid or not.
Simulations are then generated by the Cholesky matrix decomposition
method (e.g. Chilès and Delfiner, 1999, sec.
7.2.2).
For use.grid = TRUE
the points in newdata
are matched to a
rectangular simulation grid and the simulations are generated for all
nodes of this grid by the circulant embedding method (Davis and
Bryant, 2013; Dietrich and Newsam, 1993; Wood and Chan,
1994). For large problems this approach may be substantially faster and
less memory demanding than the Cholesky matrix decomposition method.
For circulant embedding, first a rectangular simulation grid is
constructed from the coordinates of the points in newdata
and
object
. The spacings of the simulation grid is equal to the
minimum coordinate differences between any pair of points in
newdata
, divided by grid.refinement
. The spatial extent of
the simulation grid is chosen such that it covers the bounding boxes of
all points in newdata
and object
. The points in
newdata
and object
are then matched to the closest nodes of
the simulation grid. If the same node is assigned to a point in
object
and newdata
then the point in object
is kept
and the concerned point in newdata
is omitted.
The rectangular simulation grid is then expanded to the larger circulant embedding grid, and the eigenvalues of the so-called base matrix (= first row of the covariance matrix of the nodes of the circulant embedding grid with block circulant structure, see Davies and Bryant, 2013) are computed by fast discrete Fourier transform (FFT). It may happen that some of the eigenvalues of the base matrix are negative. The standard circulant embedding algorithm then fails.
Two approaches are implemented in condsim
to handle this
situation:
First, one may use the approximate circulant embedding
method by choosing ce.method = "approximate"
. This sets the
negative eigenvalues of the base matrix to zero and scales the
eigenvalues, see Chan and Wood (1994, sec. 4, choice \(\rho =
\rho_2\)).
Second, one may attempt to avoid the problem of negative
eigenvalues by increasing the size of the simulation (and circulant
embedding) grids. This can be achieved by choosing a value \(> 1\)
for the argument ce.grid.expansion
, see respective parts in
Dietrich and Newsam (1993, sec. 4) and Wood and Chan
(1994, sec. 3).
Note that the dimension of the simulation and embedding grids are chosen such that the number of nodes is a highly composite integer. This allows efficient FFT.
For both the Cholesky matrix decomposition and the circulant embedding approach, simulations are conditioned to data by the Kriging method, see Chilès and Delfiner, 1999, sec. 7.3.
condsim
uses the packages parallel and snowfall for
parallelized computations. Three tasks can be executed in parallel:
Computation of (generalized correlations), see
control.pcmp
how to do this.
Computation of Kriging predictions required for conditional
simulations, see section Details of
predict.georob
.
Fast Fourier transform of realizations of standard normal
deviates generated for the nodes of the base matrix (see
Davies and Bryant, 2013, steps 3--5 of algorithm). If there are
nsim
realizations to simulate, the task is split into
ceiling(nsim / ncores)
sub-tasks that are then distributed to
ncores
CPUs. Evidently, ncores = 1
(default) suppresses
parallel execution.
Chilès, J.-P., Delfiner, P. (1999) Geostatistics Modeling Spatial Uncertainty, Wiley, New York, tools:::Rd_expr_doi("10.1002/9780470316993").
Davies, T. M., Bryant, D. (2013) On circulant embedding for gaussian random fields in R, Journal of Statistical Software, 55, 1--21, tools:::Rd_expr_doi("10.18637/jss.v055.i09").
Dietrich, C. R., Newsam, G. N. (1993) A fast and exact method for multidimensional gaussian stochastic simulations, Water Resources Research, 9, 2861--2869, tools:::Rd_expr_doi("10.1029/93WR01070").
Wood, A. T. A., Chan, G. (1994) Simulation of stationary gaussian processes in \([0,1]^d\), Journal of Computational and Graphcal Statistics, 3, 409--432, tools:::Rd_expr_doi("10.2307/1390903").
georobPackage
for a description of the model and a brief summary of the algorithms;
georob
for (robust) fitting of spatial linear models;
georobObject
for a description of the class georob
;
profilelogLik
for computing profiles of Gaussian likelihoods;
plot.georob
for display of RE(ML) variogram estimates;
control.georob
for controlling the behaviour of georob
;
georobModelBuilding
for stepwise building models of class georob
;
cv.georob
for assessing the goodness of a fit by georob
;
georobMethods
for further methods for the class georob
;
predict.georob
for computing robust Kriging predictions;
lgnpp
for unbiased back-transformation of Kriging prediction
of log-transformed data;
sample.variogram
and fit.variogram.model
for robust estimation and modelling of sample variograms.
data(meuse)
data(meuse.grid)
## convert to SpatialGridDataFrame
meuse.grid.sgdf <- meuse.grid
coordinates(meuse.grid.sgdf) <- ~ x + y
gridded(meuse.grid.sgdf) <- TRUE
fullgrid(meuse.grid.sgdf) <- TRUE
## Gaussian REML fit
r.logzn.reml <- georob(log(zinc) ~ sqrt(dist), data = meuse,
locations = ~ x + y, variogram.model = "RMexp",
param = c(variance = 0.15, nugget = 0.05, scale = 200),
tuning.psi = 1000)
## Unconditional simulations using circulant embedding on rectangular
## simulation grid
r.sim.1 <- condsim(r.logzn.reml, newdata = meuse.grid.sgdf, nsim = 2, seed = 1,
control = control.condsim(use.grid = TRUE, condsim = FALSE))
spplot(r.sim.1, zcol = "sim.1", at = seq(3.5, 8.5, by = 0.5))
## Conditional simulations using circulant embedding
if(interactive()){
## example is run only in interactive session because cpu times exceeds 5 s
r.sim.2 <- condsim(r.logzn.reml, newdata = meuse.grid.sgdf, nsim = 2, seed = 1,
control = control.condsim(use.grid = FALSE, condsim = TRUE))
spplot(r.sim.2, zcol = "sim.2", at = seq(3.5, 8.5, by = 0.5))
}
Run the code above in your browser using DataLab