georob (version 0.3-19)

georobSimulation: Simulating Realizations of Gaussian Processes

Description

This page documents the function condsim that simulates (un)conditional realizations of Gaussian processes from the parameters of a spatial linear model estimated by the function georob.

Usage

condsim(object, newdata, nsim, seed, type =  c("response", "signal"),
    locations, trend.coef = NULL,
    variogram.model = NULL, param = NULL, aniso = NULL, variogram.object = NULL,
    control = control.condsim(), verbose = 0)

control.condsim(use.grid = FALSE, grid.refinement = 2., condsim = TRUE, ce.method = c( "standard", "approximate" ), ce.grid.expansion = 1., include.data.sites = FALSE, means = FALSE, trend.covariates = FALSE, covariances = FALSE, ncores = 1, mmax = 10000, pcmp = control.pcmp())

Value

The output generated by condsim is an object of a ``similar'' class as newdata (data frame,

SpatialPointsDataFrame,

SpatialPixelsDataFrame,

SpatialGridDataFrame,

SpatialPolygonsDataFrame).

The data frame or the

data slot of the Spatial...DataFrame objects have the following components:

  • the coordinates of the prediction points (only present if newdata is a data frame).

  • expct: optionally the (un)conditional means.

  • optionally the covariates required for the trend model.

  • sim.1, sim.2, ...: the (un)conditionally simulated realizations.

The function control.condsim returns a list with parameters to steer condsim, see arguments above.

Arguments

object

an object of class georob (mandatory argument), see georobObject.

newdata

a mandatory data frame, SpatialPointsDataFrame, SpatialPixelsDataFrame, SpatialGridDataFrame, SpatialPoints, SpatialPixels or SpatialGrid object, with the coordinates of points for which simulations are computed and in which to look for variables required for computing fitted values or Kriging predictions, see predict.georob.

nsim

a positive interger with the number of (condititional) realizations to compute (mandatory argument).

seed

an integer seed to initialize random number generation, see set.seed (mandatory argument).

type

a character keyword defining what target quantity should be simulated. Possible values are

  • "signal": the “signal” \(Z(\boldsymbol{s}) = \boldsymbol{x}(\boldsymbol{s})^\mathrm{T} \boldsymbol{\beta} + B(\boldsymbol{s})\) of the process,

  • "response": the observations \(Y(\boldsymbol{s}) = Z(\boldsymbol{s}) + \varepsilon(\boldsymbol{s}),\) (default),

see georobPackage for details on the model specification.

locations

an optional one-sided formula specifying what variables of newdata are the coordinates of the points for which simulations are computed (default
object[["locations.objects"]][["locations"]]).

trend.coef

an optional numeric vector with the coefficients of the trend model to be used for computing the (conditional) mean function of the random process see Details.

variogram.model

an optional character keyword defining the variogram model to be used for the simulations, see georob and Details.

param

an optional named numeric vector with values of the variogram parameters used for the simulations, see georob and Details.

aniso

an optional named numeric vector with values of anisotropy parameters of a variogram used for the simulations, see georob and Details.

variogram.object

an optional list that defines a possibly nested variogram model used for the simulations, see georob and Details.

control

a list with the components use.grid, grid.refinement, condsim, ce.method, ce.grid.expansion, include.data.sites, means, trend.covariates,
covariances, ncores, mmax and pcmp or a function such as control.condsim that generates such a list, see arguments of control.condsim for details.

verbose

a positive integer controlling logging of diagnostic messages to the console.
verbose = 0 (default) suppresses such messages.

use.grid

a logical scalar (default FALSE) to control whether (conditional) realizations are computed for a rectangular grid instead of the coordinates of points contained in newdata, see Details.

grid.refinement

a numeric that defines a factor by which the minimum differences of the coordinates between any pair of points in newdata are divided to setup the simulation grid, should be > 1 (default 2), see Details.

condsim

a logical scalar to control whether conditional (TRUE default) or unconditional simulations (FALSE) are computed.

ce.method

a character keyword to select the method to simulate realizations by the circulant embedding algorithm, see Details.

ce.grid.expansion

a numeric with the factor by which the dimensions of the simulation grid is expanded in the circulant embedding algorithm. Should be \(\ge 1\) (default 1).

include.data.sites

a logical scalar, to control whether (conditionally) simulated values are computed also for the points of the original data set used to estimate the model parameters and contained in object.

means

a logical scalar, to control whether the (un)conditional means are included in the output.

trend.covariates

a logical scalar, to control whether the covariates required for the trend model are included in the output.

covariances

a logical scalar, to control whether the covariances between the points of the original data set used to estimate the model parameters (attr gcvmat.d.d,
compressed matrix) and the covariances between the simulation and the original data points (attr gcvmat.s.d, matrix) are returned as attributes of the output. Note that these covariances are only returned if condsim = TRUE.

ncores

a positive integer controlling how many cores are used for parallelized computations, defaults to 1.

mmax

a positive integer equal to the maximum number (default 10000) of prediction items, computed in sub-tasks executed in parallel, see section Details of predict.georob.

pcmp

a list of arguments, passed e.g. to pmm or a function such as control.pcmp that generates such a list (see control.pcmp for allowed arguments).

Author

Andreas Papritz papritz@retired.ethz.ch.

Details

General

condsim (conditionally) simulates from a Gaussian process that has a linear mean function with parameters \(\boldsymbol{\beta}\) and an auto-correlation structure characterized by a parametric variogram model and variogram parameters \(\tau^2\) and \(\boldsymbol{\theta}\) (see georobPackage for the employed parametrization of the spatial linear model). The parameters of the mean and auto-correlation function are either taken from the spatial linear model estimated by georob and passed by the argument object to condsim or from the optional arguments trend.coef (\(\boldsymbol{\beta}\)) and variogram.model, param, aniso or
variogram.object (\(\tau^2\), \(\boldsymbol{\theta}\)).

Simulated values are computed for the points in newdata and optionally also for the data points in object if include.data.sites = TRUE. Both unconditional and conditional simulations can be computed. In the latter cases, the simulated values are always conditioned to the response data used to fit the spatial linear model by georob and contained in object.

Unconditional simulation

Unconditional realizations are either computed for the exact locations of the points in newdata (use.grid = FALSE), irrespective of the fact whether these are arranged on a regular grid or not. Simulations are then generated by the Cholesky matrix decomposition method (e.g. Chilès and Delfiner, 1999, sec. 7.2.2).

For use.grid = TRUE the points in newdata are matched to a rectangular simulation grid and the simulations are generated for all nodes of this grid by the circulant embedding method (Davis and Bryant, 2013; Dietrich and Newsam, 1993; Wood and Chan, 1994). For large problems this approach may be substantially faster and less memory demanding than the Cholesky matrix decomposition method.

For circulant embedding, first a rectangular simulation grid is constructed from the coordinates of the points in newdata and object. The spacings of the simulation grid is equal to the minimum coordinate differences between any pair of points in newdata, divided by grid.refinement. The spatial extent of the simulation grid is chosen such that it covers the bounding boxes of all points in newdata and object. The points in newdata and object are then matched to the closest nodes of the simulation grid. If the same node is assigned to a point in object and newdata then the point in object is kept and the concerned point in newdata is omitted.

The rectangular simulation grid is then expanded to the larger circulant embedding grid, and the eigenvalues of the so-called base matrix (= first row of the covariance matrix of the nodes of the circulant embedding grid with block circulant structure, see Davies and Bryant, 2013) are computed by fast discrete Fourier transform (FFT). It may happen that some of the eigenvalues of the base matrix are negative. The standard circulant embedding algorithm then fails.

Two approaches are implemented in condsim to handle this situation:

  • First, one may use the approximate circulant embedding method by choosing ce.method = "approximate". This sets the negative eigenvalues of the base matrix to zero and scales the eigenvalues, see Chan and Wood (1994, sec. 4, choice \(\rho = \rho_2\)).

  • Second, one may attempt to avoid the problem of negative eigenvalues by increasing the size of the simulation (and circulant embedding) grids. This can be achieved by choosing a value \(> 1\) for the argument ce.grid.expansion, see respective parts in Dietrich and Newsam (1993, sec. 4) and Wood and Chan (1994, sec. 3).

Note that the dimension of the simulation and embedding grids are chosen such that the number of nodes is a highly composite integer. This allows efficient FFT.

Conditional simulation

For both the Cholesky matrix decomposition and the circulant embedding approach, simulations are conditioned to data by the Kriging method, see Chilès and Delfiner, 1999, sec. 7.3.

Parallelized computations

condsim uses the packages parallel and snowfall for parallelized computations. Three tasks can be executed in parallel:

  • Computation of (generalized correlations), see control.pcmp how to do this.

  • Computation of Kriging predictions required for conditional simulations, see section Details of predict.georob.

  • Fast Fourier transform of realizations of standard normal deviates generated for the nodes of the base matrix (see Davies and Bryant, 2013, steps 3--5 of algorithm). If there are nsim realizations to simulate, the task is split into ceiling(nsim / ncores) sub-tasks that are then distributed to ncores CPUs. Evidently, ncores = 1 (default) suppresses parallel execution.

References

Chilès, J.-P., Delfiner, P. (1999) Geostatistics Modeling Spatial Uncertainty, Wiley, New York, tools:::Rd_expr_doi("10.1002/9780470316993").

Davies, T. M., Bryant, D. (2013) On circulant embedding for gaussian random fields in R, Journal of Statistical Software, 55, 1--21, tools:::Rd_expr_doi("10.18637/jss.v055.i09").

Dietrich, C. R., Newsam, G. N. (1993) A fast and exact method for multidimensional gaussian stochastic simulations, Water Resources Research, 9, 2861--2869, tools:::Rd_expr_doi("10.1029/93WR01070").

Wood, A. T. A., Chan, G. (1994) Simulation of stationary gaussian processes in \([0,1]^d\), Journal of Computational and Graphcal Statistics, 3, 409--432, tools:::Rd_expr_doi("10.2307/1390903").

See Also

georobPackage for a description of the model and a brief summary of the algorithms;

georob for (robust) fitting of spatial linear models;

georobObject for a description of the class georob;

profilelogLik for computing profiles of Gaussian likelihoods;

plot.georob for display of RE(ML) variogram estimates;

control.georob for controlling the behaviour of georob;

georobModelBuilding for stepwise building models of class georob;

cv.georob for assessing the goodness of a fit by georob;

georobMethods for further methods for the class georob;

predict.georob for computing robust Kriging predictions;

lgnpp for unbiased back-transformation of Kriging prediction of log-transformed data;

sample.variogram and fit.variogram.model for robust estimation and modelling of sample variograms.

Examples

Run this code
data(meuse)
data(meuse.grid)

## convert to SpatialGridDataFrame
meuse.grid.sgdf <- meuse.grid
coordinates(meuse.grid.sgdf) <- ~ x + y
gridded(meuse.grid.sgdf) <- TRUE
fullgrid(meuse.grid.sgdf) <- TRUE

## Gaussian REML fit
r.logzn.reml <- georob(log(zinc) ~ sqrt(dist), data = meuse,
    locations = ~ x + y, variogram.model = "RMexp",
    param = c(variance = 0.15, nugget = 0.05, scale = 200),
    tuning.psi = 1000)

## Unconditional simulations using circulant embedding on rectangular
## simulation grid
r.sim.1 <- condsim(r.logzn.reml, newdata = meuse.grid.sgdf, nsim = 2, seed = 1,
    control = control.condsim(use.grid = TRUE, condsim = FALSE))
spplot(r.sim.1, zcol = "sim.1", at = seq(3.5, 8.5, by = 0.5))

## Conditional simulations using circulant embedding
if(interactive()){
  ## example is run only in interactive session because cpu times exceeds 5 s
  r.sim.2 <- condsim(r.logzn.reml, newdata = meuse.grid.sgdf, nsim = 2, seed = 1,
      control = control.condsim(use.grid = FALSE, condsim = TRUE))
  spplot(r.sim.2, zcol = "sim.2", at = seq(3.5, 8.5, by = 0.5))
}

Run the code above in your browser using DataLab