pick.from.points: Pick Variable from Spatial Dataset

Description

These functions pick (i.e. interpolate without worrying too much about theory) values of spatial variables from data stored in a data.frame, a point shapefile, or an ASCII or SAGA grid, using nearest neighbour or kriging interpolation. pick.from.points and [internal.]pick.from.ascii.grid are the core functions that are called by the different wrappers.

Usage

pick.from.points(data, src, pick, method = c("nearest.neighbour", "krige"), set.na = FALSE, radius = 200, nmin = 0, nmax = 100, sill = 1, range = radius, nugget = 0, model = vgm(sill - nugget, "Sph", range = range, nugget = nugget), log = rep(FALSE, length(pick)), X.name = "x", Y.name = "y", cbind = TRUE)
pick.from.shapefile(data, shapefile, X.name = "x", Y.name = "y", ...)
pick.from.ascii.grid(data, file, path = NULL, varname = NULL, prefix = NULL, method = c("nearest.neighbour", "krige"), cbind = TRUE, parallel = FALSE, nsplit, quiet = TRUE, ...)
pick.from.ascii.grids(data, file, path = NULL, varname = NULL, prefix = NULL, cbind = TRUE, quiet = TRUE, ...)
internal.pick.from.ascii.grid(data, file, path = NULL, varname = NULL, prefix = NULL, method = c("nearest.neighbour", "krige"), nodata.values = c(-9999, -99999), at.once, quiet = TRUE, X.name = "x", Y.name = "y", nlines = Inf, cbind = TRUE, range, radius, na.strings = "NA", ...)
pick.from.saga.grid(data, filename, path, varname, prec = 7, show.output.on.console = FALSE, env = rsaga.env(), ...)

Arguments

data

data.frame giving the coordinates (in columns specified by X.name, Y.name) of point locations at which to interpolate the specified variables or grid values

src

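data.frame containing the source point locations (in columns specified by X.name, Y.name) and the variables to be picked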

pick

variables to be picked (interpolated) from src; if missing, use all available variables, except those specified by X.name and Y.name
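The default for a missing pick can be sketched as a set difference on the column names of src. default_pick is a hypothetical helper written only to illustrate the rule stated above; it is not part of RSAGA.

```r
# Hypothetical helper (not RSAGA code): columns of src that would be picked
# when 'pick' is missing, i.e. all columns except the coordinate columns.
default_pick <- function(src, X.name = "x", Y.name = "y") {
  setdiff(names(src), c(X.name, Y.name))
}

src <- data.frame(x = c(0, 10), y = c(0, 5),
                  elev = c(12.1, 13.4), cadmium = c(1.2, 3.4))
picked.vars <- default_pick(src)  # "elev" "cadmium"
```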

method

interpolation method to be used; uses a partial match to the alternatives "nearest.neighbour" (currently the default) and "krige"

set.na

logical: if a column with a name specified in pick already exists in data, how should it be dealt with? set.na=FALSE (default) only overwrites existing data if the interpolator yields a non-NA result; set.na=TRUE passes NA values returned by the interpolator on to the results data.frame
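The set.na rule for a single existing column can be sketched as follows. merge_picked is a hypothetical name; the sketch only mirrors the behaviour described above and is not the package's implementation.

```r
# Hypothetical sketch of the documented set.na rule for one column:
# 'old' holds the values already in data, 'new' the interpolator's output.
merge_picked <- function(old, new, set.na = FALSE) {
  if (set.na) return(new)        # NA results overwrite existing values
  ifelse(is.na(new), old, new)   # keep the existing value where new is NA
}

old <- c(1, 2, 3)
new <- c(10, NA, 30)
keep.old <- merge_picked(old, new)                 # 10  2 30
pass.na  <- merge_picked(old, new, set.na = TRUE)  # 10 NA 30
```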

radius

numeric value specifying the radius of the local neighborhood to be used for interpolation; defaults to 200 map units (presumably meters), or, in the functions for grid files, 2.5*cellsize.
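To show how a neighbourhood radius can limit nearest neighbour picking, here is a base R sketch that returns NA when no source point lies within radius. nn_pick is hypothetical, and whether pick.from.points applies the radius in exactly this way for method="nearest.neighbour" is an assumption.

```r
# Hypothetical sketch (not RSAGA code): nearest neighbour picking with a
# radius cutoff; NA is returned if the nearest source point is too far away.
nn_pick <- function(data, src, var, radius = 200, X.name = "x", Y.name = "y") {
  vapply(seq_len(nrow(data)), function(i) {
    # squared Euclidean distances from target point i to all source points
    d2 <- (src[[X.name]] - data[[X.name]][i])^2 +
          (src[[Y.name]] - data[[Y.name]][i])^2
    j <- which.min(d2)
    if (d2[j] > radius^2) NA_real_ else as.numeric(src[[var]][j])
  }, numeric(1))
}

src  <- data.frame(x = c(0, 100), y = c(0, 0), elev = c(5, 7))
data <- data.frame(x = c(10, 90, 1000), y = c(0, 0, 0))
nn <- nn_pick(data, src, "elev", radius = 200)  # 5 7 NA
```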

nmin

numeric, for method="krige" only: see krige function in package gstat

nmax

numeric, for method="krige" only: see krige function in package gstat

sill

numeric, for method="krige" only: the overall sill parameter to be used for the variogram

range

numeric, for method="krige" only: the variogram range

nugget

numeric, for method="krige" only: the nugget effect

model

for method="krige" only: the variogram model to be used for interpolation; defaults to a spherical variogram with parameters specified by the range, sill, and nugget arguments; see vgm in package gstat for details
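The default model, vgm(sill - nugget, "Sph", range = range, nugget = nugget), uses sill - nugget as the partial sill, so the spherical curve levels off at sill. The base R function below sketches that semivariance for illustration; sph_variogram is a hypothetical name and does not call gstat.

```r
# Sketch of the spherical semivariance implied by the default 'model':
# nugget + (sill - nugget) * (1.5 r - 0.5 r^3) with r = h / range capped at 1,
# and semivariance 0 at zero lag.
sph_variogram <- function(h, sill = 1, range = 200, nugget = 0) {
  psill <- sill - nugget                 # partial sill passed to vgm()
  r <- pmin(h / range, 1)                # scaled lag, constant beyond the range
  g <- nugget + psill * (1.5 * r - 0.5 * r^3)
  ifelse(h == 0, 0, g)
}

gam <- sph_variogram(c(0, 100, 200, 400), sill = 1, range = 200, nugget = 0.2)
```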

log

logical vector, specifying for each variable in pick if interpolation should take place on the logarithmic scale (default: FALSE)
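The effect of interpolating on the logarithmic scale can be illustrated with any linear interpolator whose weights sum to one, as ordinary kriging weights do. The sketch below back-transforms with exp() and applies no bias correction; interp_weighted is hypothetical, and whether the package applies a correction is not stated here.

```r
# Hypothetical illustration (not RSAGA code): a weighted average computed on
# the original scale or on the log scale followed by exp().
interp_weighted <- function(z, w, use.log = FALSE) {
  stopifnot(isTRUE(all.equal(sum(w), 1)))
  if (use.log) exp(sum(w * log(z))) else sum(w * z)
}

z <- c(1, 100)    # two neighbouring values of a skewed variable
w <- c(0.5, 0.5)  # equal interpolation weights
on.original <- interp_weighted(z, w)                  # arithmetic mean, 50.5
on.log      <- interp_weighted(z, w, use.log = TRUE)  # geometric mean, about 10
```

On the log scale the result is a weighted geometric mean, which is less sensitive to a single large value.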

X.name

name of the variable containing the x coordinates

Y.name

name of the variable containing the y coordinates

cbind

logical: should the new variables be added to the input data.frame (cbind=TRUE, the default), or should they be returned as a separate vector or data.frame (cbind=FALSE)?

shapefile

point shapefile

...

arguments to be passed to pick.from.points, and to internal.pick.from.ascii.grid in the case of pick.from.ascii.grid

file

file name (relative to path, default file extension .asc) of an ASCII grid from which to pick a variable, or an open connection to such a file

path

optional path to file

varname

character string: a name for the variable interpolated from the grid file given by file in pick.from.*.grid; if missing, the variable name is determined from the file name by a call to create.variable.name

prefix

an optional prefix to be added to the varname

parallel

logical (default: FALSE): enable parallel processing; requires a registered parallel backend from an additional package such as doSNOW or doMC. See the example below and ddply in package plyr

nsplit

split the data.frame data into nsplit disjoint subsets in order to increase efficiency by using ddply in package plyr. The default seems to perform well in many situations.

quiet

logical: if FALSE, information on the progress of grid processing is shown on screen (only relevant if at.once=FALSE and method="nearest.neighbour")

nodata.values

numeric vector specifying grid values that should be converted to NA; in addition to the values specified here, the nodata value given in the input grid's header will be used
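The conversion of nodata codes can be sketched as a single %in% test against the union of nodata.values and the header's nodata value. set_nodata_na is a hypothetical helper; the header value is passed in directly here instead of being read from a file.

```r
# Hypothetical sketch (not RSAGA code): replace nodata codes by NA, using both
# the nodata.values argument and the NODATA_value from the grid header.
set_nodata_na <- function(values, header.nodata,
                          nodata.values = c(-9999, -99999)) {
  values[values %in% c(nodata.values, header.nodata)] <- NA
  values
}

cleaned <- set_nodata_na(c(3.2, -9999, -32768, 7.5), header.nodata = -32768)
# 3.2 NA NA 7.5
```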

at.once

logical: should the grid be read as a whole (at.once=TRUE) or line by line (at.once=FALSE)? at.once=FALSE is useful for processing large grids that do not fit into memory. It is currently FALSE by default for method="nearest.neighbour", and it currently MUST be TRUE for all other methods (for which TRUE is the default value). Piecewise processing with at.once=FALSE is always faster than processing the whole grid at once.
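Line-by-line processing with nearest neighbour picking can be sketched in base R: read the six header lines, compute each point's row and column, then read one grid row at a time and fill in the points that fall in it. This assumes an ESRI ASCII grid with xllcorner/yllcorner (not xllcenter) and the six standard header lines; read_grid_nn is hypothetical and is not the package's implementation.

```r
# Hypothetical sketch (not RSAGA code): pick the value of the grid cell that
# contains each point while reading the grid one row at a time.
read_grid_nn <- function(con, data, X.name = "x", Y.name = "y") {
  hdr <- list()
  for (k in 1:6) {  # ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value
    kv <- strsplit(trimws(readLines(con, n = 1)), "[[:space:]]+")[[1]]
    hdr[[tolower(kv[1])]] <- as.numeric(kv[2])
  }
  # columns count from the west; rows count from the north (first row = top)
  col <- floor((data[[X.name]] - hdr$xllcorner) / hdr$cellsize) + 1
  row <- hdr$nrows - floor((data[[Y.name]] - hdr$yllcorner) / hdr$cellsize)
  inside <- col >= 1 & col <= hdr$ncols & row >= 1 & row <= hdr$nrows
  out <- rep(NA_real_, nrow(data))
  for (r in seq_len(hdr$nrows)) {
    vals <- scan(text = readLines(con, n = 1), quiet = TRUE)  # one grid row
    hit <- which(inside & row == r)
    out[hit] <- vals[col[hit]]
  }
  out[out %in% hdr$nodata_value] <- NA  # nodata cells become NA
  out
}

grid_txt <- c("ncols 3", "nrows 2", "xllcorner 0", "yllcorner 0",
              "cellsize 10", "NODATA_value -9999",
              "1 2 3",
              "4 5 -9999")
con <- textConnection(grid_txt)
pts <- data.frame(x = c(5, 15, 25, 50), y = c(15, 5, 5, 5))
picked <- read_grid_nn(con, pts)  # 1 5 NA NA (last point lies outside the grid)
close(con)
```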

nlines

numeric: stop after processing nlines lines of the input grid; useful for testing purposes

na.strings

passed on to scan

filename

character: name of a SAGA grid file, default extension .sgrd

prec

numeric, specifying the number of digits to be used in converting a SAGA grid to an ASCII grid in pick.from.saga.grid

show.output.on.console

logical (default: FALSE): indicates whether to capture the output of the command and show it on the R console (see system, rsaga.geoprocessor).

env

list: RSAGA geoprocessing environment created by rsaga.env

Value

If cbind=TRUE, columns with the new, interpolated variables are added to the input data.frame data. If cbind=FALSE, a data.frame only containing the new variables is returned (possibly coerced to a vector if only one variable is processed).
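The two return conventions can be sketched as below. return_picked is a hypothetical helper that mimics the documented behaviour; it calls base::cbind explicitly so the logical argument named cbind does not obscure the function.

```r
# Hypothetical sketch (not RSAGA code) of the documented return value.
return_picked <- function(data, new, cbind = TRUE) {
  if (cbind) return(base::cbind(data, new))  # append new columns to data
  if (ncol(new) == 1) new[[1]] else new      # one variable: plain vector
}

d   <- data.frame(x = 1:2, y = 3:4)
new <- data.frame(elev = c(10.5, 11))
with.cols <- return_picked(d, new)                 # columns x, y, elev
only.new  <- return_picked(d, new, cbind = FALSE)  # 10.5 11.0
```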

Details

pick.from.points interpolates the variables defined by pick in the src data.frame to the locations provided by the data data.frame. Only nearest neighbour and ordinary kriging interpolation are currently available. This function is intended for 'data-rich' situations in which not much thought needs to be put into a geostatistical analysis of the spatial structure of a variable. In particular, this function is supposed to provide a simple, 'quick-and-dirty' interface for situations where the src data points are very densely distributed compared to the data locations.
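The ordinary kriging step can be sketched for a single target location by solving the kriging system directly in base R, instead of calling krige in package gstat as the package does. This simplified sketch uses all source points (radius, nmin and nmax are ignored) and the spherical variogram defined by sill, range and nugget; sph and ok_predict are hypothetical names.

```r
# Spherical semivariance with partial sill = sill - nugget (0 at zero lag)
sph <- function(h, sill, range, nugget) {
  r <- pmin(h / range, 1)
  ifelse(h == 0, 0, nugget + (sill - nugget) * (1.5 * r - 0.5 * r^3))
}

# Ordinary kriging prediction at (x0, y0) from samples z at (x, y)
ok_predict <- function(x0, y0, x, y, z, sill = 1, range = 200, nugget = 0) {
  n <- length(z)
  D <- as.matrix(dist(cbind(x, y)))        # distances between sample points
  A <- matrix(1, n + 1, n + 1)             # extra row/column for the constraint
  A[1:n, 1:n] <- sph(D, sill, range, nugget)
  A[n + 1, n + 1] <- 0
  b <- c(sph(sqrt((x - x0)^2 + (y - y0)^2), sill, range, nugget), 1)
  w <- solve(A, b)[1:n]                    # kriging weights, summing to one
  list(pred = sum(w * z), weights = w)
}

x <- c(0, 100, 0); y <- c(0, 0, 100); z <- c(1, 3, 2)
fit <- ok_predict(40, 30, x, y, z)
at.sample <- ok_predict(0, 0, x, y, z)$pred  # reproduces z[1] without nugget
```

Without a nugget effect, ordinary kriging is an exact interpolator, so predicting at a sample location returns that sample's value.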

pick.from.shapefile is a front-end of pick.from.points for point shapefiles.

pick.from.ascii.grid retrieves data values from an ASCII raster file using either nearest neighbour or ordinary kriging interpolation. The latter may not be possible for large raster data sets because the entire grid needs to be read into an R matrix. Split-apply-combine strategies are used to improve efficiency and allow for parallelization.

The optional parallelization of pick.from.ascii.grid computation requires the use of a parallel backend package such as doSNOW or doMC, and the parallel backend needs to be registered before calling this function with parallel=TRUE. The example section provides an example using doSNOW on Windows. I have seen 25-40% faster processing with this setup on a dual-core notebook.

pick.from.ascii.grids performs multiple pick.from.ascii.grid calls. File path and prefix arguments may be specific to each file (i.e. each may be a character vector), but all interpolation settings will be the same for each file, limiting the flexibility a bit compared to individual pick.from.ascii.grid calls by the user. pick.from.ascii.grids currently processes the files sequentially (i.e. parallelization is limited to the pick.from.ascii.grid calls within this function).

pick.from.saga.grid is the equivalent to pick.from.ascii.grid for SAGA grid files. It simply converts the SAGA grid file to a (temporary) ASCII raster file and applies pick.from.ascii.grid.

internal.pick.from.ascii.grid is an internal 'workhorse' function that by itself would be very inefficient for a large data.frame data. This function is called by pick.from.ascii.grid, which uses a split-apply-combine strategy implemented in the plyr package.
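The split-apply-combine idea can be sketched in base R: split data into nsplit contiguous blocks, apply a worker to each block, and bind the results back together in the original row order. pick.from.ascii.grid uses ddply from plyr for this; split_apply_combine and pick_chunk are hypothetical stand-ins, and pick_chunk just adds a column instead of reading a grid.

```r
# Base R sketch of split-apply-combine (the package uses plyr::ddply).
split_apply_combine <- function(data, nsplit, FUN) {
  n <- nrow(data)
  nsplit <- max(1, min(nsplit, n))
  id <- ceiling(seq_len(n) * nsplit / n)  # contiguous blocks 1..nsplit
  pieces <- lapply(split(data, id), FUN)  # apply the worker to each block
  out <- do.call(rbind, pieces)           # combine in original row order
  rownames(out) <- NULL
  out
}

# Hypothetical per-block worker standing in for internal.pick.from.ascii.grid
pick_chunk <- function(chunk) {
  chunk$value <- chunk$x + chunk$y
  chunk
}

pts <- data.frame(x = 1:10, y = 10:1)
res <- split_apply_combine(pts, nsplit = 3, FUN = pick_chunk)
```

Because the blocks are independent, the lapply call is the step that a parallel backend could distribute.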

Examples

## Not run: 
# # assume that 'dem' is an ASCII grid and d a data.frame with variables x and y
# pick.from.ascii.grid(d, "dem")
# # parallel processing on Windows using the doSNOW package:
# require(doSNOW)
# registerDoSNOW(cl <- makeCluster(2, type = "SOCK")) # DualCore processor
# pick.from.ascii.grid(d, "dem", parallel = TRUE)
# # produces two (ignorable) warning messages when using doSNOW
# # typically 25-40% faster than the above on my DualCore notebook
# stopCluster(cl)
# ## End(Not run)

## Not run: 
# # use the meuse data for some tests:
# require(gstat)
# data(meuse, package = "sp")
# data(meuse.grid, package = "sp")
# meuse.nn = pick.from.points(data=meuse.grid, src=meuse,
#     pick=c("cadmium","copper","elev"), method="nearest.neighbour")
# meuse.kr = pick.from.points(data=meuse.grid, src=meuse,
#     pick=c("cadmium","copper","elev"), method="krige", radius=100)
# # it does make a difference:
# plot(meuse.kr$cadmium,meuse.nn$cadmium)
# plot(meuse.kr$copper,meuse.nn$copper)
# plot(meuse.kr$elev,meuse.nn$elev)
# ## End(Not run)