Krig(
x, Y, cov.function = "stationary.cov", lambda = NA, df
= NA, GCV=FALSE, Z = NULL, cost = 1, knots = NA, weights = NULL,
m = 2, nstep.cv = 200, scale.type = "user", x.center =
rep(0, ncol(x)), x.scale = rep(1, ncol(x)), rho = NA,
sigma2 = NA, method = "GCV", verbose = FALSE, mean.obj
= NA, sd.obj = NA, null.function =
"Krig.null.function", wght.function = NULL, offset =
0, outputcall = NULL, na.rm = TRUE, cov.args = NULL,
chol.args = NULL, null.args = NULL, wght.args = NULL,
W = NULL, give.warnings = TRUE, ...)## S3 method for class 'Krig':
fitted(object,...)
## S3 method for class 'Krig':
coef(object,...)
resid.Krig(object,...)
The coef.Krig function only returns the coefficients, "d", associated with the
fixed part of the model (also known as the null space or spatial drift).)) = rho*cov.function(x,x
)
If the covariance is actually a
correlation function then rho is also the "sill".
The Kriging model is: Y.k= f(x.k) = P(x.k) + Z(x.k) + e.k
where ".k" means subscripted by k, Y is the dependent variable observed
at location x.k, P is a low order polynomial, Z is a mean zero, Gaussian
field with covariance function K and e is assumed to be independent
normal errors. The estimated surface is the best linear unbiased
estimate (BLUE) of f(x)= P(x) + Z(x) given the observed data. For this
estimate K, is taken to be rho*cov.function and the errors have variance
sigma**2. In more conventional geostatistical terms rho is the "sill" if
the covariance function is actually a correlation function and sigma**2
is the nugget variance or measure error variance (the two are confounded
in this model.) If the weights are given then the variance of e.k is
sigma**2/ weights.k . In the case that the weights are specified as a
matrix, W, using the wght.function option then the assumed covariance
matrix for the errors is sigma**2 Wi, where Wi is the inverse of W. It
is straightforward to show that the estimate of f only depends on sigma
and rho through the ratio lambda = sigma**2/ rho. This parameter, termed
the smoothing parameter plays a central role in the statistical
computations within Krig
. See also the help for thin plate
splines, (Tps
) to get another perspective on the smoothing
parameter.
This function also supports a modest extension of the Generalized
Kriging model to include other covariates as fixed regression type
components. In matrix form Y = Zb + F + E where Z is a matrix of
covariates and b a fixed parameter vector, F the vector of function values at
the observations and E a vector of errors. The The Z
argument in
the function is the way to specify this additional component.
If the parameters rho and sigma2 are omitted in the call, then they are estimated in the following way. If lambda is given, then sigma2 is estimated from the residual sum of squares divided by the degrees of freedom associated with the residuals. Rho is found as the difference between the sums of squares of the predicted values having subtracted off the polynomial part and sigma2. These estimates are the MLE's under Gaussian assumptions on the process and errors. If lambda is also omitted is it estimated from the data using a variety of approaches and then the values for sigma and rho are found in the same way from the estimated lambda.
A useful extension of a stationary correlation to a nonstationary covariance is what we term a correlation model. If mean and marginal standard deviation objects are included in the call. Then the observed data is standardized based on these functions. The spatial process is then estimated with respect to the standardized scale. However for predictions and standard errors the mean and standard deviation surfaces are used to produce results in the original scale of the observations.
The GCV function has several alternative definitions when replicate observations are present or if one uses a reduced set knots. Here are the choices based on the method argument:
GCV: leave-one-out GCV. But if there are replicates it is leave one group out. (Wendy and Doug prefer this one.)
GCV.one: Really leave-one-out GCV even if there are replicate points. This what the old tps function used in FUNFITS. rmse: Match the estimate of sigma**2 to a external value ( called rmse)
pure error: Match the estimate of sigma**2 to the estimate based on replicated data (pure error estimate in ANOVA language).
GCV.model: Only considers the residual sums of squares explained by the basis functions.
REML: The process and errors are assumed to the Gaussian and the likelihood is concentrated (or profiled) with respect to lambda. The MLE of lambda is found from this criterion. Restricted means that the likelihood is formed from a linear transformation of the observations that is orthogonal to the column space of P(x).
WARNING: The covariance functions often have a nonlinear parameter(s) that often control the strength of the correlations as a function of separation, usually referred to as the range parameter. This parameter must be specified in the call to Krig and will not be estimated.
# a 2-d example
# fitting a surface to ozone
# measurements. Exponential covariance, range parameter is 20 (in miles)
fit <- Krig(ozone$x, ozone$y, theta=20)
summary( fit) # summary of fit
set.panel( 2,2)
plot(fit) # four diagnostic plots of fit
set.panel()
surface( fit, type="C") # look at the surface
# predict at data
predict( fit)
# predict using 7.5 effective degrees of freedom:
predict( fit, df=7.5)
# predict on a grid ( grid chosen here by defaults)
out<- predict.surface( fit)
surface( out, type="C") # option "C" our favorite
# predict at arbitrary points (10,-10) and (20, 15)
xnew<- rbind( c( 10, -10), c( 20, 15))
predict( fit, xnew)
# standard errors of prediction based on covariance model.
predict.se( fit, xnew)
# surface of standard errors on a default grid
predict.surface.se( fit)-> out.p # this takes some time!
surface( out.p, type="C")
points( fit$x)
# Using another stationary covariance.
# smoothness is the shape parameter for the Matern.
fit <- Krig(ozone$x, ozone$y, Covariance="Matern", theta=10, smoothness=1.0)
summary( fit)
#
# Roll your own: creating very simple user defined Gaussian covariance
#
test.cov <- function(x1,x2,theta,marginal=FALSE,C=NA){
# return marginal variance
if( marginal) { return(rep( 1, nrow( x1)))}
# find cross covariance matrix
temp<- exp(-(rdist(x1,x2)/theta)**2)
if( is.na(C[1])){
return( temp)}
else{
return( temp%*%C)}
}
#
# use this and put in quadratic polynomial fixed function
fit.flame<- Krig(flame$x, flame$y, cov.function="test.cov", m=3, theta=.5)
#
# note how range parameter is passed to Krig.
# BTW: GCV indicates an interpolating model (nugget variance is zero)
# This is the content of the warning message.
# take a look ...
surface(fit.flame, type="I")
#
# Thin plate spline fit to ozone data using the radial
# basis function as a generalized covariance function
#
# p=2 is the power in the radial basis function (with a log term added for
# even dimensions)
# If m is the degree of derivative in penalty then p=2m-d
# where d is the dimension of x. p must be greater than 0.
# In the example below p = 2*2 - 2 = 2
#
out<- Krig( ozone$x, ozone$y,cov.function="Rad.cov",
m=2,p=2,scale.type="range")
# See also the Fields function Tps
# out should be identical to Tps( ozone$x, ozone$y)
#
# A Knot example
data(ozone2)
y16<- ozone2$y[16,]
# there are some missing values -- remove them
good<- !is.na( y16)
y<- y16[good]
x<- ozone2$lon.lat[ good,]
#
# the knots can be arbitrary but just for fun find them with a space
# filling design. Here we select 50 from the full set of 147 points
#
xknots<- cover.design( x, 50, num.nn= 75)$design # select 50 knot points
out<- Krig( x, y, knots=xknots, cov.function="Exp.cov", theta=300)
summary( out)
# note that that trA found by GCV is around 17 so 50>17 knots may be a
# reasonable approximation to the full estimator.
#
# the plot
surface( out, type="C")
US( add=TRUE)
points( x, col=2)
points( xknots, cex=2, pch="O")
## A quick way to deal with too much data if you intend to smooth it anyway
## Discretize the locations to a grid, then apply Krig
## to the discretized locations:
##
CO2.approx<- as.image(RMprecip$y, x=RMprecip$x, nx=20, ny=20)
# take a look:
image.plot( CO2.approx)
# discretized data (observations averaged if in the same grid box)
# 336 locations -- down form the full 806
# convert the image format to locations, obs and weight vectors
yd<- CO2.approx$z[CO2.approx$ind]
weights<- CO2.approx$weights[CO2.approx$ind] # takes into account averaging
xd<- CO2.approx$xd
obj<- Krig( xd, yd, weights=weights, theta=4)
# compare to the full fit:
# Krig( RMprecip$x, RMprecip$y, theta=4)
# A correlation model example
# fit krig surface using a mean and sd function to standardize
# first get stats from 1987 summer Midwest O3 data set
# Compare the function Tps to the call to Krig given above
# fit tps surfaces to the mean and sd points.
# (a shortcut is being taken here just using the lon/lat coordinates)
data(ozone2)
stats.o3<- stats( ozone2$y)
mean.o3<- Tps( ozone2$lon.lat, c( stats.o3[2,]))
sd.o3<- Tps( ozone2$lon.lat, c( stats.o3[3,]))
#
# Now use these to fit particular day ( day 16)
# and use great circle distance
#NOTE: there are some missing values for day 16.
fit<- Krig( ozone2$lon.lat, y16,
theta=350, mean.obj=mean.o3, sd.obj=sd.o3,
Covariance="Matern", Distance="rdist.earth",
smoothness=1.0,
na.rm=TRUE) #
# the finale
surface( fit, type="I")
US( add=TRUE)
points( fit$x)
title("Estimated ozone surface")
#
#
# explore some different values for the range and lambda using REML
theta <- seq( 300,400,,10)
PLL<- matrix( NA, 10,80)
# the loop
for( k in 1:10){
# call to Krig with different ranges
# also turn off warnings for GCV search
# to avoid lots of messages. (not recommended in general!)
PLL[k,]<- Krig( ozone2$lon.lat[good,], y16[good],
cov.function="stationary.cov",
theta=theta[k], mean.obj=mean.o3, sd.obj=sd.o3,
Covariance="Matern",smoothness=.5,
Distance="rdist.earth", nstep.cv=80,
give.warnings=FALSE)$gcv.grid[,7]
#
# gcv.grid is the grid search output from
# the optimization for estimating different estimates for lambda including
# REML
# default grid is equally spaced in eff.df scale ( and should the same across theta)
# here
}
# see the 2 column of $gcv.grid to get the effective degress of freedom.
cat( "all done", fill=TRUE)
contour( theta, 1:80, PLL)
Run the code above in your browser using DataLab