emulator (version 1.2-20)

estimator: Estimates each known datapoint using the others as datapoints

Description

Uses Bayesian techniques to estimate a model's prediction at each of n datapoints. To estimate the \(i^{\rm th}\) point, conditioning variables \(1,\ldots, i-1\) and \(i+1,\ldots, n\) inclusive are used (i.e., all points except point \(i\)).

This routine is useful when finding optimal coefficients for the correlation function using bootstrap methods.

Usage

estimator(val, A, d, scales=NULL, pos.def.matrix=NULL,
func=regressor.basis)

Arguments

val

Design matrix with rows corresponding to points at which the function is known

A

Correlation matrix (note that this is not the inverse of the correlation matrix)

d

Vector of observations

scales

Scales to be used to calculate t(x). Note that scales has no default value because estimator() is most often used in the context of assessing the appropriateness of a given value of scales. If the desired distance matrix (called \(B\) in Oakley) is not diagonal, pass that matrix to estimator() via the pos.def.matrix argument instead (a short sketch after this argument list illustrates the two call styles).

pos.def.matrix

Positive definite matrix \(B\)

func

Function used to determine basis vectors, defaulting to regressor.basis if not given.
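
The scales and pos.def.matrix arguments are alternative ways of specifying \(B\): a vector of scales corresponds to a diagonal \(B\) with those entries. The following short sketch is an illustration rather than part of the original documentation; it uses the val, d and fish objects constructed in the Examples section below, and the two call styles should agree for a diagonal \(B\):

# Sketch only: a diagonal B supplied via pos.def.matrix should match
# supplying its diagonal via 'scales' (val, d, fish as in the Examples).
A1 <- corr.matrix(val, scales=fish)
A2 <- corr.matrix(val, pos.def.matrix=diag(fish))
e1 <- estimator(val, A1, d, scales=fish)
e2 <- estimator(val, A2, d, pos.def.matrix=diag(fish))
max(abs(e1 - e2))   # expected to be essentially zero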

Value

A vector of estimates of the same length as d: element i is the prediction of d[i] made without using observation i.
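
As a quick illustration (using the val, A, d and fish objects constructed in the Examples section below), the return value lines up elementwise with d:

d.est <- estimator(val, A, d, scales=fish)
length(d.est) == length(d)   # TRUE: one estimate per observation
summary(d - d.est)           # leave-out-one residuals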

Details

Given a matrix of observation points and a vector of observations, estimator() returns a vector of predictions. Each prediction is made in a three-step process. For each index \(i\):

  • Observation d[i] is discarded, and row i and column i are deleted from A (giving A[-i,-i]). Thus d[-i] and A[-i,-i] are the observation vector and correlation matrix that would have been obtained had observation i not been available.

  • The value of d[i] is estimated on the basis of the shortened observation vector d[-i] and the reduced correlation matrix A[-i,-i].

It is then possible to make a scatterplot of d versus dhat, where dhat=estimator(val,A,d). If the scales used are “good”, the points of this scatterplot will lie close to the line abline(0,1). The third step is to optimize the goodness of fit of this scatterplot.
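
The two per-index steps can be written out directly. The following is a conceptual sketch of the leave-out-one posterior mean (the standard emulator formula with a regression basis, as in Oakley and O'Hagan), not the package's source code; loo_estimate is a hypothetical helper name, and its output should agree with estimator() up to numerical error.

# Conceptual sketch only; 'loo_estimate' is a hypothetical name.
loo_estimate <- function(val, A, d, func=regressor.basis) {
  H <- t(apply(val, 1, func))        # regression basis at each design point
  sapply(seq_len(nrow(val)), function(i) {
    Ai   <- solve(A[-i, -i])         # inverse correlation matrix, point i removed
    Hi   <- H[-i, , drop=FALSE]
    di   <- d[-i]
    beta <- solve(t(Hi) %*% Ai %*% Hi, t(Hi) %*% Ai %*% di)
    tx   <- A[i, -i]                 # correlations between point i and the rest
    drop(func(val[i, ]) %*% beta + tx %*% Ai %*% (di - Hi %*% beta))
  })
}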

References

  • J. Oakley and A. O'Hagan, 2002. Bayesian Inference for the Uncertainty Distribution of Computer Model Outputs, Biometrika 89(4), pp. 769-784

  • R. K. S. Hankin, 2005. Introducing BACCO, an R bundle for Bayesian analysis of computer code output, Journal of Statistical Software, 14(16)

See Also

optimal.scales

Examples

# example has 40 observations on 6 dimensions.
# function is just sum( (1:6)*x) where x=c(x_1, ..., x_6)

val <- latin.hypercube(40,6)
colnames(val) <- letters[1:6]
d <- apply(val,1,function(x){sum((1:6)*x)})

#pick some scales:
fish <- rep(1,ncol(val))
A <- corr.matrix(val,scales=fish)

#add some suitably correlated noise:
d <- as.vector(rmvnorm(n=1, mean=d, sigma=0.1*A))   # rmvnorm() from the mvtnorm package

# estimate d using the leave-out-one technique in estimator():
d.est <- estimator(val, A, d, scales=fish)

#and plot the result:
lims <- range(c(d,d.est))
par(pty="s")
plot(d, d.est, xaxs="r", yaxs="r", xlim=lims, ylim=lims)
abline(0,1)
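
# The scatterplot above suggests a simple, illustrative way to carry out
# the third step (tuning the scales): minimise the sum of squared
# leave-out-one errors with optim().  This is a sketch only;
# optimal.scales() is the package's supported tool for this job.
badness <- function(logscales) {
  sc <- exp(logscales)     # work on the log scale to keep scales positive
  sum((d - estimator(val, corr.matrix(val, scales=sc), d, scales=sc))^2)
}
fit <- optim(log(fish), badness)
exp(fit$par)               # candidate improved scales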
  