mat: Modern Analogue Technique transfer function models

Description

Modern Analogue Technique (MAT) transfer function models for palaeoecology. The fitted values are the, possibly weighted, averages of the environment for the k-closest modern analogues. MAT is a k-NN method.

Usage

mat(x, ...)
## S3 method for class 'default':
mat(x, y,
    method = c("euclidean", "SQeuclidean", "chord", "SQchord",
               "bray", "chi.square", "SQchi.square",
               "information", "chi.distance", "manhattan",
               "kendall", "gower", "alt.gower", "mixed"),
    ...)

Arguments

x

a data frame containing the training set data, usually species data.

y

a vector containing the response variable, usually environmental data to be predicted from x.

method

a character string indicating the dissimilarity (distance) coefficient to be used to define modern analogues. See Details, below.

...

arguments to or from other methods.

Value

Returns an object of class mat with the following components:
standard

list; the model statistics based on simple averages of the k-closest analogues. See below.

weighted

list; the model statistics based on weighted averages of the k-closest analogues. See below.

Dij

matrix of pairwise sample dissimilarities for the training set x.

orig.x

the original training set data.

orig.y

the original environmental data or response, y.

call

the matched function call.

method

the dissimilarity coefficient used.

Details

The Modern Analogue Technique (MAT) is perhaps the simplest of the transfer function models used in palaeoecology. The estimate of an environmental variable for a fossil sample is the mean, optionally weighted, of that variable across the k closest modern analogues selected from a modern training set of samples. If weights are used, they are the reciprocal of the dissimilarity between the fossil sample and each modern analogue.
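The estimator described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the code used by mat: the helper name mat_estimate and the toy values are hypothetical, and the sketch assumes no dissimilarity is exactly zero.

```r
## Hypothetical helper, not part of the analogue package:
## estimate the environment for one fossil sample from the
## k closest modern analogues.
##   d   - dissimilarities between the fossil sample and each training sample
##   env - observed environmental values for the training samples
mat_estimate <- function(d, env, k, weighted = FALSE) {
    closest <- order(d)[seq_len(k)]      # indices of the k closest analogues
    if (weighted) {
        w <- 1 / d[closest]              # weights: reciprocal of dissimilarity
        sum(w * env[closest]) / sum(w)   # assumes no zero dissimilarities
    } else {
        mean(env[closest])
    }
}

d   <- c(0.5, 0.1, 0.4, 0.2)   # toy dissimilarities (made up)
env <- c(6.0, 5.0, 7.0, 5.5)   # toy pH values (made up)
mat_estimate(d, env, k = 2)                    # simple mean of the 2 closest
mat_estimate(d, env, k = 2, weighted = TRUE)   # weighted mean of the 2 closest
```

The weighted estimate is pulled towards the closest analogue because its reciprocal dissimilarity is the largest weight.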

Pairwise sample dissimilarity is defined by dissimilarity or distance coefficients. A variety of coefficients are supported; see distance for details.

k is chosen by the user. The simplest way to choose k is to compute, for a sequence of models with increasing k, the root mean squared error (RMSE) of the difference between the predicted and observed values of the environmental variable of interest for the training set samples. The number of analogues chosen is the value of k that gives the lowest RMSE. Note, however, that this RMSE is biased because the data used to build the model are also used to test its predictive power.
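The RMSE-based choice of k can be sketched with base R alone. This is an illustrative sketch, not the internals of mat or summary.mat: the toy data are made up, Euclidean distance from dist stands in for whichever coefficient is chosen, and the sketch assumes a training sample may not serve as its own analogue.

```r
## Toy training set: 6 samples, 3 taxa (values are made up)
set.seed(1)
spp <- matrix(runif(18), nrow = 6)
env <- c(5.1, 5.4, 6.0, 6.3, 6.9, 7.2)

## Pairwise Euclidean distances; assumption: a sample cannot be
## its own analogue, so the diagonal is excluded
D <- as.matrix(dist(spp))
diag(D) <- Inf

## RMSE of the simple-mean estimate for each k = 1, ..., n - 1
rmse <- sapply(seq_len(nrow(D) - 1), function(k) {
    fit <- apply(D, 1, function(d) mean(env[order(d)[seq_len(k)]]))
    sqrt(mean((fit - env)^2))
})
rmse
which.min(rmse)   # k with the lowest RMSE
```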

An alternative approach is to use a separate optimisation data set to find the value of $k$ that gives the lowest root mean squared error of prediction (RMSEP). This may be impractical with smaller sample sizes.

A third option is to bootstrap re-sample the training set many times. At each bootstrap sample, predictions for samples in the bootstrap test set can be made for $k = 1, ..., n$, where $n$ is the number of samples in the training set. $k$ can be chosen from the model with the lowest RMSEP. See function bootstrap for further details on choosing $k$.
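The bootstrap idea can be illustrated with the following base R sketch. It is a simplified stand-in for what the bootstrap function does, not its implementation: the data are simulated, Euclidean distance is assumed, only $k = 1, ..., 5$ is evaluated to keep the example short (the hypothetical kmax and nboot settings are arbitrary), and the RMSEP here is a plain out-of-bag error.

```r
set.seed(42)
n   <- 20
spp <- matrix(runif(n * 4), nrow = n)          # toy species data (made up)
env <- 5 + 2 * spp[, 1] + rnorm(n, sd = 0.1)   # toy environmental variable
D   <- as.matrix(dist(spp))                    # Euclidean distances

kmax  <- 5    # largest k evaluated (arbitrary, for brevity)
nboot <- 50   # number of bootstrap samples (arbitrary)
sqerr <- matrix(NA_real_, nrow = nboot, ncol = kmax)

for (b in seq_len(nboot)) {
    train <- sample(n, replace = TRUE)      # bootstrap training set
    test  <- setdiff(seq_len(n), train)     # samples left out of this draw
    for (k in seq_len(kmax)) {
        pred <- sapply(test, function(i) {
            ## k closest analogues among the bootstrap training samples
            nn <- train[order(D[i, train])[seq_len(k)]]
            mean(env[nn])
        })
        sqerr[b, k] <- mean((pred - env[test])^2)
    }
}

rmsep <- sqrt(colMeans(sqerr))   # RMSEP for each k
which.min(rmsep)                 # k with the lowest RMSEP
```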

The output from summary.mat can be used to choose $k$ in the first case above. For predictions on an optimisation or test set, see predict.mat. For bootstrap resampling of mat models, see bootstrap.

References

Gavin, D.G., Oswald, W.W., Wahl, E.R. and Williams, J.W. (2003) A statistical approach to evaluating distance metrics and analog assignments for pollen records. Quaternary Research 60, 356–367.

Overpeck, J.T., Webb III, T. and Prentice, I.C. (1985) Quantitative interpretation of fossil pollen spectra: dissimilarity coefficients and the method of modern analogues. Quaternary Research 23, 87–108.

Prell, W.L. (1985) The stability of low-latitude sea-surface temperatures: an evaluation of the CLIMAP reconstruction with emphasis on the positive SST anomalies, Report TR 025. U.S. Department of Energy, Washington, D.C.

Sawada, M., Viau, A.E., Vettoretti, G., Peltier, W.R. and Gajewski, K. (2004) Comparison of North-American pollen-based temperature and global lake-status with CCCma AGCM2 output at 6 ka. Quaternary Science Reviews 23, 87–108.

Examples

## continue the RLGH example from ?join
example(join)

## fit the MAT model using the squared chord distance measure
swap.mat <- mat(swapdiat, swappH, method = "SQchord")
swap.mat

## model summary
summary(swap.mat)

## fitted values
fitted(swap.mat)

## model residuals
resid(swap.mat)

## draw summary plots of the model
par(mfrow = c(2,2))
plot(swap.mat)
par(mfrow = c(1,1))

## reconstruct for the RLGH core data
rlgh.mat <- predict(swap.mat, rlgh, k = 10)
rlgh.mat
summary(rlgh.mat)
rlgh.Wmat <- predict(swap.mat, rlgh, k = 10, weighted = TRUE)
rlgh.Wmat
summary(rlgh.Wmat)

## plot of pH change in the RLGH
depths <- as.numeric(colnames(rlgh.mat$predictions$apparent$predicted))
n.analogues <- rlgh.mat$predictions$apparent$k
plot(rlgh.mat$predictions$apparent$predicted[n.analogues, ], depths,
     ylim = rev(range(depths)),
     xlab = "pH",
     ylab = "Depth (cm)",
     main = "Estimated pH",
     type = "l")
