MAT: Palaeoenvironmental reconstruction using the Modern Analogue Technique (MAT)

Description

Functions for reconstructing (predicting) environmental values from biological assemblages using the Modern Analogue Technique (MAT), also known as k-nearest neighbours (k-NN).

Usage

MAT(y, x, dist.method="sq.chord", k=5, lean=TRUE)

## S3 method for class 'MAT':
predict(object, newdata=NULL, k=object$k, sse=FALSE,
        nboot=100, match.data=TRUE, verbose=TRUE, lean=TRUE,
        ...)

## S3 method for class 'MAT':
performance(object, ...)

## S3 method for class 'MAT':
crossval(object, k=object$k, cv.method="lgo",
        verbose=TRUE, ngroups=10, nboot=100, h.cutoff=0, h.dist=NULL, ...)

## S3 method for class 'MAT':
print(x, ...)

## S3 method for class 'MAT':
summary(object, full=FALSE, ...)

## S3 method for class 'MAT':
plot(x, resid=FALSE, xval=FALSE, k=5, wMean=FALSE, xlab="",
      ylab="", ylim=NULL, xlim=NULL, add.ref=TRUE,
      add.smooth=FALSE, ...)

## S3 method for class 'MAT':
residuals(object, cv=FALSE, ...)

## S3 method for class 'MAT':
fitted(object, ...)

## S3 method for class 'MAT':
screeplot(x, ...)

paldist(y, dist.method="sq.chord")

paldist2(y1, y2, dist.method="sq.chord")

Arguments

y, y1, y2

data frame containing biological data.

newdata

data frame containing biological data to predict from.

x

a vector of environmental values to be modelled, matched to y.

dist.method

dissimilarity coefficient. See details for options.

match.data

logical indicating whether the function should match two species datasets by their column names. Only set this to FALSE if you are sure the column names match exactly.

k

number of analogues to use.

lean

logical to remove large items (such as the dissimilarity matrix) from the output.

object

an object of class MAT.

resid

logical to plot residuals instead of fitted values.

xval

logical to plot cross-validation estimates.

wMean

logical to plot weighted-mean estimates.

xlab, ylab, xlim, ylim

additional graphical arguments passed to plot.

add.ref

logical to add a 1:1 line to the plot.

add.smooth

logical to add a loess smooth to the plot.

cv.method

cross-validation method, either "lgo" or "bootstrap".

verbose

logical to show feedback during cross-validation.

nboot

number of bootstrap samples.

ngroups

number of groups in leave-group-out cross-validation, or a vector containing leave-out group membership.

h.cutoff

cutoff for h-block cross-validation. Only training samples at a distance greater than h.cutoff from each test sample are used.

h.dist

distance matrix for use in h-block cross-validation. Usually a matrix of geographical distances between samples.

sse

logical indicating whether sample-specific errors should be calculated.

full

logical to indicate a full or abbreviated summary.

cv

logical to indicate model or cross-validation residuals.

...

additional arguments.
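The h.cutoff and h.dist arguments restrict which training samples may serve as analogues during h-block cross-validation. The base-R sketch below shows that selection rule for a single test sample. The function name hblock_eligible is hypothetical and this is not the package's internal code; it simply applies the stated rule that only training samples further than h.cutoff from the test sample are kept.

```r
# Hypothetical helper (not part of the package): indices of training samples
# that may act as analogues for test sample i under h-block cross-validation.
hblock_eligible <- function(h_dist, i, h_cutoff) {
  ok <- h_dist[i, ] > h_cutoff  # keep only samples strictly beyond the cutoff
  ok[i] <- FALSE                # never use the test sample itself
  which(ok)
}

# Toy example: four sites along a transect, positions in km (made-up data)
coords <- c(0, 5, 20, 50)
h <- as.matrix(dist(coords))    # geographical distance matrix, as for h.dist
eligible <- hblock_eligible(h, i = 1, h_cutoff = 10)
```

With a cutoff of 10 km, only the sites at 20 and 50 km remain eligible analogues for the first site.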

Value

Function MAT returns an object of class MAT which contains the following items:
call: original function call to MAT.
fitted.values: fitted (predicted) values for the training set, as the mean and weighted mean (weighted by dissimilarity) of the k closest analogues.
diagnostics: standard deviation of the k analogues and dissimilarity of the closest analogue.
dist.n: dissimilarities of the k closest analogues.
x.n: environmental values of the k closest analogues.
match.name: column names of the k closest analogues.
x: environmental variable used in the model.
dist.method: dissimilarity coefficient.
k: number of closest analogues to use.
y: original species data.
cv.summary: summary of the cross-validation (not yet implemented).
dist: dissimilarity matrix (returned if lean=FALSE).
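The fitted values and diagnostics above are summaries of the k closest analogues. A minimal base-R sketch of that k-NN step is shown below, assuming a precomputed dissimilarity matrix d (new samples in rows, training samples in columns). The function name mat_predict_sketch is hypothetical, and the use of inverse-dissimilarity weights for the weighted mean is an assumption (closer analogues count more); it is not taken from the package source.

```r
# Hypothetical sketch of MAT / k-NN prediction, not the package implementation.
# d: n_new x n_train dissimilarity matrix; x: training environmental values.
mat_predict_sketch <- function(d, x, k = 5) {
  t(apply(d, 1, function(di) {
    idx <- order(di)[seq_len(k)]  # indices of the k closest analogues
    w <- 1 / di[idx]              # ASSUMPTION: inverse-dissimilarity weights
    c(mean     = mean(x[idx]),              # unweighted mean estimate
      wmean    = sum(w * x[idx]) / sum(w),  # weighted mean estimate
      sd       = sd(x[idx]),                # spread of the k analogues
      min.dist = min(di))                   # dissimilarity of closest analogue
  }))
}

# Toy example: one new sample, four training samples (made-up values)
d <- matrix(c(1, 2, 4, 8), nrow = 1)
x <- c(10, 20, 30, 40)
p <- mat_predict_sketch(d, x, k = 2)
```

Note that a zero dissimilarity (an exact match) gives an infinite weight in this sketch, so the weighted mean would need special handling in that case.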
If predict is called with newdata=NULL, it returns a matrix of fitted values from the original training set analysis. If newdata is not NULL, it returns a list with the following named elements:
fit: predictions for newdata.
diagnostics: standard deviations of the k closest analogues and dissimilarity of the closest analogue.
dist.n: dissimilarities of the k closest analogues.
x.n: environmental values of the k closest analogues.
match.name: column names of the k closest analogues.
dist: dissimilarity matrix (returned if lean=FALSE).
If sample-specific errors were requested (sse=TRUE), the list will also include:
fit.boot: mean of the bootstrap estimates of newdata.
v1: squared standard error of the bootstrap estimates for each new sample.
v2: mean squared error for the training set samples, across all bootstrap samples.
SEP: standard error of prediction, calculated as the square root of v1 + v2.
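The way these components combine can be sketched in base R. The sketch below assumes a matrix boot_pred of bootstrap predictions (new samples in rows, bootstrap replicates in columns) and takes v2 as a given input rather than computing it from the training set. The name sep_sketch is hypothetical, and the use of var() (n - 1 denominator) for v1 is an assumption; the package may use a different denominator.

```r
# Hypothetical sketch: combine bootstrap components into fit.boot, v1 and SEP.
# boot_pred: n_new x nboot matrix of bootstrap predictions (assumed input).
# v2: training-set mean squared error across bootstrap samples (assumed given).
sep_sketch <- function(boot_pred, v2) {
  fit_boot <- rowMeans(boot_pred)     # mean bootstrap estimate per new sample
  v1 <- apply(boot_pred, 1, var)      # squared SE of the bootstrap estimates
  data.frame(fit.boot = fit_boot,
             v1       = v1,
             SEP      = sqrt(v1 + v2))  # SEP = sqrt(v1 + v2)
}

# Toy example: two new samples, three bootstrap replicates (made-up values)
boot_pred <- matrix(c(6, 7, 8,
                      5, 5, 5), nrow = 2, byrow = TRUE)
sep <- sep_sketch(boot_pred, v2 = 0.5)
```

A sample whose bootstrap estimates do not vary (the second row) still has a non-zero SEP, because the training-set component v2 is always added.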
Functions paldist and paldist2 return dissimilarity matrices. performance returns a matrix of performance statistics for the MAT model, with columns for RMSE, R2, mean and max bias for each number of analogues up to k. See performance for a description of the output.
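performance reports RMSE for each number of analogues up to k, and screeplot plots the same quantity to help choose k. The base-R sketch below computes an RMSE for k = 1..kmax from a training-set dissimilarity matrix. It assumes each sample is excluded from its own analogue set (leave-one-out) and uses the unweighted mean; both are assumptions, and the package may compute its statistics differently. The function name rmse_by_k_sketch is hypothetical.

```r
# Hypothetical sketch: RMSE of unweighted k-NN estimates for k = 1..kmax.
# d: n x n training-set dissimilarity matrix; x: training environmental values.
rmse_by_k_sketch <- function(d, x, kmax = 10) {
  diag(d) <- Inf                    # ASSUMPTION: a sample is never its own analogue
  ord <- t(apply(d, 1, order))      # row i: training samples ranked by closeness
  sapply(seq_len(kmax), function(k) {
    pred <- apply(ord[, seq_len(k), drop = FALSE], 1, function(i) mean(x[i]))
    sqrt(mean((x - pred)^2))        # RMSE across all training samples
  })
}

# Toy example: four samples on a one-dimensional gradient (made-up values)
x <- c(1, 2, 4, 10)
r <- rmse_by_k_sketch(as.matrix(dist(x)), x, kmax = 3)
```

Plotting r against seq_along(r) gives a scree-style curve; the k at or near the minimum is a natural candidate for prediction.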

Details

MAT performs an environmental reconstruction using the modern analogue technique. Function MAT takes a training dataset of biological data (species abundances) y and a single associated environmental variable x, and generates a model of closest analogues, or matches, for the modern data using one of a number of dissimilarity coefficients. Options for the latter are: "euclidean", "sq.euclidean", "chord", "sq.chord", "chord.t", "sq.chord.t", "chi.squared", "sq.chi.squared", "bray". "chord.t" and "sq.chord.t" are the true chord distance and its square; "chord" refers to the variant of chord distance used in palaeoecology (e.g. Overpeck et al. 1985), which is actually Hellinger's distance (Legendre & Gallagher 2001).

There are various helper functions to plot and extract information from the results of a MAT transfer function. The function predict takes a MAT object and uses it to predict environmental values for a new set of species data, or returns the fitted (predicted) values from the original modern dataset if newdata is NULL. Variables are matched between training and newdata by column name (if match.data is TRUE). Use compare.datasets to assess conformity of two species datasets and identify possible no-analogue samples.

MAT has methods fitted and residuals, which return the fitted values (estimates) and residuals for the training set; performance, which returns summary performance statistics (see Value); and print and summary to summarise the output. MAT also has a plot method that produces scatter plots of predicted vs observed measurements for the training set. Function screeplot displays the RMSE of prediction for the training set as a function of the number of analogues (k) and is useful for estimating the optimal value of k for use in prediction.

paldist and paldist2 are helper functions, though they may be called directly. paldist takes a single data frame or matrix and returns a distance matrix of the row-wise dissimilarities. paldist2 takes two data frames or matrices and returns a matrix of all row-wise dissimilarities between the two datasets.
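As an illustration of what paldist2 computes for dist.method="sq.chord", the base-R sketch below returns the squared chord distance, the sum over taxa of (sqrt(p) - sqrt(q))^2, between every row of one dataset and every row of another; its square root gives the palaeoecological "chord" (Hellinger) distance described above. The function name sq_chord_dist is hypothetical, and the sketch assumes rows are already proportions (each row sums to 1); whether the package rescales rows internally is not stated here.

```r
# Hypothetical sketch of a squared chord distance matrix (not the package code).
# Assumes y1 and y2 have matching columns and rows that are proportions.
sq_chord_dist <- function(y1, y2 = y1) {
  s1 <- sqrt(as.matrix(y1))
  s2 <- sqrt(as.matrix(y2))
  # sum((sqrt(p) - sqrt(q))^2) expanded as sum(p) + sum(q) - 2 * sum(sqrt(p * q))
  d <- outer(rowSums(s1^2), rowSums(s2^2), "+") - 2 * s1 %*% t(s2)
  d[d < 0] <- 0  # clamp tiny negative values caused by floating-point rounding
  d
}

# Toy example: three samples of three taxa (made-up proportions)
y <- matrix(c(0.5, 0.5, 0,
              0.5, 0.5, 0,
              0,   0,   1), nrow = 3, byrow = TRUE)
d_sq <- sq_chord_dist(y)   # analogous to paldist(y, "sq.chord")
d_ch <- sqrt(d_sq)         # analogous to the "chord" (Hellinger) variant
```

Identical assemblages have a dissimilarity of zero, and assemblages with no taxa in common reach the maximum squared chord distance of 2.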

References

Legendre, P. & Gallagher, E. (2001) Ecologically meaningful transformations for ordination of species. Oecologia, 129, 271-280.

Overpeck, J.T., Webb, T., III, & Prentice, I.C. (1985) Quantitative interpretation of fossil pollen spectra: dissimilarity coefficients and the method of modern analogs. Quaternary Research, 23, 87-108.

Examples


# pH reconstruction of the Round Loch of Glenhead (RLGH), Scotland, using the
# SWAP training set; shows the recent acidification history
library(rioja)
data(SWAP)
data(RLGH)
fit <- MAT(SWAP$spec, SWAP$pH, k=20)  # generate results for k = 1-20
# examine performance
performance(fit)
print(fit)
# how many analogues?
screeplot(fit)
# do the reconstruction
pred.mat <- predict(fit, RLGH$spec, k=10)
# plot the reconstruction
plot(RLGH$depths$Age, pred.mat$fit[, 1], type="b", ylab="pH", xlab="Age")

# compare to a weighted averaging model
fit.wa <- WA(SWAP$spec, SWAP$pH)
pred.wa <- predict(fit.wa, RLGH$spec)
points(RLGH$depths$Age, pred.wa$fit[, 1], col="red", type="b")
legend("topleft", c("MAT", "WA"), lty=1, col=c("black", "red"))
