mat(x, ...)
## S3 method for class 'default':
mat(x, y,
method = c("euclidean", "SQeuclidean", "chord", "SQchord",
"bray", "chi.square", "SQchi.square",
"information", "chi.distance", "manhattan",
"kendall", "gower", "alt.gower", "mixed"),
...)
## S3 method for class 'formula':
mat(formula, data, subset, na.action,
method = c("euclidean", "SQeuclidean", "chord", "SQchord",
"bray", "chi.square", "SQchi.square",
"information", "chi.distance", "manhattan",
"kendall", "gower", "alt.gower", "mixed"),
model = FALSE, ...)
x
data
an object coercible by as.data.frame to a data frame, containing the variables in the model. If not found in data, the variables are taken from environment(formula).
na.action
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options.
model
logical; if TRUE the model frame of the fit is returned.

mat returns an object of class mat with the following components:
x
y
If model = TRUE then additional components "terms" and "model" are returned, containing the terms object and model frame used.

A typical model has the form response ~ terms, where response is the (numeric) response data frame and terms is a series of terms which specifies a linear predictor for response. A typical form for terms is a single dot (.), which is shorthand for "all variables" in data. If . is used, data must also be provided. If specific species (variables) are required then terms should take the form spp1 + spp2 + spp3.
Pairwise sample dissimilarity is defined by dissimilarity or distance coefficients. A variety of coefficients are supported; see distance for details.
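As an illustration of what such a coefficient computes, the following base R sketch evaluates the squared chord distance (method = "SQchord") between every pair of samples. The toy proportions and the helper sq_chord are hypothetical and exist only for this example; this is not the implementation used by distance.

```r
## Hypothetical toy data: 3 samples (rows) x 4 taxa (columns), as proportions
spp <- matrix(c(0.5, 0.3, 0.2, 0.0,
                0.4, 0.4, 0.1, 0.1,
                0.0, 0.1, 0.2, 0.7),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("s", 1:3), paste0("sp", 1:4)))

## Squared chord distance between two samples:
## the sum of squared differences of square-root proportions
sq_chord <- function(a, b) sum((sqrt(a) - sqrt(b))^2)

## Fill the pairwise dissimilarity matrix
n <- nrow(spp)
D <- matrix(0, n, n, dimnames = list(rownames(spp), rownames(spp)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    D[i, j] <- sq_chord(spp[i, ], spp[j, ])
  }
}
D
```

The same matrix can be obtained more compactly as as.matrix(dist(sqrt(spp)))^2, because the squared chord distance is the squared Euclidean distance between square-root transformed proportions.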
The number of analogues, $k$, is chosen by the user. The simplest approach is to evaluate the RMSE of the difference between the predicted and observed values of the environmental variable of interest for the training set samples, for a sequence of models with increasing $k$. The number of analogues chosen is the value of $k$ that gives the lowest RMSE. Note, however, that this estimate is biased because the data used to build the model are also used to test its predictive power.
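The training-set approach can be sketched in base R as follows. The data are simulated, and the names env, spp, ks and best_k are assumptions made for this example. The sketch excludes each sample from its own set of analogues and uses an unweighted mean of the k closest samples; it is an illustration of the idea rather than the code used by mat.

```r
set.seed(1)

## Simulated training set: 30 samples, a pH-like variable and 3 taxa
n   <- 30
env <- runif(n, 4.5, 7)
spp <- cbind(a = plogis(env - 5.5),
             b = plogis(5.5 - env),
             c = runif(n, 0, 0.2))
spp <- spp / rowSums(spp)            # convert to proportions

## Squared chord dissimilarities between all training samples
D <- as.matrix(dist(sqrt(spp)))^2
diag(D) <- Inf                       # a sample cannot be its own analogue

## RMSE of training-set predictions for k = 1, ..., 10
ks <- 1:10
rmse <- sapply(ks, function(k) {
  pred <- sapply(seq_len(n), function(i) {
    nearest <- order(D[i, ])[seq_len(k)]   # indices of the k closest analogues
    mean(env[nearest])                     # unweighted mean of their values
  })
  sqrt(mean((pred - env)^2))
})

## The chosen k is the one with the lowest RMSE
best_k <- ks[which.min(rmse)]
best_k
```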
An alternative approach is to use a separate optimisation data set to find the value of $k$ that gives the lowest RMSEP. This may be impractical with smaller sample sizes.
A third option is to bootstrap re-sample the training set many times. For
each bootstrap sample, predictions for samples in the bootstrap test
set can be made for $k = 1, ..., n$, where $n$ is the
number of samples in the training set. $k$ can then be chosen from the
model with the lowest RMSEP. See function bootstrap.mat for
further details on choosing $k$.
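The bootstrap idea can be sketched in base R as below. This is an illustrative sketch under stated assumptions, not the algorithm implemented in bootstrap.mat: the data are simulated, the out-of-bag samples of each resample are treated as the bootstrap test set, and k is limited to a hypothetical kmax of 10 rather than running to n.

```r
set.seed(42)

## Simulated training set: 40 samples, a pH-like variable and 3 taxa
n   <- 40
env <- runif(n, 4.5, 7)
spp <- cbind(a = plogis(2 * (env - 5.5)),
             b = plogis(2 * (5.5 - env)),
             c = runif(n, 0, 0.3))
spp <- spp / rowSums(spp)
D   <- as.matrix(dist(sqrt(spp)))^2  # squared chord dissimilarities

kmax   <- 10                         # largest k considered (assumption)
nboot  <- 100                        # number of bootstrap resamples
sse    <- numeric(kmax)              # summed squared errors for each k
n_pred <- 0                          # number of test-set predictions made

for (b in seq_len(nboot)) {
  inbag <- sample.int(n, replace = TRUE)   # bootstrap training set
  oob   <- setdiff(seq_len(n), inbag)      # samples left out: the test set
  for (i in oob) {
    ## closest in-bag samples to test sample i, nearest first
    nearest <- inbag[order(D[i, inbag])][seq_len(kmax)]
    ## running means give the prediction for k = 1, ..., kmax in one pass
    pred   <- cumsum(env[nearest]) / seq_len(kmax)
    sse    <- sse + (pred - env[i])^2
    n_pred <- n_pred + 1
  }
}

rmsep  <- sqrt(sse / n_pred)         # bootstrap RMSEP for each k
best_k <- which.min(rmsep)
best_k
```

Because each test sample is never part of the resample used to predict it, the resulting RMSEP avoids the bias of the training-set RMSE.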
The output from summary.mat can be used to choose
$k$ in the first case above. For predictions on an optimisation or
test set see predict.mat. For bootstrap resampling of
mat models, see bootstrap.mat.
summary.mat, bootstrap.mat for bootstrap
resampling of MAT models, predict.mat for making
predictions from MAT models, fitted.mat and
resid.mat for extraction of fitted values and residuals
from MAT models respectively. plot.mat provides a
plot.lm-like plotting tool for MAT models.

## continue the RLGH example from ?join
example(join)
## fit the MAT model using the squared chord distance measure
swap.mat <- mat(swapdiat, swappH, method = "SQchord")
swap.mat
## model summary
summary(swap.mat)
## fitted values
fitted(swap.mat)
## model residuals
resid(swap.mat)
## draw summary plots of the model
par(mfrow = c(2,2))
plot(swap.mat)
par(mfrow = c(1,1))
## reconstruct for the RLGH core data
rlgh.mat <- predict(swap.mat, rlgh, k = 10)
rlgh.mat
summary(rlgh.mat)
## draw the reconstruction
reconPlot(rlgh.mat, use.labels = TRUE, display.error = "bars",
xlab = "Depth", ylab = "pH")