rgr (version 1.1.0)

gx.md.gait: Function for Multivariate Graphical Adaptive Interactive Trimming

Description

Function to undertake the GAIT (Graphical Adaptive Interactive Trimming) procedure for multivariate distributions through Chi-square plots of Mahalanobis distances (MDs) as described in Garrett (1988). To carry out GAIT the function is called repeatedly, with the weights from the previous iteration used as the starting point for the next. Either a percentage-based MVT (multivariate trim) or an MCD (minimum covariance determinant) robust start may be used for the first iteration.

Usage

gx.md.gait(xx, wts = NULL, trim = -1, mvtstart = FALSE,
	mcdstart = FALSE, main = deparse(substitute(xx)),
	ifadd = c(0.98, 0.95, 0.9), cexf = 0.6, cex = 0.8, ...)

Arguments

xx
the n by p matrix for which the Mahalanobis distances are required.
wts
the vector of weights for the n individuals, either 1 or 0.
trim
the desired trim: trim < 0 - no trim, the default; trim > 0 & < 1 - fraction of individuals to be trimmed; trim >= 1 - the number of individuals with the highest MDs from the previous iteration to trim.
mvtstart
set mvtstart = TRUE for a percentage based MVT (multivariate trim) start.
mcdstart
set mcdstart = TRUE for a minimum covariance determinant (mcd) robust start.
main
an alternative plot title to the default input data matrix name, see Details below.
ifadd
if probability based fences are to be displayed on the Chi-square plots enter the probabilities here, see Details below. For no fences set ifadd = NULL.
cexf
the scale expansion factor for the Chi-square fence annotation, by default cexf = 0.6.
cex
the scale expansion factor for the symbols and text annotation within the frame of the Chi-square plot, by default cex = 0.8.
...
further arguments to be passed to methods concerning the generated plots. For example, if some colour other than black is required for the plotting characters, specify col = 2 to obtain red (see disp
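
The three ranges of trim described above can be illustrated with a small sketch. The helpers trim_count and apply_trim are hypothetical and are not part of rgr; in particular, rounding the fraction down and taking it relative to the current core subset are assumptions made here for illustration, and the package may behave differently.

```r
## Hypothetical helper (not part of rgr): convert a 'trim' value into a
## number of individuals to drop, following the argument description.
trim_count <- function(trim, n_core) {
  if (trim > 0 && trim < 1) {
    as.integer(floor(trim * n_core))  # assumption: fraction of the core, rounded down
  } else if (trim >= 1) {
    as.integer(trim)                  # explicit number of individuals
  } else {
    0L                                # trim < 0 means no trim; trim == 0 treated the same here
  }
}

## Hypothetical helper: set to 0 the weights of the k core individuals
## that have the largest Mahalanobis distances.
apply_trim <- function(md, wts, trim) {
  core <- which(wts == 1)
  k <- min(trim_count(trim, length(core)), length(core))
  if (k > 0) {
    drop <- core[order(md[core], decreasing = TRUE)[seq_len(k)]]
    wts[drop] <- 0
  }
  wts
}

md  <- c(1.2, 9.5, 0.8, 4.1, 2.3)   # made-up distances
wts <- rep(1, 5)
apply_trim(md, wts, trim = 2)       # drops the individuals with MDs 9.5 and 4.1
apply_trim(md, wts, trim = 0.5)     # floor(0.5 * 5) = 2, so the same two are dropped
apply_trim(md, wts, trim = -1)      # no trim, weights unchanged
```

Passing the returned weights back in as wts on the next call mirrors the iterative use described for gx.md.gait.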

Value

The following are returned as an object to be saved for the next iteration or final use:

  • main: by default (recommended) the input data matrix name.
  • input: the data matrix name, input = deparse(substitute(xx)), retained to be used by post-processing display functions.
  • proc: the procedure followed for this iteration, used for subsequent Chi-square plot x-axis labelling.
  • wts: the vector of weights for the n individuals, either 1 or 0.
  • n: the total number of individuals (observations, cases or samples) in the input data matrix.
  • ptrim: the percentage, as a fraction, of samples called to be trimmed in this iteration, otherwise ptrim = -1.
  • mean: the vector of means for the core data following the current GAIT step.
  • cov: the covariance matrix for the core data following the current GAIT step.
  • sd: the vector of standard deviations for the core data following the current GAIT step.
  • md: the vector of Mahalanobis distances for all the n individuals following the current GAIT step.
  • ppm: the vector of predicted probabilities of membership for all the n individuals following the current GAIT step.
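
As a rough illustration of how the returned quantities relate to one another, the sketch below recomputes core statistics from a 0/1 weight vector using base R only. The data are synthetic, and the definition of ppm as the upper-tail Chi-square probability of the squared distance is an assumption, not something confirmed by this page.

```r
set.seed(1)
x   <- matrix(rnorm(60), ncol = 3)   # synthetic 20 x 3 data, for illustration only
wts <- rep(1, nrow(x))
wts[c(4, 17)] <- 0                   # pretend two individuals were trimmed earlier

## Core subset: individuals with weight 1
core      <- x[wts == 1, , drop = FALSE]
core_mean <- colMeans(core)
core_cov  <- cov(core)
core_sd   <- sqrt(diag(core_cov))

## Squared Mahalanobis distances of ALL n individuals from the core centre,
## as returned by base R's mahalanobis()
md <- mahalanobis(x, center = core_mean, cov = core_cov)

## Assumed definition of the predicted probability of membership
ppm <- 1 - pchisq(md, df = ncol(x))

round(head(cbind(md, ppm)), 3)
```

Individuals that were trimmed still receive a distance and a probability, which is consistent with md and ppm being reported for all n individuals.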

Details

If main is undefined the name of the matrix object passed to the function is used as the plot title. This is the recommended procedure as it helps to track the progression of the GAIT. Alternate plot titles can be defined if the final saved object is passed to gx.md.plot. If no plot title is required set main = "", or if a user defined plot title is required it may be defined, e.g., main = "Plot Title Text".

By default three fences are placed on the Chi-square plots at probabilities of membership of the current core data subset, or total data if appropriate, with ifadd = c(0.98, 0.95, 0.9). Alternate probabilities may be defined as best for the display. If no fences are required set ifadd = NULL.

The x-axis label of the Mahalanobis distance (Chi-square) plot is set to indicate the type of robust start or trim, using the value of proc.
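
To make the idea of a Chi-square plot with fences concrete, here is a minimal base R sketch on synthetic data. It is not the rgr implementation: treating the fences as Chi-square quantiles at the ifadd probabilities, and drawing them as horizontal lines, are assumptions made for illustration.

```r
set.seed(42)
x <- matrix(rnorm(200), ncol = 4)    # synthetic 50 x 4 data, for illustration only
n <- nrow(x)
p <- ncol(x)

## Ordered squared Mahalanobis distances against Chi-square quantiles (df = p)
md    <- sort(mahalanobis(x, center = colMeans(x), cov = cov(x)))
chi_q <- qchisq(ppoints(n), df = p)

## Assumed fence positions: Chi-square quantiles at the requested probabilities
ifadd  <- c(0.98, 0.95, 0.9)
fences <- qchisq(ifadd, df = p)

plot(chi_q, md, cex = 0.8,
     xlab = "Chi-square quantiles", ylab = "Mahalanobis distance")
abline(h = fences, lty = 3)          # one dotted line per fence
text(min(chi_q), fences, labels = ifadd, pos = 4, cex = 0.6)
```

For multivariate normal data the points fall close to a straight line, and individuals plotting well above the upper fences are candidates for trimming.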

References

Garrett, R.G., 1988. IDEAS - An interactive computer graphics tool to assist the exploration geochemist. In Current Research Part F, Geological Survey of Canada Paper 88-1F, pp. 1-13.

Garrett, R.G., 1993. Another cry from the heart. Explore - Assoc. Exploration Geochemists Newsletter, 81:9-14.

Garrett, R.G., 1989. The Chi-square plot - a tool for multivariate outlier recognition. In Proc. 12th International Geochemical Exploration Symposium, Geochemical Exploration 1987 (Ed. S. Jenness). Journal of Geochemical Exploration, 32(1/3):319-341.

See Also

ltdl.fix.df, remove.na, gx.md.plot, gx.md.print

Examples

## Make test data available
data(sind)
attach(sind)
sind.mat <- as.matrix(sind[, -c(1:3)])
## Ensure all data are in the same units (mg/kg)
sind.mat2open <- sind.mat
sind.mat2open[, 2] <- sind.mat2open[, 2] * 10000

## To multivariate trim as in IDEAS, see JGE (1989) 32(1-3):319-341, execute:
gx.md.gait(sind.mat)
sind.gait.1 <- gx.md.gait(sind.mat, trim = 0.24, ifadd = 0.98)
sind.gait.2 <- gx.md.gait(sind.mat, wts = sind.gait.1$wts, mvtstart = TRUE,
	trim = 4, ifadd = 0.98)
sind.gait.3 <- gx.md.gait(sind.mat, wts = sind.gait.2$wts, trim = 1,
	ifadd = 0.9)
sind.gait.4 <- gx.md.gait(sind.mat, wts = sind.gait.3$wts, trim = 2,
	ifadd = 0.9)

## To multivariate trim with a mcd start and an ilr transformation for closure:
gx.md.gait(ilr(sind.mat2open), ifadd = 0.95)
sind.gait.1 <- gx.md.gait(ilr(sind.mat2open), mcdstart = TRUE, ifadd = NULL)
sind.gait.2 <- gx.md.gait(ilr(sind.mat2open), wts = sind.gait.1$wts,
	mvtstart = TRUE, trim = 3, ifadd = 0.9)
sind.gait.3 <- gx.md.gait(ilr(sind.mat2open), wts = sind.gait.2$wts, trim = 1,
	ifadd = 0.9)

## Display saved objects with alternate main titles and list outliers
## IDEAS procedure
gx.md.plot(sind.gait.4,
	main = "Howarth & Sinding-Larsen\nStream Sediments, IDEAS procedure",
	cex.main = 0.8, ifadd = 0.9)
gx.md.print(cbind(sind.gait.4$md, sind.gait.4$ppm, ID, Zn, Cu, Cd, Fe, Mn),
	pcut = 0.2)
## mcd robust start and ilr transformation
gx.md.plot(sind.gait.3,
	main = "Howarth & Sinding-Larsen\nStream Sediments, ilr Transformed Data",
	cex.main = 0.8)
gx.md.print(cbind(sind.gait.3$md, sind.gait.3$ppm, ID, Zn, Cu, Cd, Fe, Mn),
	pcut = 0.2)

## Clean-up and detach test data
rm(sind.mat)
rm(sind.mat2open)
rm(sind.gait.1)
rm(sind.gait.2)
rm(sind.gait.3)
rm(sind.gait.4)
detach(sind)