
rgr (version 1.1.0)

gx.mva: Function to undertake an Exploratory Multivariate Data Analysis

Description

The function carries out a Principal Components Analysis (PCA), estimates the Mahalanobis distances for a dataset, and places the results in an object to be saved and post-processed for display and further manipulation. Classical procedures are used; for robust procedures see gx.robmva. To display the results see gx.rqpca.screeplot, gx.rqpca.plot, gx.rotate, gx.md.plot and gx.md.print.
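
As a rough illustration of the two classical computations named above, the following base-R sketch runs a PCA via the eigen-decomposition of the correlation matrix and computes Mahalanobis distances from the sample mean and covariance. It is not the internal code of gx.mva; the simulated matrix x and the names e, pct and md are assumptions made for the example. Note that base R's mahalanobis() returns squared distances.

```r
# Illustrative sketch only: base-R analogues of a classical PCA and
# classical Mahalanobis distances. This is NOT the gx.mva implementation.
set.seed(1)
x <- matrix(rnorm(200), nrow = 50, ncol = 4,
            dimnames = list(NULL, c("A", "B", "C", "D")))  # simulated data

# Classical PCA: eigen-decomposition of the Pearson correlation matrix
e <- eigen(cor(x), symmetric = TRUE)
pct <- 100 * e$values / sum(e$values)  # percentage of variability per PC

# Classical (squared) Mahalanobis distances from the sample mean and covariance
md <- mahalanobis(x, center = colMeans(x), cov = cov(x))
```

A robust analogue would replace colMeans(x) and cov(x) with robust estimates of location and scatter, which is the role the text assigns to gx.robmva.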

Usage

gx.mva(xx, main = deparse(substitute(xx)))

Arguments

xx
an n by p data matrix to be processed.
main
by default the name of the object xx, i.e., main = deparse(substitute(xx)). It may be replaced by the user, but this is not recommended; see Details below.

Value

The following are returned as an object to be saved for subsequent display, etc.:

  • main: by default (recommended) the input data matrix name.
  • input: the data matrix name, input = deparse(substitute(xx)), retained to be used by post-processing display functions.
  • proc: the procedure used, by default proc = "cov" to indicate a classical covariance matrix.
  • n: the total number of individuals (observations, cases or samples) in the input data matrix.
  • nc: the number of individuals remaining in the core data subset after trimming. At this stage of a data analysis nc = n.
  • p: the number of variables on which the multivariate operations were based.
  • ifilr: flag for gx.md.plot, set to FALSE.
  • matnames: the row numbers and column headings of the input matrix.
  • wts: the vector of weights for the n individuals used to compute the covariance matrix and means. At this stage of the data analysis all weights are set to 1.
  • mean: the vector of weighted means for the p variables.
  • cov: the p by p weighted covariance matrix for the n by p data matrix.
  • sd: the vector of weighted standard deviations for the p variables.
  • snd: the n by p matrix of weighted standard normal deviates.
  • r: the p by p matrix of weighted Pearson product moment correlation coefficients.
  • eigenvalues: the vector of p eigenvalues of the scaled Pearson correlation matrix for RQ analysis, see Grunsky (2001).
  • econtrib: the vector of p eigenvalues, each expressed as a percentage of the sum of the eigenvalues.
  • eigenvectors: the n by p matrix of eigenvectors.
  • rload: the p by p matrix of Principal Component (PC) loadings.
  • rcr: the p by p matrix containing the percentages of the variability of each variable (columns) expressed in each PC (rows).
  • rqscore: the n by p matrix of the scores of the n individuals on the p PCs.
  • vcontrib: the vector of p variances of the columns of rqscore.
  • pvcontrib: the vector of p variances of the columns of rqscore expressed as percentages. This is a check on vector econtrib; the values should be identical.
  • cpvcontrib: the vector of p cumulative sums of pvcontrib, see above.
  • md: the vector of n Mahalanobis distances (MDs) for the n by p input matrix.
  • ppm: the vector of n predicted probabilities of population membership, see Garrett (1990).
  • epm: the vector of n empirical Chi-square probabilities for the MDs.
  • nr: the number of PCs that have been rotated. At this stage of a data analysis nr = NULL in order to control PC plot axis labelling.
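
The consistency check between econtrib and pvcontrib described above can be sketched in base R for a plain correlation-matrix PCA. This is an illustration under assumptions: the simulated matrix x is invented, the variable names merely mirror the element names in the list, and the RQ-mode scaling that gx.mva applies (Grunsky, 2001) may differ in detail from this simplified version.

```r
# Illustrative sketch only: in a correlation-matrix PCA the variances of the
# score columns equal the eigenvalues, so both percentage vectors agree.
# Names mirror the list above but this is NOT the gx.mva implementation.
set.seed(42)
x <- matrix(rnorm(300), nrow = 60, ncol = 5)  # simulated n = 60, p = 5 data

z <- scale(x)                          # standard normal deviates (cf. snd)
e <- eigen(cor(x), symmetric = TRUE)   # eigen-decomposition of r
scores <- z %*% e$vectors              # scores of individuals on the PCs

econtrib   <- 100 * e$values / sum(e$values)  # eigenvalues as percentages
vcontrib   <- apply(scores, 2, var)           # variances of the score columns
pvcontrib  <- 100 * vcontrib / sum(vcontrib)  # the same, as percentages
cpvcontrib <- cumsum(pvcontrib)               # cumulative percentages

check <- isTRUE(all.equal(econtrib, pvcontrib))
```

If check were FALSE in a setting like this, it would point to a scaling or centring mismatch between the scores and the decomposed matrix.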

Details

If main is undefined, the name of the matrix object passed to the function is used to identify the object. This is the recommended procedure as it helps to track the progression of a data analysis. Alternate plot titles are best defined when the saved object is passed to gx.rqpca.plot, gx.rqpca.screeplot or gx.md.plot for display. If no plot title is required, set main = "". If a user-defined plot title is required, it may be supplied directly, e.g., main = "Plot Title Text".
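
The default main = deparse(substitute(xx)) captures the expression the caller wrote for xx as a character string. A minimal base-R sketch of that mechanism follows; label.of is a hypothetical helper written only for this illustration and is not part of rgr.

```r
# Hypothetical helper (not part of rgr) showing how a default of
# deparse(substitute(xx)) records the caller's expression as a label.
label.of <- function(xx, main = deparse(substitute(xx))) {
  main
}

my.mat <- matrix(1:6, nrow = 3)  # example object, invented for this sketch

lab.default <- label.of(my.mat)                            # name of the object
lab.expr    <- label.of(log(my.mat))                       # whole expression text
lab.blank   <- label.of(my.mat, main = "")                 # no title
lab.user    <- label.of(my.mat, main = "Plot Title Text")  # user-defined title
```

Because the label is the text of whatever expression was passed, a call such as gx.mva(clr(sind.mat2open)) would be expected to carry the transform in its default label, which is how the default helps to track the progression of an analysis.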

References

Garrett, R.G., 1990. A robust multivariate allocation procedure with applications to geochemical data. In Proc. Colloquium on Statistical Applications in the Earth Sciences (Eds F.P. Agterberg & G.F. Bonham-Carter). Geological Survey of Canada Paper 89-9, pp. 309-318.

Garrett, R.G., 1993. Another cry from the heart. Explore - Assoc. Exploration Geochemists Newsletter, 81:9-14.

Grunsky, E.C., 2001. A program for computing RQ-mode principal components analysis for S-Plus and R. Computers & Geosciences, 27(2):229-235.

Reimann, C., Filzmoser, P., Garrett, R. and Dutter, R., 2008. Statistical Data Analysis Explained: Applied Environmental Statistics with R. John Wiley & Sons, Ltd., 362 p.

See Also

ltdl.fix.df, remove.na, na.omit, gx.rqpca.screeplot, gx.rqpca.plot, gx.rotate, gx.md.plot, gx.md.print, gx.robmva

Examples

## Make test data available
data(sind)
attach(sind)
sind.mat <- as.matrix(sind[, -c(1:3)])
## Ensure all data are in the same units (mg/kg)
sind.mat2open <- sind.mat
sind.mat2open[, 2] <- sind.mat2open[, 2] * 10000

## Generate gx.mva object after a clr transform for a PCA
sind.save.clr <- gx.mva(clr(sind.mat2open))
gx.rqpca.screeplot(sind.save.clr)
gx.rqpca.plot(sind.save.clr)
## Display saved object with alternate main titles
gx.rqpca.screeplot(sind.save.clr,
    main = "Howarth & Sinding-Larsen\nStream Sediments, clr Transformed Data",
    cex.main = 0.8)
gx.rqpca.plot(sind.save.clr,
    main = "Howarth & Sinding-Larsen\nStream Sediments, clr Transformed Data",
    cex.main = 0.8)

## Generate gx.mva object after an ilr transform for Mahalanobis
## distance estimation
sind.save.ilr <- gx.mva(ilr(sind.mat2open))
gx.md.plot(sind.save.ilr)
## Display saved object with alternate main titles
gx.md.plot(sind.save.ilr,
    main = "Howarth & Sinding-Larsen\nStream Sediments, ilr Transformed Data",
    cex.main = 0.8)

## Clean-up and detach test data
rm(sind.mat)
rm(sind.mat2open)
rm(sind.save.clr)
rm(sind.save.ilr)
detach(sind)
