rgr (version 1.1.15)

gx.robmva: Function to undertake a Robust Exploratory Multivariate Data Analysis


The function carries out a robust Principal Components Analysis (PCA) and estimates the Mahalanobis distances for a non-compositional dataset and places them in an object to be saved and post-processed for display and further manipulation. For closed compositional, geochemical, data use function gx.robmva.closed. Robust procedures are used, ‘MCD’, ‘MVE’ or user supplied weights, for classical procedures see gx.mva. For results display see gx.rqpca.screeplot, gx.rqpca.loadplot, gx.rqpca.plot, gx.rqpca.print, gx.md.plot and gx.md.print. For Kaiser varimax rotation see gx.rotate.


gx.robmva(xx, proc = "mcd", wts = NULL, main = deparse(substitute(xx)))



a n by p data matrix to be processed.


by default proc = "mcd" for the Minimimum Covariance Determinant (MCD) robust procedure. Setting proc = "mve" results in the Minimum Volume Ellipsoid (MVE) procedure being used. If p > 50 the MVE procedure is used. See wts below.


by default wts = NULL and the MCD or MVE estimation procedures will be used. If, however, a vector of n zero or 1 weights are supplied these will be used for robust estimation and the value of proc ignored.


by default the name of the object xx, main = deparse(substitute(xx)), it may be replaced by the user, but this is not recommended, see Details below.


The following are returned as an object to be saved for subsequent display, etc.:


by default (recommended) the input data matrix name.


the data matrix name, input = deparse(substitute(xx)), retained to be used by post-processing display functions.


the robust procedure used, the value of proc will be "mcd", "mve" or "wts".


the total number of individuals (observations, cases or samples) in the input data matrix.


the number of individuals remaining in the ‘core’ data subset following the robust estimation, i.e. the sum of those individuals with wts = 1.


the number of variables on which the multivariate operations were based.


flag for gx.md.plot, set to FALSE.


the row numbers or identifiers and column headings of the input matrix.


the vector of weights for the n individuals arising from the robust estimation of the covariance matrix and means.


the length p vector the weighted means for the variables.


the p by p weighted covariance matrix for the n by p data matrix.


the length p vector of weighted standard deviations for the variables.


the n by p matrix of weighted standard normal deviates.


the p by p matrix of weighted Pearson product moment correlation coefficients.


the vector of p eigenvalues of the scaled Pearson robust correlation matrix for RQ analysis, see Grunsky (2001).


the vector of p robustly estimated eigenvalues each expressed as a percentage of the sum of the eigenvalues.


the n by p matrix of robustly estimated eigenvectors.


the p by p matrix of robust Principal Component (PC) loadings.


the p by p matrix containing the percentages of the variability of each variable (rows) expressed in each robust PC (columns).


then by p matrix of the n individuals scores on the p robust PCs.


a vector of p variances of the columns of rqscore.


the vector of p variances of the columns of rqscore expressed as percentages. This is a check on vector econtrib, the values should be identical for a classical PCA. However, for robust PCAs this is not so as the trimmed individuals from the robust estimation have been re-introduced. As a consequence pvcontrib can be very different from econtrib. The plotting of PCs containing high proportions of the variance in robust PCAs can be useful for identifying outliers


the vector of p cumulative sums of pvcontrib, see above.


the vector of n robust Mahalanobis distances (MDs) for the n by p input matrix.


the vector ofn robust predicted probabilities of population membership, see Garrett (1990).


the vector of n robust empirical Chi-square probabilities for the MDs.


the number of PCs that have been rotated. At this stage of a data analysis nr = NULL in order to control PC plot axis labelling.


If main is undefined the name of the matrix object passed to the function is used to identify the object. This is the recommended procedure as it helps to track the progression of a data analysis. Alternate plot titles are best defined when the saved object is passed to gx.rqpca.screeplot, gx.rqpca.loadplot, gx.rqpca.plot or gx.md.plot for display. If no plot title is required set main = " ", or if a user defined plot title is required it may be defined, e.g., main = "Plot Title Text".

The variances of the robust Principal Component scores are displayed, in a non-robust PCA these decrease with increasing component rank. However, in a robust PCA this may not be the case, and lower-order scores with high variances are often worthy of further inspection.


Garrett, R.G., 1990. A robust multivariate allocation procedure with applications to geochemical data. In Proc. Colloquium on Statistical Applications in the Earth Sciences (Eds F.P. Agterberg & G.F. Bonham-Carter). Geological Survey of Canada Paper 89-9, pp. 309-318.

Garrett, R.G., 1993. Another cry from the heart. Explore - Assoc. Exploration Geochemists Newsletter, 81:9-14.

Grunsky, E.C., 2001. A program for computing RQ-mode principal components analysis for S-Plus and R. Computers & Geosciences, 27(2):229-235.

Reimann, C., Filzmoser, P., Garrett, R. and Dutter, R., 2008. Statistical Data Analysis Explained: Applied Environmental Statistics with R. John Wiley & Sons, Ltd., 362 p.

See Also

ltdl.fix.df, remove.na, na.omit, gx.rqpca.screeplot, gx.rqpca.loadplot, gx.rqpca.plot, gx.rqpca.print, gx.md.plot, gx.md.print, gx.robmva.closed, gx.rotate


Run this code
## Generate a population of synthetic bivariate normal data
grp1 <- mvrnorm(100, mu = c(40, 30), Sigma = matrix(c(6, 3, 3, 2), 2, 2))
grp1 <- cbind(grp1, rep(1, 100))
## Generate a set of six (6) outliers
anom <- matrix(c(43, 34, 50, 37, 47, 30, 27, 29, 35, 33, 32, 25),6, 2)
anom <- cbind(anom, rep(2, 6))
## Bind the test data together and display the test data 
test.robmva.mat <- rbind(grp1, anom)
test.robmva <- as.data.frame(test.robmva.mat)
dimnames(test.robmva)[[2]] <- c("x","y","grp")
xyplot.tags(x, y, dimnames(test.robmva)[[1]], cex = 0.75)

## Generate gx.robmva saved object 
save.rob <- gx.robmva(as.matrix(test.robmva[, c(1:2)]))
## Display saved object with alternate main titles
gx.rqpca.screeplot(save.rob, main = "Bivariate synthetic data")
gx.rqpca.plot(save.rob, cex.lab = 0.8, rowids = TRUE,
main = "Bivariate synthetic data")
gx.md.plot(save.rob, cex.main = 0.9, cex.lab = 0.8, cex.axis = 0.8,
main = "Bivariate synthetic data")
gx.md.print(save.rob, pcut = 0.05)

## Clean-up and detach test data
# }

Run the code above in your browser using DataLab