fsdaR (version 0.4-9)

smult: Computes S estimators in multivariate analysis

Description

Computes S estimators in multivariate analysis

Usage

smult(x, monitoring = FALSE, plot = FALSE, bdp, nsamp,
  conflev = 0.975, nocheck = FALSE, trace = FALSE, ...)

Arguments

x

An n x p data matrix (n observations and p variables). Rows of x represent observations, and columns represent variables.

Missing values (NA's) and infinite values (Inf's) are allowed, since observations (rows) with missing or infinite values will automatically be excluded from the computations.

monitoring

Wheather to perform monitoring of Mahalanobis distances and other specific quantities

plot

Plots the Mahalanobis distances against index number. If plot=FALSE (default) or plot=0 no plot is produced. The confidence level used to draw the confidence bands for the MD is given by the input option conflev. If conflev is not specified a nominal 0.975 confidence interval will be used. If plot=2 a scatter plot matrix with the outliers highlighted is produced. If plot is a list it may contain the following fields:

  • labeladd If labeladd=1, the outliers in the spm are labelled with the unit row index. The default value is labeladd="", i.e. no label is added

  • nameY character vector containing the labels of the variables. As default value, the labels which are added are Y1, ...Yp.

bdp

Measures the fraction of outliers the algorithm should resist. In this case any value greater than 0 but smaller or equal than 0.5 will do fine (default is bdp=0.5). Note that given bdp nominal efficiency is automatically determined.

nsamp

Number of subsamples which will be extracted to find the robust estimator. If nsamp=0 all subsets will be extracted. They will be (n choose p). If the number of all possible subset is <1000 the default is to extract all subsets otherwise just 1000.

conflev

confidence interval for the horizontal bands. It can be a vector of different confidence level values, e.g. c(0.95, 0.99, 0.999). The confidence interval is based on the chi^2 distribution.

nocheck

It controls whether to perform checks on matrix Y. If nocheck=TRUE, no check is performed.

trace

Whether to print intermediate results. Default is trace=FALSE.

...

potential further arguments passed to lower level functions.

Value

Depending on the input parameter monitoring, one of the following objects will be returned:

  1. smult.object

  2. smulteda.object

Details

This function follows the lines of MATLAB/R code developed during the years by many authors. For more details see http://www.econ.kuleuven.be/public/NDBAE06/programs/ and the R package CovSest The core of these routines, e.g. the resampling approach, however, has been completely redesigned, with considerable increase of the computational performance.

References

Maronna, R.A., Martin D. and Yohai V.J. (2006), Robust Statistics, Theory and Methods, Wiley, New York.

Examples

Run this code
# NOT RUN {
 
# }
# NOT RUN {
 data(hbk)
 (out <- smult(hbk[,1:3]))
 class(out)
 summary(out)

 ##  Generate contaminated data (200,3)
 n <- 200
 p <- 3
 set.seed(123456)
 X <- matrix(rnorm(n*p), nrow=n)
 Xcont <- X
 Xcont[1:5, ] <- Xcont[1:5,] + 3

 out1 <- smult(Xcont, trace=TRUE)           # no plots (plot defaults to FALSE)
 names(out1)

 ## plot=TRUE - generates: (1) a plot of Mahalanobis distances against
 ##    index number. The confidence level used to draw the confidence bands for
 ##    the MD is given by the input option conflev. If conflev is
 ##    not specified a nominal 0.975 confidence interval will be used and
 ##    (2) a scatter plot matrix with the outliers highlighted.

 (out1 <- smult(Xcont, trace=TRUE, plot=TRUE))

 ## plots is a list: the spm shows the labels of the outliers.
 (out1 <- smult(Xcont, trace=TRUE, plot=list(labeladd="1")))

 ## plots is a list: the spm uses the variable names provided by 'nameY'.
 (out1 <- smult(Xcont, trace=TRUE, plot=list(nameY=c("A", "B", "C"))))

 ## smult() with monitoring
 (out2 <- smult(Xcont, monitoring=TRUE, trace=TRUE))
 names(out2)

 ##  Forgery Swiss banknotes examples.

 data(swissbanknotes)

 (out1 <- smult(swissbanknotes[101:200,], plot=TRUE))

 (out1 <- smult(swissbanknotes[101:200,], plot=list(labeladd="1")))
 
# }

Run the code above in your browser using DataCamp Workspace