rrcovHD (version 0.2-6)

OutlierSign2: Outlier identification in high dimensions using the SIGN2 algorithm

Description

Fast algorithm for identifying multivariate outliers in high-dimensional and/or large datasets using spatial signs; see Filzmoser, Maronna, and Werner (CSDA, 2008). The distances are computed from principal components.

Usage

OutlierSign2(x, ...)
## Default S3 method:
OutlierSign2(x, grouping, qcrit = 0.975, explvar = 0.99, trace = FALSE, ...)
## S3 method for class 'formula'
OutlierSign2(formula, data, ..., subset, na.action)

Arguments

formula

a formula with no response variable, referring only to numeric variables.

data

an optional data frame (or similar: see model.frame) containing the variables named in formula.

subset

an optional vector used to select rows (observations) of the data matrix x.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The 'factory-fresh' default is na.omit.

...

arguments passed to or from other methods.

x

a matrix or data frame.

grouping

grouping variable: a factor specifying the class for each observation.

explvar

a numeric value between 0 and 1 indicating how much variance should be covered by the robust PCs. Default is 0.99.

qcrit

a numeric value between 0 and 1 indicating the quantile to be used as critical value for outlier detection. Default is 0.975.

trace

whether to print intermediate results. Default is trace = FALSE.
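The arguments above can also be supplied through the default (matrix or data frame) interface instead of a formula. The following is a minimal sketch, assuming the rrcovHD package and its dependency rrcov (which provides the hemophilia data) are installed; the values chosen for qcrit and explvar are arbitrary illustrations, not recommendations.

```r
library(rrcovHD)

# hemophilia is the data set used in the Examples section; gr is its grouping factor
data(hemophilia, package = "rrcov")
x <- hemophilia[, names(hemophilia) != "gr"]   # keep only the numeric variables

# Default method: pass the data and the grouping factor explicitly,
# with a stricter quantile and fewer retained components than the defaults
obj <- OutlierSign2(x, grouping = hemophilia$gr, qcrit = 0.99, explvar = 0.95)

getCutoff(obj)   # cutoff value(s) implied by qcrit
getFlag(obj)     # 0/1 flags for each observation
```

Raising qcrit moves the cutoff further into the tail, so fewer observations are flagged; lowering explvar keeps fewer principal components when the distances are computed.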

Value

An S4 object of class OutlierSign2, which is a subclass of the virtual class Outlier.

Details

Robust principal components are computed from the robustly sphered and normed data; they are needed to determine a distance for each observation. The distances are transformed to approximate a chi-square distribution, and a critical value is then used as the outlier cutoff.
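The steps described above can be illustrated with a short base R sketch. This is a simplified illustration, not the rrcovHD source code: the function name sign_outlier_sketch and its return fields are hypothetical, the use of median and MAD for sphering is an assumption, and the calibration of the distances (rescaling so that their median matches the median of a chi distribution) is one plausible reading of "transformed to approximate a chi-square distribution".

```r
# Simplified sketch of a sign-based outlier rule (illustrative only).
sign_outlier_sketch <- function(x, qcrit = 0.975, explvar = 0.99) {
  x <- as.matrix(x)

  # Robustly standardize each column with median and MAD (assumed sphering step).
  x_sc <- scale(x, center = apply(x, 2, median), scale = apply(x, 2, mad))

  # Spatial signs: project every observation onto the unit sphere.
  x_sign <- x_sc / sqrt(rowSums(x_sc^2))

  # Principal directions of the sign-transformed data.
  evec <- svd(x_sign)$v

  # Robust spread of the standardized data along each direction.
  pc_var <- apply(x_sc %*% evec, 2, mad)^2
  ord <- order(pc_var, decreasing = TRUE)

  # Keep the smallest number of components whose cumulative share exceeds explvar.
  k <- which(cumsum(pc_var[ord]) / sum(pc_var) > explvar)[1]
  scores <- x_sc %*% evec[, ord[seq_len(k)], drop = FALSE]

  # Robustly rescale the scores and take Euclidean norms as distances.
  scores_sc <- scale(scores, center = apply(scores, 2, median),
                     scale = apply(scores, 2, mad))
  d <- sqrt(rowSums(scores_sc^2))

  # Calibrate so that the median distance matches the chi-distribution median.
  d <- d * sqrt(qchisq(0.5, df = k)) / median(d)

  cutoff <- sqrt(qchisq(qcrit, df = k))
  list(distance = d, cutoff = cutoff, is_outlier = d > cutoff, ncomp = k)
}

set.seed(1)
x <- matrix(rnorm(100 * 5), nrow = 100)
x[96:100, ] <- x[96:100, ] + 10   # plant five shifted observations
res <- sign_outlier_sketch(x)
which(res$is_outlier)             # indices of flagged observations
```

Because every step uses medians and MADs rather than means and standard deviations, the planted observations do not inflate the scale estimates and end up far beyond the cutoff.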

References

P. Filzmoser, R. Maronna and M. Werner (2008), Outlier identification in high dimensions, Computational Statistics & Data Analysis, Vol. 52, 1694--1711.

P. Filzmoser & V. Todorov (2012), Robust tools for the imperfect world, To appear.

See Also

OutlierSign2, OutlierSign1, Outlier

Examples

library(rrcovHD)

# hemophilia is provided by the rrcov package; gr is the grouping factor
data(hemophilia, package = "rrcov")
obj <- OutlierSign2(gr ~ ., data = hemophilia)
obj

getDistance(obj)            # returns an array of distances
getClassLabels(obj, 1)      # returns an array of indices for a given class
getCutoff(obj)              # returns an array of cutoff values (one per class, usually equal)
getFlag(obj)                # returns a 0/1 array of flags
plot(obj, class = 2)        # standard plot function
