robustX (version 1.2-4)

mvBACON: BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators

Description

This function applies an outlier identification algorithm to the data in the numeric matrix x (of dimension [n x p]), following the BACON outlier procedure described by Billor, Hadi, and Velleman (2000).

Usage

mvBACON(x, collect = 4, m = min(collect * p, n * 0.5), alpha = 0.95,
        init.sel = c("Mahalanobis", "dUniMedian", "random", "manual"),
        man.sel, maxsteps = 100, allowSingular = FALSE, verbose = TRUE)

Arguments

x

numeric matrix (of dimension \(n \times p\)), which must not contain missing values.

collect

a multiplication factor \(c\), used when init.sel is not "manual", to define \(m\), the size of the initial basic subset, as \(m := c \cdot p\); in practice, m <- min(p * collect, n/2).

m

integer in 1:n specifying the size of the initial basic subset; used only when init.sel is not "manual".

alpha

significance level for the \(\chi^2\) cutoff, used to define the next iteration's basic subset.

init.sel

character string, specifying the initial selection mode; implemented modes are:

"Mahalanobis"

based on Mahalanobis distances (default); the version \(V1\) of the reference; affine invariant but not robust.

"dUniMedian"

based on the distances from the univariate medians; the version \(V2\) of the reference; robust but not affine invariant.

"random"

based on a random selection, i.e., reproducible only via set.seed().

"manual"

based on manual selection; in this case, a vector man.sel containing the indices of the selected observations must be specified.

"Mahalanobis" and "dUniMedian" were proposed by Hadi and the other authors in the reference as versions ‘V_1’ and ‘V_2’, as was "manual", while "random" is provided in order to study the behaviour of BACON.

man.sel

only when init.sel == "manual", the indices of observations determining the initial basic subset (and m <- length(man.sel)).

maxsteps

maximal number of iteration steps.

allowSingular

logical indicating whether a solution should be sought even when no matrix of rank \(p\) is found.

verbose

logical indicating whether messages tracing the progress of the algorithm are printed.
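
As a rough illustration of how collect, m and a "dUniMedian"-style start fit together, the following base-R sketch computes the default subset size from the Usage section and then picks the m observations closest to the coordinate-wise medians. It is not the package's internal code: the use of Euclidean distance, the truncation of m to an integer, the choice of the built-in stackloss data, and the names med, d and init are all assumptions made for illustration.

```r
## Illustrative sketch only (base R); NOT the internal code of mvBACON().
x <- data.matrix(stackloss)        # built-in data set: n = 21, p = 4
n <- nrow(x)
p <- ncol(x)

collect <- 4
m <- min(collect * p, n * 0.5)     # default from the Usage section
m <- as.integer(m)                 # assumption: truncate to an integer size

## Distances from the univariate (coordinate-wise) medians
med <- apply(x, 2, median)
d   <- sqrt(rowSums(sweep(x, 2, med)^2))   # assumption: Euclidean norm

## Indices of the m observations closest to the medians
init <- order(d)[seq_len(m)]
init

## Such an index vector could be supplied as a manual start, e.g.
## res <- robustX::mvBACON(x, init.sel = "manual", man.sel = init)
```

Passing init through man.sel is only one way to experiment with hand-built starting subsets; the built-in init.sel = "dUniMedian" mode may differ in detail from this sketch.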

Value

a list with components

subset

logical vector of length n where the i-th entry is true iff the i-th observation is part of the final selection.

dis

numeric vector of length n with the (Mahalanobis) distances.

cov

\(p \times p\) matrix, the corresponding robust estimate of covariance.
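
A short sketch of how the returned components might be inspected, assuming the robustX and robustbase packages are installed; the object name res is hypothetical, and starsCYG is the example data set used in the Examples section.

```r
library(robustX)
data(starsCYG, package = "robustbase")   # example data, n = 47, p = 2

res <- mvBACON(starsCYG, verbose = FALSE)

## Observations NOT in the final subset are the nominated outliers
which(!res$subset)

## Largest distances, one value per observation
head(sort(res$dis, decreasing = TRUE))

## p x p covariance estimate based on the final subset
res$cov
```

Negating subset, as in the Examples section, gives the indices of the nominated outliers; dis can be used to rank observations by how extreme they are.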

References

Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators. Computational Statistics and Data Analysis 34, 279-298. doi:10.1016/S0167-9473(99)00101-2

See Also

covMcd for a high-breakdown (but more computer intensive) method; BACON for a “generalization”, notably to regression.

Examples

# NOT RUN {
 require(robustbase) # for example data and covMcd():
## simple 2D example :
 plot(starsCYG, main = "starsCYG  data  (n=47)")
 B.st <- mvBACON(starsCYG)
 points(starsCYG[ ! B.st$subset,], pch = 4, col = 2, cex = 1.5)
 stopifnot(identical(which(!B.st$subset), c(7L,9L,11L,14L,20L,30L,34L)))
 ## finds the clear outliers (and 3 "borderline")

 ## 'coleman' from pkg 'robustbase'
 coleman.x <- data.matrix(coleman[, 1:6])
 Cc <- covMcd (coleman.x) # truly robust
 summary(Cc) # -> 6 outliers (1,3,10,12,17,18)
 Cb1 <- mvBACON(coleman.x) ##-> subset is all TRUE hmm??
 Cb2 <- mvBACON(coleman.x, init.sel = "dUniMedian")
 stopifnot(all.equal(Cb1, Cb2))
 Cb.r <- lapply(1:20, function(i) { set.seed(i)
                     mvBACON(coleman.x, init.sel="random", verbose=FALSE) })
 nm <- names(Cb.r[[1]]); nm <- nm[nm != "steps"]
 all(eqC <- sapply(Cb.r[-1], function(CC) all.equal(CC[nm], Cb.r[[1]][nm]))) # TRUE
 ## --> BACON always  breaks down, i.e., does not see the outliers here
# }
# NOT RUN {
 ## breaks down even when manually starting with all the non-outliers:
 Cb.man <- mvBACON(coleman.x, init.sel = "manual",
                   man.sel = setdiff(1:20, c(1,3,10,12,17,18)))
 which( ! Cb.man$subset) # the outliers according to mvBACON : _none_
# }
