robustX (version 1.2-2)

BACON: BACON for Regression or Multivariate Covariance Estimation

Description

BACON, short for ‘Blocked Adaptive Computationally-Efficient Outlier Nominators’, is a somewhat robust algorithm (set), with an implementation for regression or multivariate covariance estimation.

BACON() applies the multivariate (covariance estimation) algorithm, using mvBACON(x) in any case, and when y is not NULL adds a regression iteration phase, using the auxiliary .lmBACON() function.

Usage

BACON(x, y = NULL, intercept = TRUE,
      m = min(collect * p, n * 0.5),
      init.sel = c("Mahalanobis", "dUniMedian", "random", "manual"),
      man.sel, init.fraction = 0, collect = 4,
      alpha = 0.95, maxsteps = 100, verbose = TRUE)

## *Auxiliary* function: .lmBACON(x, y, intercept = TRUE, init.dis, init.fraction = 0, collect = 4, alpha = 0.95, maxsteps = 100, verbose = TRUE)

Arguments

x

a multivariate matrix of dimension [n x p] considered as containing no missing values.

y

the response (n vector) in the case of regression, or NULL for the multivariate case.

intercept

logical indicating if an intercept has to be used for the regression.

m

integer in 1:n specifying the size of the initial basic subset; used only when init.sel is not "manual"; see mvBACON.

init.sel

character string, specifying the initial selection mode; see mvBACON.

man.sel

only when init.sel == "manual", the indices of observations determining the initial basic subset (and m <- length(man.sel)).

init.dis

the distances of the x matrix used for the initial subset determined by mvBACON.

init.fraction

if this parameter is > 0 then the tedious steps of selecting the initial subset are skipped and an initial subset of size n * init.fraction is chosen (with smallest dis)

collect

numeric factor chosen by the user to define the size of the initial subset (p * collect)

alpha

significance level.

maxsteps

the maximal number of iteration steps (to prevent infinite loops)

verbose

logical indicating if messages are printed which trace progress of the algorithm.

Value

basically a list with components

subset

the observation indices (in 1:n) denoting the subset of ``good'' observations.

tis

............

Details

init.sel: the initial selection mode; implemented modes are: "Mah" -> based on Mahalanobis distance (default) "dis" -> based on the distances from the medians "ran" -> based on a random selection "man" -> based on manual selection in this case the vector 'man.sel' which contains the indices of the selected observations must be given. "Mah" and "dis" are proposed by Hadi while "ran" and "man" were implemented in order to study the behaviour of BACON.

References

Billor, N., Hadi, A. S., and Velleman , P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators; Computational Statistics and Data Analysis 34, 279--298.

See Also

mvBACON, the multivariate version of the BACON algorithm.

Examples

Run this code
# NOT RUN {
data(starsCYG, package = "robustbase")
## Plot simple data and fitted lines
plot(starsCYG)
 lmST <-    lm(log.light ~ log.Te, data = starsCYG)
(B.ST <- with(starsCYG,  BACON(x = log.Te, y = log.light)))
require(robustbase)
(RlmST <- lmrob(log.light ~ log.Te, data = starsCYG))
abline(lmST, col = "red")
abline(RlmST, col = "blue")

# }

Run the code above in your browser using DataLab