robustX (version 1.2-4)

BACON: BACON for Regression or Multivariate Covariance Estimation

Description

BACON, short for ‘Blocked Adaptive Computationally-Efficient Outlier Nominators’, is a somewhat robust set of algorithms, implemented here for regression and for multivariate covariance estimation.

BACON() applies the multivariate (covariance estimation) algorithm, using mvBACON(x) in any case, and when y is not NULL adds a regression iteration phase, using the auxiliary .lmBACON() function.

Usage

BACON(x, y = NULL, intercept = TRUE,
      m = min(collect * p, n * 0.5),
      init.sel = c("Mahalanobis", "dUniMedian", "random", "manual"),
      man.sel, init.fraction = 0, collect = 4,
      alpha = 0.95, maxsteps = 100, verbose = TRUE)

## *Auxiliary* function:
.lmBACON(x, y, intercept = TRUE,
         init.dis, init.fraction = 0, collect = 4,
         alpha = 0.95, maxsteps = 100, verbose = TRUE)

Arguments

x

a matrix of multivariate data, of dimension [n x p], assumed to contain no missing values.

y

the response (a vector of length n) in the case of regression, or NULL for the multivariate case, where just the result of mvBACON() is returned.

intercept

logical indicating if an intercept has to be used for the regression.

m

integer in 1:n specifying the size of the initial basic subset; used only when init.sel is not "manual"; see mvBACON.

init.sel

character string, specifying the initial selection mode; see mvBACON.

man.sel

only when init.sel == "manual", the indices of observations determining the initial basic subset (and m <- length(man.sel)).

init.dis

the distances of the x matrix used for the initial subset determined by mvBACON.

init.fraction

if this parameter is > 0, the tedious steps of selecting the initial subset are skipped and an initial subset of size n * init.fraction is chosen instead, consisting of the observations with the smallest dis.

collect

numeric factor chosen by the user to define the size of the initial subset (p * collect).

alpha

significance level.

maxsteps

the maximal number of iteration steps (to prevent infinite loops).

verbose

logical indicating if messages are printed which trace progress of the algorithm.
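
The two sizing rules in the argument list above (the default for m and the init.fraction shortcut) can be illustrated in base R. This is only a minimal sketch, not the package's code: the values of n, p and init.fraction are made up, dis is a random stand-in for the distances that mvBACON() would supply, and rounding n * init.fraction down with floor() is an assumption, since the text only gives the size as n * init.fraction.

```r
## Hypothetical sizes, for illustration only
n <- 47; p <- 1; collect <- 4

## Default initial subset size, as written in the Usage section
m_default <- min(collect * p, n * 0.5)

## init.fraction > 0: take the observations with the smallest distances
set.seed(42)
dis <- abs(rnorm(n))            # stand-in distances (assumption)
init.fraction <- 0.25
k <- floor(n * init.fraction)   # rounding rule is an assumption
init_subset <- order(dis)[seq_len(k)]
```

With these made-up values, collect * p is smaller than n * 0.5, so the default m equals p * collect, as described under collect.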

Value

BACON(x,y,..) (for regression) returns a list with components

subset

the observation indices (in 1:n) denoting a subset of “good” supposedly outlier-free observations.

tis

the \(t_i(y_m, X_m)\) of eq. (6) in the reference; the clean “basic subset” in the algorithm is defined as the observations \(i\) with the smallest \(|t_i|\), and the \(t_i\) can be regarded as scaled prediction errors.

mv.dis

the (final) discrepancies or distances of mvBACON().

mv.subset

the “good” subset from mvBACON(), used to start the regression iterations.
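
To make the tis component more concrete, the following base R sketch computes scaled prediction errors for a given basic subset. It assumes that eq. (6) of the reference has the usual form, namely the residual from the subset fit divided by \(\hat\sigma_m \sqrt{1 - h_i}\) for observations inside the subset and by \(\hat\sigma_m \sqrt{1 + h_i}\) for those outside, with \(h_i = x_i^T (X_m^T X_m)^{-1} x_i\); check the paper before relying on this. The data, the subset `basic`, and all variable names are hypothetical, and this is not the code of .lmBACON().

```r
set.seed(1)
n <- 30
x <- cbind(1, rnorm(n))              # design matrix with intercept column
y <- 2 + 3 * x[, 2] + rnorm(n)
basic <- 1:10                        # hypothetical basic subset

Xm <- x[basic, , drop = FALSE]
ym <- y[basic]
XtXinv <- solve(crossprod(Xm))       # (Xm' Xm)^{-1}
beta <- XtXinv %*% crossprod(Xm, ym) # least squares fit on the subset
res <- as.vector(y - x %*% beta)     # residuals for all n observations
sigma <- sqrt(sum(res[basic]^2) / (length(basic) - ncol(x)))
h <- rowSums((x %*% XtXinv) * x)     # x_i' (Xm' Xm)^{-1} x_i
in_m <- seq_len(n) %in% basic
tis <- res / (sigma * sqrt(ifelse(in_m, 1 - h, 1 + h)))

## Next basic subset of size r: the r smallest |t_i|
r <- 12
next_basic <- order(abs(tis))[seq_len(r)]
```

Under this assumed form, the values for observations inside the subset coincide with the standardized residuals that rstandard() reports for lm() fitted to the subset alone.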

Details

For the initial selection mode, init.sel, see its description in the mvBACON arguments list.

References

Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators. Computational Statistics and Data Analysis 34, 279-298. doi:10.1016/S0167-9473(99)00101-2

See Also

mvBACON, the multivariate version of the BACON algorithm.

Examples

# NOT RUN {
library(robustX)  # provides BACON()
data(starsCYG, package = "robustbase")
## Plot simple data and fitted lines
plot(starsCYG)
lmST <- lm(log.light ~ log.Te, data = starsCYG)
abline(lmST, col = "gray") # least squares line
str(B.ST <- with(starsCYG,  BACON(x = log.Te, y = log.light)))
## 'subset': A good set of points (to determine regression):
colB <- adjustcolor(2, 1/2)
points(log.light ~ log.Te, data = starsCYG, subset = B.ST$subset,
       pch = 19, cex = 1.5, col = colB)
## A BACON-derived line:
lmB <- lm(log.light ~ log.Te, data = starsCYG, subset = B.ST$subset)
abline(lmB, col = colB, lwd = 2)

require(robustbase)
(RlmST <- lmrob(log.light ~ log.Te, data = starsCYG))
abline(RlmST, col = "blue")
# }
