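covMcd(x, cor = FALSE, raw.only = FALSE,
       alpha = control$alpha, nsamp = control$nsamp,
       nmini = control$nmini, kmini = control$kmini,
       scalefn = control$scalefn, maxcsteps = control$maxcsteps,
       initHsets = NULL, save.hsets = FALSE, names = TRUE,
       seed = control$seed, tolSolve = control$tolSolve, trace = control$trace,
       use.correction = control$use.correction, wgtFUN = control$wgtFUN,
       control = rrcov.control())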
cor = FALSE.
alpha*n, (see
"best", "exact", or "deterministic". Default is nsamp = 500.
For nsamp = "best" exhaustive enumeration is done, as long as
the nu
kmini (by default 5) subsets, of size approximately, but at least nmini. When nmini*kmini < n,
the initia
function to compute a robust scale estimate or character string specifying a rule determining such a function. The default, currently "hrv2012", uses th
1:n).
initHsets.
.Random.seed, see rrcov.control.
solve) of the covariance matrix in mahalanobis.
FALSE; values $\ge 2$ also produce print from the internal (Fortran) code.
TRUE.
function, specifying how the weights for the reweighting step should be computed. Up to April 2013, the only option has been the original proposal in (1999), now specifie
rrcov.control for the defaults. If control is supplied, the parameters
"mcd" which is basically a list with components
cor = TRUE).
length(best) == quan = h.alpha.n(alpha,n,p).
NAs.
quan equals n.obs, the MCD is the classical covariance matrix.
"Deterministic" when nsamp="deterministic".
match.call).
covMcd() is similar to R function cov.mcd() in
h.alpha.n(alpha,n,p)) observations (out of $n$)
whose classical covariance matrix has the lowest possible determinant. The raw MCD estimate of location is then the average of these $h$ points,
whereas the raw MCD estimate of scatter is their covariance matrix,
multiplied by a consistency factor (.MCDcons(p, h/n)) and (if use.correction is true) a finite sample correction factor (.MCDcnp2(p, n, alpha)), to make it consistent at the normal model and unbiased at small samples. Both rescaling factors (consistency and finite sample) are returned in the length-2 vector raw.cnp2.
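As a rough illustration of this definition (and not of how covMcd() actually computes it), the raw estimate can be found by brute force on a very small data set using base R only. The helper raw_mcd_exhaustive() and the toy data are hypothetical, and the consistency and finite sample factors are deliberately left out.

```r
# Brute-force raw MCD on a tiny data set (illustration only; base R).
# raw_mcd_exhaustive() is a hypothetical helper, not part of robustbase.
raw_mcd_exhaustive <- function(x, h) {
  x <- as.matrix(x)
  subsets <- combn(nrow(x), h)                  # every h-subset, one per column
  dets <- apply(subsets, 2,
                function(idx) det(cov(x[idx, , drop = FALSE])))
  best <- subsets[, which.min(dets)]            # subset with the smallest determinant
  list(best   = best,
       center = colMeans(x[best, , drop = FALSE]),  # raw location: mean of the h points
       cov    = cov(x[best, , drop = FALSE]),       # raw scatter, without rescaling factors
       crit   = min(dets))
}

set.seed(1)
x <- rbind(matrix(rnorm(16), ncol = 2),            # 8 "clean" points
           matrix(rnorm(4, mean = 10), ncol = 2))  # 2 shifted points
fit <- raw_mcd_exhaustive(x, h = 6)
fit$best   # indices of the selected observations
```

Enumerating all subsets is only feasible for tiny n, which is why an approximating algorithm is needed in practice.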
The implementation of covMcd uses the Fast MCD algorithm of Rousseeuw and Van Driessen (1999) to approximate the minimum covariance determinant estimator.
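Fast MCD is built around concentration steps ("C-steps"): starting from a subset of size $h$, the mean and covariance of that subset are used to rank all observations by Mahalanobis distance, and the $h$ closest ones form the next subset. The following base R sketch shows only this idea. The helper c_step(), the toy data and the cap of 50 iterations are assumptions made for illustration, and the sketch omits the subsampling and multiple starts of the real algorithm.

```r
# One concentration step ("C-step") in base R; c_step() is a hypothetical helper.
c_step <- function(x, idx) {
  h  <- length(idx)
  mu <- colMeans(x[idx, , drop = FALSE])
  S  <- cov(x[idx, , drop = FALSE])
  d2 <- mahalanobis(x, mu, S)            # squared distances of all n points
  sort(order(d2)[seq_len(h)])            # the h points closest to the current fit
}

set.seed(2)
x   <- rbind(matrix(rnorm(40), ncol = 2),            # 20 "clean" points
             matrix(rnorm(10, mean = 8), ncol = 2))  # 5 shifted points
idx <- sort(sample(nrow(x), 15))         # random starting subset of size h = 15

dets <- det(cov(x[idx, ]))               # determinant after each step
for (step in 1:50) {                     # cap on iterations (assumption)
  new_idx <- c_step(x, idx)
  if (identical(new_idx, idx)) break     # converged: the subset no longer changes
  idx  <- new_idx
  dets <- c(dets, det(cov(x[idx, ])))
}
dets                                     # should never increase from step to step
```

A single start like this can end in a local minimum, which is one reason several initial subsets are tried.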
Based on these raw MCD estimates (unless argument raw.only is true), a reweighting step is performed, i.e., V <- cov.wt(x,w), where w are weights determined by "outlyingness" with respect to the scaled raw MCD. Again, a consistency factor and (if use.correction is true) a finite sample correction factor (.MCDcnp2.rew(p, n, alpha)) are applied. The reweighted covariance is typically considerably more efficient than the raw one, see Pison et al. (2002).
The two rescaling factors for the reweighted estimates are returned in cnp2. Details for the computation of the finite sample correction factors can be found in Pison et al. (2002).
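The reweighting step described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the "raw" estimate is a stand-in computed from rows known to be clean, the 0/1 weights use a chi-squared 0.975 quantile as cutoff (assumed here to mirror the original hard-rejection rule), and no consistency or finite sample factors are applied.

```r
# Reweighting sketch in base R (no consistency or finite-sample factors applied).
set.seed(3)
x <- rbind(matrix(rnorm(60), ncol = 2),             # 30 "clean" points
           matrix(rnorm(10, mean = 8), ncol = 2))   # 5 shifted points
p <- ncol(x)

# Stand-in for a raw robust estimate (assumption): mean and covariance
# of the rows known to be clean in this toy example.
raw_center <- colMeans(x[1:30, ])
raw_cov    <- cov(x[1:30, ])

d2 <- mahalanobis(x, raw_center, raw_cov)           # squared robust distances
w  <- as.numeric(d2 <= qchisq(0.975, df = p))       # 0/1 weights (assumed cutoff)

V <- cov.wt(x, w)     # weighted location and scatter, as in V <- cov.wt(x,w)
V$center
V$cov
```

With 0/1 weights, cov.wt() reduces to the ordinary mean and covariance of the observations that received weight 1.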
Rousseeuw, P. J. and Van Driessen, K. (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212--223.
Pison, G., Van Aelst, S., and Willems, G. (2002) Small Sample Corrections for LTS and MCD. Metrika 55, 111--123.
Hubert, M., Rousseeuw, P. J. and Verdonck, T. (2012) A deterministic algorithm for robust location and scatter. Journal of Computational and Graphical Statistics 21, 618--637.
cov.mcd from package MASS; covOGK as cheaper alternative for larger dimensions.
data(hbk)
hbk.x <- data.matrix(hbk[, 1:3])
set.seed(17)
(cH <- covMcd(hbk.x))
cH0 <- covMcd(hbk.x, nsamp = "deterministic")
with(cH0, stopifnot(quan == 39,
                    iBest == c(1:4,6), # 5 out of 6 gave the same
                    identical(raw.weights, mcd.wt),
                    identical(which(mcd.wt == 0), 1:14),
                    all.equal(crit, -1.045500594135)))
## the following three statements are equivalent
c1 <- covMcd(hbk.x, alpha = 0.75)
c2 <- covMcd(hbk.x, control = rrcov.control(alpha = 0.75))
## direct specification overrides control one:
c3 <- covMcd(hbk.x, alpha = 0.75,
             control = rrcov.control(alpha = 0.95))
c1
## Martin's smooth reweighting:
## List of experimental pre-specified wgtFUN() creators:
## Cutoffs may depend on (n, p, control$beta) :
str(.wgtFUN.covMcd)
cMM <- covMcd(hbk.x, wgtFUN = "sm1.adaptive")
ina <- which(names(cH) == "call")
all.equal(cMM[-ina], cH[-ina]) # *some* differences, not huge (same 'best'):
stopifnot(all.equal(cMM[-ina], cH[-ina], tol = 0.2))