robScale: Robust M-Estimate of Scale

Description

Compute the robust M-estimate of scale for very small samples using the $\rho$ function of Rousseeuw & Verboven (2002).

Usage

robScale(
  x,
  loc = NULL,
  implbound = 0.0001,
  na.rm = FALSE,
  maxit = 80L,
  tol = sqrt(.Machine$double.eps)
)

Value

A single numeric value: the robust M-estimate of scale. Returns NA if x has length zero (after removal of NAs when na.rm = TRUE).

Arguments

x: A numeric vector.
loc: Optional numeric scalar giving a known location. When supplied, the observations are centered at loc and the minimum sample size for iteration is lowered from 4 to 3 (see ‘Details’).
implbound: Implosion bound: the smallest value the MAD is allowed to take before it is considered to have “imploded” (collapsed to zero). Defaults to 1e-4.
na.rm: Logical. If TRUE, NA values are stripped from x before computation. If FALSE (the default), the presence of any NA raises an error.
maxit: Maximum number of multiplicative iterations. Defaults to 80.
tol: Convergence tolerance. Iteration stops when the multiplicative update factor $v$ (the square-root term of the iteration scheme in ‘Details’) satisfies $|v - 1| \le \mathrm{tol}$. Defaults to sqrt(.Machine$double.eps).

Details

The scale estimator $S_n$ solves the M-estimating equation

$$\frac{1}{n}\sum_{i=1}^{n}\rho\!\left(\frac{x_i - T_n}{S_n} \right) = \beta$$

where $T_n$ is fixed at the sample median, $\beta = 0.5$, and $\rho$ is a smooth rho function defined as the square of the logistic psi (Rousseeuw & Verboven, 2002, Sec. 4.2):

$$\rho_{\mathrm{log}}(x) = \psi_{\mathrm{log}}^2\!\left( \frac{x}{c}\right)$$

with the tuning constant $c = 0.37394112142347236$ chosen so that

$$\int\rho(u)\,d\Phi(u) = 0.5$$

yielding a 50% breakdown point.
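The consistency condition can be checked numerically. The following is a minimal sketch, not part of the package: the names c_tune, rho_log and consistency are hypothetical, and the logistic psi is evaluated through the identity $\psi_{\mathrm{log}}(x) = \tanh(x/2)$.

```r
# Tuning constant quoted in the text above
c_tune <- 0.37394112142347236

# rho_log(u) = psi_log(u / c)^2, with psi_log(x) = tanh(x / 2)
rho_log <- function(u) tanh(u / (2 * c_tune))^2

# Expected value of rho_log under the standard normal distribution
consistency <- integrate(function(u) rho_log(u) * dnorm(u), -Inf, Inf)$value
print(consistency)   # should be close to 0.5 if c is tuned as described
```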

Iteration scheme. The equation is solved by multiplicative iteration (Rousseeuw & Verboven, 2002, Eq. 27):

$$S^{(k+1)} = S^{(k)} \cdot \sqrt{2 \cdot \frac{1}{n}\sum \psi_{\mathrm{log}}^2\!\left(\frac{x_i - T}{c \cdot S^{(k)}}\right)}$$

Starting value: $S^{(0)} = \mathrm{MAD}(x)$. The logistic psi values are computed via the algebraic identity $\psi_{\mathrm{log}}(x) = \tanh(x/2)$.
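The iteration can be sketched in a few lines of base R. This is an illustration under stated assumptions rather than the package's implementation: rob_scale_sketch is a hypothetical name, the location is unknown (fixed at the median), and the small-sample and implosion fallbacks are omitted.

```r
# Hypothetical helper illustrating the multiplicative iteration;
# NOT the package's robScale() implementation.
rob_scale_sketch <- function(x, maxit = 80L,
                             tol = sqrt(.Machine$double.eps)) {
  c_tune   <- 0.37394112142347236   # tuning constant from the text
  centered <- x - median(x)         # location held fixed at the median
  s        <- mad(x)                # starting value S^(0) = MAD(x)
  for (k in seq_len(maxit)) {
    # psi_log(u / c) evaluated as tanh(u / (2 * c))
    psi <- tanh(centered / (2 * c_tune * s))
    v   <- sqrt(2 * mean(psi^2))    # multiplicative update factor
    s   <- s * v
    if (abs(v - 1) <= tol) break    # stopping rule |v - 1| <= tol
  }
  s
}

x     <- c(1, 2, 3, 5, 7, 8)
s_hat <- rob_scale_sketch(x)

# Left-hand side of the M-estimating equation at the returned scale;
# at convergence it should be close to beta = 0.5.
lhs <- mean(tanh((x - median(x)) / (2 * 0.37394112142347236 * s_hat))^2)
print(c(scale = s_hat, lhs = lhs))
```

At a fixed point the update factor equals 1, which is the same as the mean of $\rho_{\mathrm{log}}$ equalling $\beta = 0.5$, so the stopping rule on $|v - 1|$ directly measures how well the estimating equation is satisfied.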

Decoupled estimation. Scale is estimated with location held fixed at $\mathrm{med}(x)$, following the decoupled approach of Rousseeuw & Verboven (2002, Sec. 4.2). This avoids the positive-feedback instability of Huber's Proposal 2 in small samples.

Known location. When loc is supplied, the observations are centered as $x_i - \mu$, where $\mu$ denotes loc, and the initial scale is set to $1.4826 \cdot \mathrm{med}(|x_i - \mu|)$ rather than the MAD about the sample median. This lowers the minimum sample size from 4 to 3 (Rousseeuw & Verboven, 2002, Sec. 5).
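The known-location starting value can be reproduced with base R. This is a small sketch of the formula only, with hypothetical variable names; it says nothing about how robScale computes it internally. It relies on stats::mad() accepting a center argument and using 1.4826 as its default constant.

```r
x   <- c(1, 2, 3, 5, 7, 8)
loc <- 5                                   # assumed known location

# Starting value written out as in the text
s0_formula <- 1.4826 * median(abs(x - loc))

# The same quantity via stats::mad() centered at loc
s0_mad <- mad(x, center = loc)

print(c(formula = s0_formula, mad = s0_mad))
```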

Fallback. When $n$ is below the minimum for iteration:

if $\mathrm{MAD}(x) \le$ implbound (implosion), the function returns adm(x);
otherwise, it returns $\mathrm{MAD}(x)$.

References

Rousseeuw, P. J. and Verboven, S. (2002) Robust estimation in very small samples. Computational Statistics & Data Analysis, 40(4), 741-758. doi:10.1016/S0167-9473(02)00078-6

Examples

robScale(1:9)

x <- c(1, 2, 3, 5, 7, 8)
robScale(x)
robScale(x, loc = 5)           # known location

# Outlier resistance
x_clean <- c(2.0, 3.1, 2.7, 2.9, 3.3)
x_dirty <- replace(x_clean, 5, 100)
c(robScale(x_clean), robScale(x_dirty))   # barely moves
c(sd(x_clean), sd(x_dirty))               # destroyed

# MAD implosion: identical values cause MAD = 0
robScale(c(5, 5, 5, 5, 6))     # falls back to adm()