robscale (version 0.1.1)

robScale: Robust M-Estimate of Scale

Description

Compute the robust M-estimate of scale for very small samples using the \(\rho\) function of Rousseeuw & Verboven (2002).

Usage

robScale(
  x,
  loc = NULL,
  implbound = 0.0001,
  na.rm = FALSE,
  maxit = 80L,
  tol = sqrt(.Machine$double.eps)
)

Value

A single numeric value: the robust M-estimate of scale. Returns NA if x has length zero (after removal of NAs when na.rm = TRUE).

Arguments

x

A numeric vector.

loc

Optional numeric scalar giving a known location. When supplied, the observations are centered at loc and the minimum sample size for iteration is lowered from 4 to 3 (see ‘Details’).

implbound

Implosion bound: the smallest value the MAD is allowed to take before it is considered to have “imploded” (collapsed to zero). Defaults to 1e-4.

na.rm

Logical. If TRUE, NA values are stripped from x before computation. If FALSE (the default), the presence of any NA raises an error.

maxit

Maximum number of multiplicative iterations. Defaults to 80.

tol

Convergence tolerance. Iteration stops when the multiplicative update factor satisfies \(|v - 1| \le \mathrm{tol}\). Defaults to sqrt(.Machine$double.eps).

Details

The scale estimator \(S_n\) solves the M-estimating equation

$$\frac{1}{n}\sum_{i=1}^{n}\rho\!\left(\frac{x_i - T_n}{S_n} \right) = \beta$$

where \(T_n\) is fixed at the sample median, \(\beta = 0.5\), and \(\rho\) is a smooth rho function defined as the square of the logistic psi (Rousseeuw & Verboven, 2002, Sec. 4.2):

$$\rho_{\mathrm{log}}(x) = \psi_{\mathrm{log}}^2\!\left( \frac{x}{c}\right)$$

with the tuning constant \(c = 0.37394112142347236\) chosen so that

$$\int\rho(u)\,d\Phi(u) = 0.5$$

yielding a 50% breakdown point.
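The calibration above can be checked numerically. The base-R sketch below integrates \(\rho_{\mathrm{log}}\) against the standard normal density, assuming \(\psi_{\mathrm{log}}(x) = \tanh(x/2)\) as stated under the iteration scheme. The names c_tune, rho_log, and beta_hat are illustrative and are not part of the package.

```r
# Tuning constant quoted in the Details section
c_tune <- 0.37394112142347236

# rho_log(u) = psi_log(u / c)^2, assuming psi_log(x) = tanh(x / 2)
rho_log <- function(u) tanh(u / (2 * c_tune))^2

# Expected value of rho_log under the standard normal distribution
beta_hat <- integrate(function(u) rho_log(u) * dnorm(u), -Inf, Inf)$value
beta_hat  # should be close to 0.5
```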

Iteration scheme. The equation is solved by multiplicative iteration (Rousseeuw & Verboven, 2002, Eq. 27):

$$S^{(k+1)} = S^{(k)} \cdot \sqrt{2 \cdot \frac{1}{n}\sum \psi_{\mathrm{log}}^2\!\left(\frac{x_i - T}{c \cdot S^{(k)}}\right)}$$

Starting value: \(S^{(0)} = \mathrm{MAD}(x)\). The logistic psi values are computed via the algebraic identity \(\psi_{\mathrm{log}}(x) = \tanh(x/2)\).
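The iteration can be sketched in a few lines of base R. This is an illustrative re-implementation under the formulas above, not the package's actual code: rob_scale_sketch is a hypothetical name, it handles neither the known-location case nor the small-sample fallback, and it simply stops if the MAD is zero.

```r
# Illustrative sketch of the multiplicative iteration (not the package code)
rob_scale_sketch <- function(x, maxit = 80L, tol = sqrt(.Machine$double.eps)) {
  c_tune <- 0.37394112142347236   # tuning constant from Details
  centered <- x - median(x)       # location held fixed at the sample median
  s <- mad(x)                     # starting value S^(0) = MAD(x)
  if (s <= 0) stop("MAD is zero; this sketch does not handle implosion")
  for (k in seq_len(maxit)) {
    # psi_log(r / (c * s)) evaluated as tanh(r / (2 * c * s))
    v <- sqrt(2 * mean(tanh(centered / (2 * c_tune * s))^2))
    s <- s * v                    # multiplicative update
    if (abs(v - 1) <= tol) break  # update factor is close enough to 1
  }
  s
}

x <- c(1, 2, 3, 5, 7, 8)
s_hat <- rob_scale_sketch(x)

# At convergence the M-estimating equation should hold: mean(rho) close to 0.5
beta_check <- mean(tanh((x - median(x)) / (2 * 0.37394112142347236 * s_hat))^2)
beta_check
```

Checking that the mean of \(\rho\) is close to \(\beta = 0.5\) at the returned value is a convenient way to confirm that the loop converged rather than stopping at maxit.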

Decoupled estimation. Scale is estimated with location held fixed at \(\mathrm{med}(x)\), following the decoupled approach of Rousseeuw & Verboven (2002, Sec. 4.2). This avoids the positive-feedback instability of Huber's Proposal 2 in small samples.

Known location. When loc is supplied, the observations are centered as \(x_i - \mu\) and the initial scale is set to \(1.4826 \cdot \mathrm{med}(|x_i - \mu|)\) rather than the MAD. This lowers the minimum sample size from 4 to 3 (Rousseeuw & Verboven, 2002, Sec. 5).
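The known-location starting value can be reproduced in base R. A minimal sketch, assuming the default scaling constant 1.4826 of stats::mad() and a hypothetical known location mu:

```r
x  <- c(1, 2, 3, 5, 7, 8)
mu <- 5                                    # hypothetical known location

s0_manual <- 1.4826 * median(abs(x - mu))  # formula from the text
s0_mad    <- mad(x, center = mu)           # same quantity via stats::mad()
all.equal(s0_manual, s0_mad)
```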

Fallback. When \(n\) is below the minimum for iteration:

  • if \(\mathrm{MAD}(x) \le\) implbound (implosion), the function returns adm(x);

  • otherwise, it returns \(\mathrm{MAD}(x)\).
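The fallback rule can be sketched as follows. Both function names are hypothetical. In particular, adm_sketch is only a stand-in that assumes adm() is the mean absolute deviation from the median scaled by \(\sqrt{\pi/2}\); consult the adm help page for the actual definition.

```r
# Stand-in for adm(): ASSUMED to be the mean absolute deviation from the
# median, scaled by sqrt(pi / 2); see the adm help page for the real definition
adm_sketch <- function(x) sqrt(pi / 2) * mean(abs(x - median(x)))

# Fallback rule for samples too small to iterate (illustrative only)
fallback_scale <- function(x, implbound = 1e-4) {
  s_mad <- mad(x)
  if (s_mad <= implbound) adm_sketch(x) else s_mad
}

s_imploded <- fallback_scale(c(5, 5, 6))  # MAD is 0, so the ADM stand-in is used
s_regular  <- fallback_scale(c(1, 2, 4))  # MAD is positive, so MAD is returned
c(s_imploded, s_regular)
```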

References

Rousseeuw, P. J. and Verboven, S. (2002) Robust estimation in very small samples. Computational Statistics & Data Analysis, 40(4), 741–758. doi:10.1016/S0167-9473(02)00078-6

See Also

adm for the implosion fallback; mad for the starting value and classical alternative; robLoc for the companion location estimator.

Examples

robScale(1:9)

x <- c(1, 2, 3, 5, 7, 8)
robScale(x)
robScale(x, loc = 5)           # known location

# Outlier resistance
x_clean <- c(2.0, 3.1, 2.7, 2.9, 3.3)
x_dirty <- replace(x_clean, 5, 100)
c(robScale(x_clean), robScale(x_dirty))   # barely moves
c(sd(x_clean), sd(x_dirty))               # destroyed

# MAD implosion: identical values cause MAD = 0
robScale(c(5, 5, 5, 5, 6))     # falls back to adm()
