robLoc: Robust M-Estimate of Location

Description

Compute the robust M-estimate of location for very small samples using the logistic $\psi$ function of Rousseeuw & Verboven (2002).

Usage

robLoc(
  x,
  scale = NULL,
  na.rm = FALSE,
  maxit = 80L,
  tol = sqrt(.Machine$double.eps)
)

Value

A single numeric value: the robust M-estimate of location. Returns NA if x has length zero (after removal of NAs when na.rm = TRUE).

Arguments

x: A numeric vector.
scale: Optional numeric scalar giving a known scale. When supplied, the MAD is replaced by this value and the minimum sample size for iteration is lowered from 4 to 3 (see ‘Details’).
na.rm: Logical. If TRUE, NA values are stripped from x before computation. If FALSE (the default), the presence of any NA raises an error.
maxit: Maximum number of Newton-Raphson iterations. Defaults to 80.
tol: Convergence tolerance. Iteration stops when the absolute Newton step falls below tol. Defaults to sqrt(.Machine$double.eps).

Details

The location estimator $T_n$ is the solution to the M-estimating equation

$$\frac{1}{n}\sum_{i=1}^{n}\psi_{\mathrm{log}} \!\left(\frac{x_i - T_n}{S_n}\right) = 0$$

where $S_n$ is a fixed auxiliary scale estimate and $\psi_{\mathrm{log}}$ is the logistic psi function (Rousseeuw & Verboven 2002, Eq. 23):

$$\psi_{\mathrm{log}}(x) = \frac{e^x - 1}{e^x + 1} = \tanh(x/2)$$

This function is bounded in $(-1, 1)$, smooth ($C^\infty$), and strictly monotone. Boundedness provides robustness against outliers; smoothness avoids the corner artifacts of Huber's $\psi$ at small $n$.
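The properties above can be checked numerically with a short sketch. The helper name psi_log is an assumption made for illustration and is not necessarily how the package defines it internally; the sketch uses the tanh form because the exponential form overflows for large arguments.

```r
# Hypothetical helper (name assumed for illustration): logistic psi in its tanh form
psi_log <- function(u) tanh(u / 2)

u <- seq(-10, 10, by = 0.5)
p <- psi_log(u)

# The tanh form agrees with the exponential form on a moderate grid
all.equal(p, (exp(u) - 1) / (exp(u) + 1))   # TRUE

# Bounded in (-1, 1) and strictly increasing on the grid
all(abs(p) < 1)                             # TRUE
all(diff(p) > 0)                            # TRUE

# For very large arguments the exponential form breaks down, tanh does not
(exp(1000) - 1) / (exp(1000) + 1)           # NaN (Inf / Inf)
psi_log(1000)                               # 1
```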

Iteration scheme. The estimating equation is solved by Newton-Raphson iteration. The derivative of the logistic psi satisfies $\psi'(x) = \tfrac{1}{2}\left(1 - \psi^2(x)\right)$, so the Newton step requires no transcendental function evaluations beyond those already computed for the numerator. The starting value is $T^{(0)} = \mathrm{med}(x)$, and the auxiliary scale is $S = \mathrm{MAD}(x)$ unless scale is supplied.
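A minimal base-R sketch of such an iteration follows. It is not the package's implementation: the names psi_log and newton_loc are hypothetical, the small-sample fallback and a zero MAD are not handled, and the scale is assumed to be stats::mad() with its default consistency constant. The update $T \leftarrow T + S \sum_i \psi(u_i) / \sum_i \psi'(u_i)$, with $u_i = (x_i - T)/S$, is Newton's method applied to the estimating equation, using $\frac{d}{du}\tanh(u/2) = \tfrac{1}{2}(1 - \psi^2(u))$.

```r
# Hypothetical helpers for illustration only; not the package's robLoc()
psi_log <- function(u) tanh(u / 2)

newton_loc <- function(x, S = stats::mad(x), maxit = 80L,
                       tol = sqrt(.Machine$double.eps)) {
  # Assumption: S > 0 and length(x) is large enough to iterate
  T_cur <- stats::median(x)                 # starting value T^(0)
  for (i in seq_len(maxit)) {
    p <- psi_log((x - T_cur) / S)           # psi at the standardized residuals
    step <- S * sum(p) / sum((1 - p^2) / 2) # Newton step; psi' reuses p
    T_cur <- T_cur + step
    if (abs(step) < tol) break              # stop once the step is negligible
  }
  T_cur
}

x <- c(1, 2, 3, 5, 7, 8)
est <- newton_loc(x)

# At the solution the mean of psi over the standardized residuals is ~ 0
resid_eq <- mean(psi_log((x - est) / stats::mad(x)))
```

Because the scale S stays fixed, each pass only recomputes the tanh values; the same vector p feeds both the numerator and the derivative in the denominator.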

Decoupled estimation. Location and scale are estimated separately: robLoc holds the auxiliary scale fixed at $\mathrm{MAD}(x)$ (or at the supplied scale) throughout iteration, following the decoupled approach of Rousseeuw & Verboven (2002, Sec. 4.1). This avoids the positive-feedback instability of simultaneous location-scale iteration (Huber's Proposal 2) in small samples.

Fallback. When the sample is too small for reliable iteration, the function returns the median directly. This happens for:

$n < 4$ when scale is unknown (the MAD is unreliable at $n = 3$);
$n < 3$ when scale is known.
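The two thresholds can be written as a small decision rule. The helper below is hypothetical and only mirrors the documented behaviour; it is not part of the package's interface.

```r
# Hypothetical helper: TRUE when the documented rule falls back to the median
uses_median_fallback <- function(n, scale_known = FALSE) {
  min_n <- if (scale_known) 3L else 4L   # minimum sample size for iteration
  n < min_n
}

uses_median_fallback(3)                      # TRUE: scale unknown, n < 4
uses_median_fallback(3, scale_known = TRUE)  # FALSE: scale known, n >= 3
uses_median_fallback(2, scale_known = TRUE)  # TRUE: scale known, n < 3
```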

References

Rousseeuw, P. J. and Verboven, S. (2002) Robust estimation in very small samples. Computational Statistics & Data Analysis, 40(4), 741-758. doi:10.1016/S0167-9473(02)00078-6

Examples


robLoc(1:9)

x <- c(1, 2, 3, 5, 7, 8)
robLoc(x)

# Known scale lowers the minimum sample size to 3
robLoc(c(1, 2, 3), scale = 1.5)

# Outlier resistance
x_clean <- c(2.0, 3.1, 2.7, 2.9, 3.3)
x_dirty <- replace(x_clean, 5, 100)
c(robLoc(x_clean), robLoc(x_dirty))   # barely moves
c(mean(x_clean), mean(x_dirty))       # destroyed
