runmed
Running Medians -- Robust Scatter Plot Smoothing
Compute running medians of odd span. This is the most robust scatter plot smoothing possible. For efficiency (and historical reason), you can use one of two different algorithms giving identical results.
Usage
runmed(x, k, endrule = c("median", "keep", "constant"), algorithm = NULL, print.level = 0)
Arguments
- x
- numeric vector, the dependent variable to be smoothed.
- k
- integer width of median window; must be odd. Turlach had a
default of
k <- 1 + 2 * min((n-1)%/% 2, ceiling(0.1*n))
. Usek = 3
for minimal robust smoothing eliminating isolated outliers. - endrule
- character string indicating how the values at the
beginning and the end (of the data) should be treated.
Can be abbreviated. Possible values are:
"keep"
- keeps the first and last $k2$ values
at both ends, where $k2$ is the half-bandwidth
k2 = k %/% 2
, i.e.,y[j] = x[j]
for $j = 1, \dots, k2 and (n-k2+1), \dots, n$; "constant"
- copies
median(y[1:k2])
to the first values and analogously for the last ones making the smoothed ends constant; "median"
- the default, smooths the ends by using
symmetrical medians of subsequently smaller bandwidth, but for
the very first and last value where Tukey's robust end-point
rule is applied, see
smoothEnds
.
- algorithm
- character string (partially matching
"Turlach"
or"Stuetzle"
) or the defaultNULL
, specifying which algorithm should be applied. The default choice depends onn = length(x)
andk
where"Turlach"
will be used for larger problems. - print.level
- integer, indicating verboseness of algorithm; should rarely be changed by average users.
Details
Apart from the end values, the result y = runmed(x, k)
simply has
y[j] = median(x[(j-k2):(j+k2)])
(k = 2*k2+1
), computed very
efficiently.
The two algorithms are internally entirely different:
"Turlach"
- is the Härdle--Steiger
algorithm (see Ref.) as implemented by Berwin Turlach.
A tree algorithm is used, ensuring performance $O(n * log(k))$ where
n = length(x)
which is asymptotically optimal.
"Stuetzle"
Currently long vectors are only supported for algorithm = "Steutzle"
.
Value
x
with an
attr
ibute k
containing (the oddified)
k
.
References
Härdle, W. and Steiger, W. (1995) [Algorithm AS 296] Optimal median smoothing, Applied Statistics 44, 258--264.
Jerome H. Friedman and Werner Stuetzle (1982) Smoothing of Scatterplots; Report, Dep. Statistics, Stanford U., Project Orion 003.
Martin Maechler (2003) Fast Running Medians: Finite Sample and Asymptotic Optimality; working paper available from the author.
See Also
smoothEnds
which implements Tukey's end point rule and
is called by default from runmed(*, endrule = "median")
.
smooth
uses running
medians of 3 for its compound smoothers.
Examples
library(stats)
require(graphics)
utils::example(nhtemp)
myNHT <- as.vector(nhtemp)
myNHT[20] <- 2 * nhtemp[20]
plot(myNHT, type = "b", ylim = c(48, 60), main = "Running Medians Example")
lines(runmed(myNHT, 7), col = "red")
## special: multiple y values for one x
plot(cars, main = "'cars' data and runmed(dist, 3)")
lines(cars, col = "light gray", type = "c")
with(cars, lines(speed, runmed(dist, k = 3), col = 2))
## nice quadratic with a few outliers
y <- ys <- (-20:20)^2
y [c(1,10,21,41)] <- c(150, 30, 400, 450)
all(y == runmed(y, 1)) # 1-neighbourhood <==> interpolation
plot(y) ## lines(y, lwd = .1, col = "light gray")
lines(lowess(seq(y), y, f = 0.3), col = "brown")
lines(runmed(y, 7), lwd = 2, col = "blue")
lines(runmed(y, 11), lwd = 2, col = "red")
## Lowess is not robust
y <- ys ; y[21] <- 6666 ; x <- seq(y)
col <- c("black", "brown","blue")
plot(y, col = col[1])
lines(lowess(x, y, f = 0.3), col = col[2])
lines(runmed(y, 7), lwd = 2, col = col[3])
legend(length(y),max(y), c("data", "lowess(y, f = 0.3)", "runmed(y, 7)"),
xjust = 1, col = col, lty = c(0, 1, 1), pch = c(1,NA,NA))