Compute running medians of odd span. This is the ‘most robust’ scatter plot smoothing possible. For efficiency (and historical reason), you can use one of two different algorithms giving identical results.
runmed(x, k, endrule = c("median", "keep", "constant"),
algorithm = NULL,
na.action = c("+Big_alternate", "-Big_alternate", "na.omit", "fail"),
print.level = 0)
numeric vector, the ‘dependent’ variable to be smoothed.
integer width of median window; must be odd. Turlach had a
default of k <- 1 + 2 * min((n-1)%/% 2, ceiling(0.1*n))
.
Use k = 3
for ‘minimal’ robust smoothing eliminating
isolated outliers.
character string indicating how the values at the beginning and the end (of the data) should be treated. Can be abbreviated. Possible values are:
"keep"
keeps the first and last k2 = k %/% 2
,
i.e., y[j] = x[j]
for
"constant"
copies median(y[1:k2])
to the first
values and analogously for the last ones making the smoothed ends
constant;
"median"
the default, smooths the ends by using
symmetrical medians of subsequently smaller bandwidth, but for
the very first and last value where Tukey's robust end-point
rule is applied, see smoothEnds
.
character string (partially matching "Turlach"
or
"Stuetzle"
) or the default NULL
, specifying which algorithm
should be applied. The default choice depends on n = length(x)
and k
where "Turlach"
will be used for larger problems.
character string determining the behavior in the case of
NA
or NaN
in x
, (partially matching)
one of
"+Big_alternate"
Here, all the NAs in x
are
first replaced by alternating .Machine $ double.xmax
). The replacement
values are “from left” "+"
.
"-Big_alternate"
almost the same as
"+Big_alternate"
, just starting with "-Big..."
).
"na.omit"
the result is the same as
runmed(x[!is.na(x)], k, ..)
.
"fail"
the presence of NAs in x
will raise an error.
integer, indicating verboseness of algorithm; should rarely be changed by average users.
vector of smoothed values of the same length as x
with an
attr
ibute k
containing (the ‘oddified’)
k
.
Apart from the end values, the result y = runmed(x, k)
simply has
y[j] = median(x[(j-k2):(j+k2)])
(k = 2*k2+1
), computed very
efficiently.
The two algorithms are internally entirely different:
"Turlach"
is the H<U+00E4>rdle--Steiger
algorithm (see Ref.) as implemented by Berwin Turlach.
A tree algorithm is used, ensuring performance n = length(x)
which is
asymptotically optimal.
"Stuetzle"
is the (older) Stuetzle--Friedman implementation
which makes use of median updating when one observation
enters and one leaves the smoothing window. While this performs as
Note that, both algorithms (and the smoothEnds()
utility)
now “work” also when x
contains non-finite entries
(Inf
, NaN
, and
NA
):
"Turlach"
.......
"Stuetzle"
currently simply works by applying the
underlying math library (libm
) arithmetic for the non-finite
numbers; this may optionally change in the future.
Currently long vectors are only supported for algorithm = "Stuetzle"
.
H<U+00E4>rdle, W. and Steiger, W. (1995) Algorithm AS 296: Optimal median smoothing, Applied Statistics 44, 258--264. 10.2307/2986349.
Jerome H. Friedman and Werner Stuetzle (1982) Smoothing of Scatterplots; Report, Dep. Statistics, Stanford U., Project Orion 003.
smoothEnds
which implements Tukey's end point rule and
is called by default from runmed(*, endrule = "median")
.
smooth
uses running
medians of 3 for its compound smoothers.
# NOT RUN {
require(graphics)
utils::example(nhtemp)
myNHT <- as.vector(nhtemp)
myNHT[20] <- 2 * nhtemp[20]
plot(myNHT, type = "b", ylim = c(48, 60), main = "Running Medians Example")
lines(runmed(myNHT, 7), col = "red")
## special: multiple y values for one x
plot(cars, main = "'cars' data and runmed(dist, 3)")
lines(cars, col = "light gray", type = "c")
with(cars, lines(speed, runmed(dist, k = 3), col = 2))
# }
# NOT RUN {
<!-- %% FIXME: Show how to do it properly ! tapply(*, unique(.), median) -->
# }
# NOT RUN {
## nice quadratic with a few outliers
y <- ys <- (-20:20)^2
y [c(1,10,21,41)] <- c(150, 30, 400, 450)
all(y == runmed(y, 1)) # 1-neighbourhood <==> interpolation
plot(y) ## lines(y, lwd = .1, col = "light gray")
lines(lowess(seq(y), y, f = 0.3), col = "brown")
lines(runmed(y, 7), lwd = 2, col = "blue")
lines(runmed(y, 11), lwd = 2, col = "red")
## Lowess is not robust
y <- ys ; y[21] <- 6666 ; x <- seq(y)
col <- c("black", "brown","blue")
plot(y, col = col[1])
lines(lowess(x, y, f = 0.3), col = col[2])
# }
# NOT RUN {
<!-- %% predict(loess(y ~ x, span = 0.3, degree=1, family = "symmetric")) -->
# }
# NOT RUN {
<!-- %% gives 6-line warning but does NOT break down -->
# }
# NOT RUN {
lines(runmed(y, 7), lwd = 2, col = col[3])
legend(length(y),max(y), c("data", "lowess(y, f = 0.3)", "runmed(y, 7)"),
xjust = 1, col = col, lty = c(0, 1, 1), pch = c(1,NA,NA))
## An example with initial NA's - used to fail badly (notably for "Turlach"):
x15 <- c(rep(NA, 4), c(9, 9, 4, 22, 6, 1, 7, 5, 2, 8, 3))
rS15 <- cbind(Sk.3 = runmed(x15, k = 3, algorithm="S"),
Sk.7 = runmed(x15, k = 7, algorithm="S"),
Sk.11= runmed(x15, k =11, algorithm="S"))
rT15 <- cbind(Tk.3 = runmed(x15, k = 3, algorithm="T", print.level=1),
Tk.7 = runmed(x15, k = 7, algorithm="T", print.level=1),
Tk.9 = runmed(x15, k = 9, algorithm="T", print.level=1),
Tk.11= runmed(x15, k =11, algorithm="T", print.level=1))
cbind(x15, rS15, rT15) # result for k=11 maybe a bit surprising ..
Tv <- rT15[-(1:3),]
stopifnot(3 <= Tv, Tv <= 9, 5 <= Tv[1:10,])
matplot(y = cbind(x15, rT15), type = "b", ylim = c(1,9), pch=1:5, xlab = NA,
main = "runmed(x15, k, algo = \"Turlach\")")
mtext(paste("x15 <-", deparse(x15)))
points(x15, cex=2)
legend("bottomleft", legend=c("data", paste("k = ", c(3,7,9,11))),
bty="n", col=1:5, lty=1:5, pch=1:5)
# }
Run the code above in your browser using DataLab