double.window: Online / Offline signal extraction from time series

Description

Filtering procedures based on moving time windows for the robust extraction of low-frequency components (the signal) in the presence of outliers and level shifts. By default the windows are centered, so the extraction is delayed: the estimate for time t requires observations after t. With online = TRUE, only observations from the past are used. In the first step, a short inner window is used to calculate an initial estimate of the local signal value and a robust estimate of the local variance. Based on these initial estimates, strongly deviating observations (outliers) are trimmed from a possibly longer outer window, and in the second step the final signal estimate is calculated from the remaining observations. Both location-based and regression-based methods are available. The former apply the median and assume a locally constant signal value. The latter use Siegel's (1982) repeated median (RM) and assume an underlying locally linear trend.
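The two-step idea described above can be illustrated with a minimal base-R sketch of the location-based variant, roughly the DWMTM idea from the Methods section. The sketch takes the median and MAD of the inner window and then the mean of the outer-window observations within d scale units of that median. It is not the package's implementation. The function name dw_location_sketch is hypothetical, no finite-sample MAD correction is applied, and the first and last (outer.n - 1) / 2 values are left as NA.

```r
# Hypothetical sketch of a centered double-window, location-based filter.
# Assumptions: plain mad() (constant 1.4826, no finite-sample correction),
# NA at both ends of the series.
dw_location_sketch <- function(y, outer.n, inner.n, d = 2) {
  stopifnot(outer.n %% 2 == 1, inner.n %% 2 == 1, inner.n <= outer.n)
  n <- length(y)
  k_in <- (inner.n - 1) / 2
  k_out <- (outer.n - 1) / 2
  signal <- rep(NA_real_, n)
  if (n < outer.n) return(signal)
  for (t in (k_out + 1):(n - k_out)) {
    inner_win <- y[(t - k_in):(t + k_in)]
    mu0 <- median(inner_win)                 # step 1: initial level estimate
    s0 <- mad(inner_win)                     # step 1: robust scale estimate
    outer_win <- y[(t - k_out):(t + k_out)]
    keep <- abs(outer_win - mu0) <= d * s0   # step 2: trim strongly deviating points
    signal[t] <- mean(outer_win[keep])       # step 2: mean of the remaining points
  }
  signal
}

# Simulated series: level shift from 0 to 5 and an outlier patch of length 2
set.seed(1)
y <- c(rep(0, 40), rep(5, 40)) + rnorm(80, sd = 0.5)
y[20:21] <- 15
s <- dw_location_sketch(y, outer.n = 11, inner.n = 5, d = 2)
```

Because the inner window is centered inside the outer one and has odd width, its median is always one of the outer-window observations, so at least one point survives the trimming step.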

Usage

double.window(y, outer.n, inner.n, d = 2, scale = "MAD",
              mad.corr.file = NULL, methods = "all", plot = FALSE,
              plot.methods = methods, online = FALSE)
double.window.online(...)

Arguments

y

A one-dimensional numeric data vector (the time series).

outer.n

Width of the outer window used for the final estimate. Must be an odd positive integer.

inner.n

Width of the inner window used for the initial estimate. Must be an odd positive integer not larger than outer.n.

d

Factor by which the scale estimate is multiplied to obtain the trimming boundaries. Default: d = 2, i.e. a $2\sigma$ rule.

scale

Scale estimator used to estimate the local variance. Possible values: "MAD" (default), and Rousseeuw and Croux's (1993) "SN" and "QN".

mad.corr.file

File with correction factors that make the MAD scale estimate unbiased in finite samples. Default (NULL): the correction factors for inner.n and outer.n are simulated internally.

methods

A vector of the method(s) used for signal estimation. Possible values are "MED", "RM", "MTM", "TRM", "MRM", "DWRM", "DWMTM", "DWTRM" and "DWMRM". For a thorough description, see the Methods section.

plot

If TRUE, some or all of the signal estimates obtained by the different methods are shown in a time series plot.

plot.methods

The estimates to be shown in the time series plot. Possible values are the same as for 'methods', but only methods that are also selected in 'methods' may be given.

online

If TRUE, a window containing only observations from the past is used (online filtering) instead of a centered window.
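The mad.corr.file argument refers to finite-sample correction factors for the MAD that are otherwise simulated internally. How the package defines and simulates them is not stated here. The sketch below assumes that the factor for window width n is 1 / E[MAD_n] under standard normal data and estimates it by Monte Carlo. The name mad_corr_sketch is hypothetical.

```r
# Hypothetical helper: Monte Carlo estimate of a finite-sample MAD correction
# factor for window width n. Assumption: the factor is 1 / E[MAD_n] under N(0, 1),
# so that factor * mad() is unbiased for sigma = 1.
mad_corr_sketch <- function(n, nsim = 10000) {
  sims <- replicate(nsim, mad(rnorm(n)))   # mad() already includes 1.4826
  1 / mean(sims)
}

set.seed(42)
f_inner <- mad_corr_sketch(15)   # e.g. inner.n = 15
f_outer <- mad_corr_sketch(25)   # e.g. outer.n = 25
c(inner = f_inner, outer = f_outer)
```

Under these assumptions the factors are slightly above 1 and shrink toward 1 as the window grows, so they matter most for short inner windows.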

Value

signal

Matrix with the different signal estimates obtained from the chosen methods in its columns.

slope

Matrix with the different estimates of the signal slope in its columns.

Methods

The following methods for signal estimation are currently implemented:

MED: Ordinary median filter (one step only).
RM: Ordinary repeated median filter (one step only).
MTM: Modified trimmed mean (median in the first step, mean in the second step).
TRM: Trimming with repeated median (repeated median in the first step, trimmed least squares in the second step).
MRM: Modified repeated median (RM in the first and the second step).
DWRM: Double window RM (only the RM slope in the first step, and the median of the trend-corrected observations in the second step, without trimming).
DWMTM: Double window MTM (different window widths in the two steps).
DWTRM: Double window TRM (different window widths in the two steps).
DWMRM: Double window MRM (different window widths in the two steps).
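As a hedged illustration of the one-step RM filter listed above, the following base-R sketch computes Siegel's repeated median slope and level in each centered window of width n. It is not the package's code. Time is centered at the window midpoint, so the level refers to the window center. The function names are hypothetical and the series ends are left as NA.

```r
# Siegel's repeated median fit in one window (hypothetical helper).
rm_window_sketch <- function(y) {
  n <- length(y)
  x <- seq_len(n) - (n + 1) / 2                  # centered time points
  slopes_i <- vapply(seq_len(n), function(i) {
    j <- setdiff(seq_len(n), i)
    median((y[j] - y[i]) / (x[j] - x[i]))        # inner median over j != i
  }, numeric(1))
  beta <- median(slopes_i)                       # outer median: RM slope
  mu <- median(y - beta * x)                     # RM level at the window center
  c(level = mu, slope = beta)
}

# Moving-window RM filter with odd width n (hypothetical helper).
rm_filter_sketch <- function(y, n) {
  k <- (n - 1) / 2
  out <- matrix(NA_real_, length(y), 2,
                dimnames = list(NULL, c("level", "slope")))
  if (length(y) < n) return(out)
  for (t in (k + 1):(length(y) - k)) {
    out[t, ] <- rm_window_sketch(y[(t - k):(t + k)])
  }
  out
}

# Exact linear trend with one large outlier at position 10
y <- 2 + 0.5 * (1:30)
y[10] <- 100
f <- rm_filter_sketch(y, 7)
```

With at most one outlier in each window of width 7, every inner median of a clean point and the outer median are dominated by the exact slope 0.5, so the trend is recovered exactly.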

Concepts

time delay
DWRM
MAD
running median
MTM
Qn
robust
Repeated median (RM)
Sn
level shifts
robust smoothing

Details

The method should be chosen based on an a priori guess about the underlying signal and the data quality. Location-based methods (MED / MTM) are recommended for a locally (piecewise) constant signal, and regression-based methods (RM / DWRM / TRM / MRM) for locally linear, monotone trends. No big differences have been reported between TRM and MRM, so preference might be given to TRM, which is faster and somewhat more efficient. DWRM is the fastest of all regression-based methods and performs better than the ordinary RM at level shifts, but it is also the least robust and least efficient method.

The MAD is the classical, highly robust choice for estimating the variance. The SN is a somewhat more efficient and almost equally robust alternative, while the QN is much more efficient unless the window widths are very small, and performs very well in the presence of level shifts.
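For reference, the three scale estimators can be written down directly from their definitions. The base-R sketch below uses naive O(n^2) versions with the usual consistency constants (1.4826 for the MAD, 1.1926 for Sn, 2.2219 for Qn) and without finite-sample correction factors, so its values need not match what the package computes internally. All function names are hypothetical.

```r
# Naive scale estimators without finite-sample corrections (hypothetical helpers).
mad_sketch <- function(x) 1.4826 * median(abs(x - median(x)))

lomed <- function(v) sort(v)[floor((length(v) + 1) / 2)]   # low median
himed <- function(v) sort(v)[floor(length(v) / 2) + 1]     # high median

# Sn = c * lomed_i himed_j |x_i - x_j|
sn_sketch <- function(x) {
  inner <- vapply(x, function(xi) himed(abs(xi - x)), numeric(1))
  1.1926 * lomed(inner)
}

# Qn = d * k-th order statistic of |x_i - x_j|, i < j, with k = choose(h, 2)
qn_sketch <- function(x) {
  n <- length(x)
  pair_d <- abs(outer(x, x, "-"))[upper.tri(diag(n))]
  h <- floor(n / 2) + 1
  2.2219 * sort(pair_d)[choose(h, 2)]
}

x <- c(1, 2, 3, 4, 100)
c(MAD = mad_sketch(x), Sn = sn_sketch(x), Qn = qn_sketch(x))
# gives MAD = 1.4826, Sn = 2.3852, Qn = 2.2219
```

In this small example the gross outlier at 100 hardly affects any of the three estimates. Sn and Qn differ from the MAD in that they do not measure spread around a central location estimate.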

The inner window width should be at least twice the length of the outlier patches to be ignored for the location-based methods, and at least three times this length for the regression-based methods. Otherwise the methods can be severely influenced by outlier patches. The outer window width can then be chosen rather large to increase the efficiency of the final estimate, provided that it is smaller than the time between subsequent level shifts.
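The rule of thumb for the inner window width, combined with the requirement that inner.n be odd, can be expressed as a small helper. The name min_inner_n and its arguments are hypothetical. The helper only encodes the guidance above and is not part of the package.

```r
# Hypothetical helper: smallest odd inner.n that is at least 2x (location-based)
# or 3x (regression-based) the length of the outlier patches to be ignored.
min_inner_n <- function(patch_len, type = c("location", "regression")) {
  type <- match.arg(type)
  w <- if (type == "location") 2 * patch_len else 3 * patch_len
  if (w %% 2 == 0) w + 1 else w   # round up to the next odd integer
}

min_inner_n(3, "location")     # 7
min_inner_n(3, "regression")   # 9
```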

The factor d, by which the scale estimate is multiplied to fix the trimming boundaries, can be chosen similarly to classical rules for detecting unusual observations in a Gaussian sample. Choosing d = 3 instead of d = 2 increases efficiency but decreases robustness; d = 2.5 can be seen as a compromise.
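The efficiency side of this trade-off can be quantified in an idealized setting: if the data are clean Gaussian and the initial level and scale estimates were exact, a d-sigma rule keeps the share 2 * pnorm(d) - 1 of the observations. This is an assumption-laden illustration, not a property of the package's estimators.

```r
# Share of clean Gaussian observations kept by a d-sigma trimming rule,
# assuming exact initial level and scale estimates (idealized setting).
d <- c(2, 2.5, 3)
kept <- 2 * pnorm(d) - 1
round(kept, 4)
# approximately 0.9545 0.9876 0.9973
```

With d = 2, about 4.6% of good observations are discarded, against about 0.3% with d = 3. The wider boundaries, however, also let moderate outliers into the final estimate.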

References

Bernholt, T., Fried, R., Gather, U., Wegener, I. (2006), Modified Repeated Median Filters, Statistics and Computing, 16, 177-192; Preliminary version available as technical report from http://www.statistik.uni-dortmund.de/fixme

Examples


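data(serie1t)
# Regression-based filters RM and TRM with an inner window of width 15,
# an outer window of width 25, the Qn scale estimate and trimming factor d = 2.5
double.window(serie1t$y,
              outer.n=25, inner.n=15,
              d=2.5, scale="QN",
              methods = c("RM", "TRM"))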
