oce (version 0.2-1)

despike: Remove spikes from a time series

Description

Remove spikes from a time series

Usage

despike(x, reference=c("median","smooth", "trim"), n=4, k=7, min, max,
        replace=c("reference", "NA"))

Arguments

x
a vector of values, interpreted as a time series
reference
indication of the type of reference time series to be used in the detection of spikes; see Details.
n
an indication of the limit to differences between x and the reference time series, used for reference="median" or reference="smooth"; see Details.
k
length of running median used with reference="median", and ignored for other values of reference.
min
minimum non-spike value of x, used with reference="trim".
max
maximum non-spike value of x, used with reference="trim".
replace
an indication of what to do with spike values, with "reference" indicating to replace them with the reference time series, and "NA" indicating to replace them with NA.

Value

  • A new vector in which spikes are replaced as described above.

Details

The method identifies spikes with respect to a "reference" time series, and replaces these spikes with either the reference value or NA, according to the value of replace.

For reference="median", the first step is to linearly interpolate across any gaps, i.e. points at which x is NA. The reference time series is then constructed using runmed as a running median of k elements. Next, the standard deviation of the difference between x and the reference is calculated. Any x values that differ from the reference by more than n times this standard deviation are considered to be spikes. If replace="reference", these x values are replaced with the corresponding values of the reference series, and the resultant time series is returned. If replace="NA", the spikes are replaced with NA in the returned time series.
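
The steps described for reference="median" can be sketched in base R as follows. This is only an illustration under stated assumptions, not the oce source code: the helper name despike_median_sketch is hypothetical, and both the treatment of NA values at the ends of the series (held constant through rule=2 in approx) and the return of the gap-filled series are assumptions.

```r
# Hypothetical helper illustrating the "median" method; NOT the oce implementation.
despike_median_sketch <- function(x, n = 4, k = 7, replace = c("reference", "NA")) {
  replace <- match.arg(replace)
  idx <- seq_along(x)
  ok <- !is.na(x)
  # Step 1: linear interpolation across NA gaps
  # (assumption: rule = 2 holds the end values constant).
  filled <- approx(idx[ok], x[ok], xout = idx, rule = 2)$y
  # Step 2: reference series as a running median of k elements.
  ref <- as.numeric(runmed(filled, k = k))
  # Step 3: standard deviation of the difference from the reference.
  delta <- filled - ref
  s <- sd(delta)
  # Step 4: flag points deviating by more than n standard deviations.
  spike <- abs(delta) > n * s
  out <- filled
  out[spike] <- if (replace == "reference") ref[spike] else NA
  out
}

# Usage: a smooth signal with one artificial spike at index 25.
i <- seq_len(50)
y <- sin(i / 5)
y[25] <- 10
cleaned <- despike_median_sketch(y)
```

Note that a single spike inflates the standard deviation it is tested against, so in a short series (roughly n^2 points or fewer) an isolated spike may escape detection with this criterion.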

For reference="smooth", the processing is the same as for "median", except that smooth is used to calculate the reference time series.

For reference="trim", the reference time series is constructed by linear interpolation across any regions in which x<min or x>max. In this case, the value of n is ignored, and the return value uses either the reference time series or NA for spikes, according to the value of replace.
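
The "trim" method can be sketched in the same spirit. Again, this is a hedged base-R illustration rather than the oce code: the helper name despike_trim_sketch is hypothetical, and leaving existing NA values in x untouched is an assumption, since the text does not say how they are handled in this mode.

```r
# Hypothetical helper illustrating the "trim" method; NOT the oce implementation.
despike_trim_sketch <- function(x, min, max, replace = c("reference", "NA")) {
  replace <- match.arg(replace)
  idx <- seq_along(x)
  # Spikes are values outside the accepted range [min, max].
  spike <- !is.na(x) & (x < min | x > max)
  keep <- !is.na(x) & !spike
  # Reference series: linear interpolation across the out-of-range regions.
  ref <- approx(idx[keep], x[keep], xout = idx, rule = 2)$y
  out <- x  # assumption: NA values already present in x are left as they are
  out[spike] <- if (replace == "reference") ref[spike] else NA
  out
}

# Usage: the values 100 and -50 fall outside [0, 10] and are interpolated over,
# giving 4 and 7 at positions 4 and 7.
despike_trim_sketch(c(1, 2, 3, 100, 5, 6, -50, 8), min = 0, max = 10)
```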

Examples

library(oce)
n <- 50
x <- 1:n
y <- rnorm(n=n)
y[n/2] <- 10                    # 10 standard deviations
plot(x, y, type='l')
lines(x, despike(y), col='red')
lines(x, despike(y, reference="smooth"), col='darkgreen')
lines(x, despike(y, reference="trim", min=-3, max=3), col='blue')
legend("topright", lwd=1, col=c("black", "red", "darkgreen", "blue"),
       legend=c("raw", "median", "smooth", "trim"))