despike: Remove spikes from a time series

Description

Remove spikes from a time series, replacing them either with values from a reference time series or with NA.

Usage

despike(x, reference=c("median","smooth", "trim"), n=4, k=7, min=NA, max=NA,
        replace=c("reference", "NA"), skip)

Arguments

x

a vector of (time-series) values, a list of vectors, a data frame, or an object that inherits from class oce.

reference

indication of the type of reference time series to be used in the detection of spikes; see Details.

n

the limit on differences between x and the reference time series, expressed as a multiple of the standard deviation of those differences; used for reference="median" or reference="smooth", and ignored for reference="trim"; see Details.

k

length of running median used with reference="median", and ignored for other values of reference.

min

minimum non-spike value of x, used with reference="trim".

max

maximum non-spike value of x, used with reference="trim".

replace

an indication of what to do with spike values, with "reference" indicating to replace them with the reference time series, and "NA" indicating to replace them with NA.

skip

optional vector naming columns to be skipped. This is ignored if x is a simple vector. Any items named in skip will be passed through to the return value without modification. In some cases, despike

Value

A new vector in which spikes are replaced as described in Details.

Details

The method identifies spikes with respect to a "reference" time series, and replaces these spikes with the reference value, or with NA, according to the value of replace.

For reference="median", the first step is to linearly interpolate across any gaps, i.e. regions in which x is NA. Then the reference time series is constructed using runmed as a running median of k elements. Next, the standard deviation of the difference between x and the reference is calculated. Any x values that differ from the reference by more than n times this standard deviation are considered to be spikes. If replace="reference", these x values are replaced with the corresponding values of the reference series, and the resultant time series is returned. If replace="NA", the spikes are replaced with NA in the returned time series.
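The steps for reference="median" can be sketched in base R as follows. This is a minimal illustration of the description above, not the oce source code: the function name despike_median_sketch is hypothetical, and the handling of end points in approx (rule=2) and the choice to leave original NA gaps untouched in the result are assumptions.

```r
# Hypothetical helper illustrating the reference="median" steps; not part of oce.
despike_median_sketch <- function(x, n = 4, k = 7, replace = c("reference", "NA")) {
    replace <- match.arg(replace)
    idx <- seq_along(x)
    ok <- !is.na(x)
    # Step 1: linearly interpolate across NA gaps
    # (assumption: rule=2 extends the end values if the series starts or ends with NA).
    filled <- approx(idx[ok], x[ok], xout = idx, rule = 2)$y
    # Step 2: reference series is a running median of k elements (k must be odd).
    ref <- as.numeric(runmed(filled, k = k))
    # Step 3: standard deviation of the difference between data and reference.
    s <- sd(filled - ref)
    # Step 4: points more than n standard deviations from the reference are spikes.
    spike <- abs(filled - ref) > n * s
    # Step 5: replace spikes with the reference value or with NA.
    out <- x  # assumption: original NA gaps are left as NA in this sketch
    out[spike] <- if (replace == "reference") ref[spike] else NA
    out
}

# A smooth signal with one artificial spike at index 25.
y <- sin(seq_len(50) / 5)
y[25] <- 10
r_ref <- despike_median_sketch(y)                 # spike replaced by running median
r_na  <- despike_median_sketch(y, replace = "NA") # spike replaced by NA
```

Because the standard deviation in step 3 is computed with the spikes still present, a very large spike inflates the threshold n times that standard deviation; this is one reason the choice of n matters.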

For reference="smooth", the processing is the same as for "median", except that smooth is used to calculate the reference time series.

For reference="trim", the reference time series is constructed by linear interpolation across any regions in which x<min or x>max. In this case, the value of n is ignored, and the return value either uses the reference time series for spikes, or NA, according to the value of replace.
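The reference="trim" case can be sketched in the same way. Again, this is only an illustration under stated assumptions, not the oce implementation: despike_trim_sketch is a hypothetical name, and treating an NA limit as "no limit on that side" is an assumption made here for convenience.

```r
# Hypothetical helper illustrating the reference="trim" steps; not part of oce.
despike_trim_sketch <- function(x, min = NA, max = NA, replace = c("reference", "NA")) {
    replace <- match.arg(replace)
    idx <- seq_along(x)
    # Flag values outside [min, max] as spikes
    # (assumption: an NA limit disables the test on that side).
    spike <- rep(FALSE, length(x))
    if (!is.na(min)) spike <- spike | (!is.na(x) & x < min)
    if (!is.na(max)) spike <- spike | (!is.na(x) & x > max)
    # Reference series: linear interpolation across the flagged regions.
    good <- !spike & !is.na(x)
    ref <- approx(idx[good], x[good], xout = idx, rule = 2)$y
    # Replace spikes with the reference value or with NA; n plays no role here.
    out <- x
    out[spike] <- if (replace == "reference") ref[spike] else NA
    out
}

z <- c(0, 1, 2, 50, 4, 5)
t_ref <- despike_trim_sketch(z, min = -3, max = 10)                 # 50 becomes 3
t_na  <- despike_trim_sketch(z, min = -3, max = 10, replace = "NA") # 50 becomes NA
```

Unlike the "median" and "smooth" cases, the limits here are absolute values in the units of x, so they need to be chosen with knowledge of the expected data range.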

Examples

library(oce)                    # provides despike() and the ctd dataset
n <- 50
x <- 1:n
y <- rnorm(n=n)
y[n/2] <- 10                    # 10 standard deviations
plot(x, y, type='l')
lines(x, despike(y), col='red')
lines(x, despike(y, reference="smooth"), col='darkgreen')
lines(x, despike(y, reference="trim", min=-3, max=3), col='blue')
legend("topright", lwd=1, col=c("black", "red", "darkgreen", "blue"),
       legend=c("raw", "median", "smooth", "trim"))

# add a spike to a CTD object
data(ctd)
plot(ctd)
T <- ctd[["temperature"]]
T[10] <- T[10] + 10
ctd[["temperature"]] <- T
CTD <- despike(ctd)
plot(CTD)
