runfunc: Moving Window Analysis of a Vector

Description

A collection of functions to perform moving window (running, rolling window) analysis of vectors

Usage

runmean(x, k, alg=c("C", "R", "exact"),
         endrule=c("NA", "trim", "keep", "constant", "func"))
  runmin (x, k, alg=c("C", "R"),
         endrule=c("NA", "trim", "keep", "constant", "func"))
  runmax (x, k, alg=c("C", "R"),
         endrule=c("NA", "trim", "keep", "constant", "func"))
  runmad (x, k, center=runmed(x,k,endrule="keep"), constant=1.4826,  
          endrule=c("NA", "trim", "keep", "constant", "func"))
  runquantile(x, k, probs, type=7, 
          endrule=c("NA", "trim", "keep", "constant", "func"))
  EndRule(x, y, k, 
          endrule=c("NA", "trim", "keep", "constant", "func"), Func, ...)

Arguments

numeric vector of length n

width of moving window; must be an odd integer between three and n

endrule

character string indicating how the values at the beginning and the end, of the data, should be treated. Only first and last k2 values at both ends are affected, where k2 is the half-bandwidth k2 = k %/% 2

alg

an option allowing to choose different algorithms or implementations, if provided. Default is to use of code written in C. Option alg="R" will use slower code written in R. Usefull for debugging and allows extentions in the

center

moving window center used by runmad function defaults to running median (runmed function). Similar to center in mad functio

constant

scale factor used by runmad, such that for gaussian distribution X, mad(X) is the same as sd(X). Same as constant in

probs

numeric vector of probabilities with values in [0,1] range used by runquantile. For example Probs=c(0,0.5,1) would be equivalent to running runmin, runmed a

type

an integer between 1 and 9 selecting one of the nine quantile algorithms, same as type in quantile function. Another even more readable description of nine ways to calculate quanti

numeric vector of length n, which is partially filled output of one of the run functions. Function EndRule will fill the remaining beginning and end sections using method chosen by endrule argument.

Func

Function name that EndRule will use in case of endrule="func".

...

Additional parameters to Func that EndRule will use in case of endrule="func".

Value

Functions runmin, runmax, runmean and runmad return a numeric vector of the same length as x. Function runquantile returns a matrix of size [n $\times$ length(probs)]. In addition x contains attribute k with (the 'oddified') k.

concept

moving mean
rolling mean
running mean
moving average
rolling average
running average
moving min
rolling min
running min
moving max
rolling max
running max
moving minimum
rolling minimum
running minimum
moving maximum
rolling maximum
running maximum
moving quantile
rolling quantile
running quantile
moving percentile
rolling percentile
running percentile
moving mad
rolling mad
running mad
running window

Details

Apart from the end values, the result of y = runFUN(x, k) is the same as for(j=(1+k2):(n-k2)) y[j]=FUN(x[(j-k2):(j+k2)]), where FUN stands for min, max, mean, mad or quantile functions. The main incentive to write this set of functions was relative slowness of majority of moving window functions available in R and its packages. With exception of runmed, a running window median function, all functions listed in "see also" section are slower than very inefficient apply(embed(x,k),1,FUN) approach. Relative speeds of above functions are as follow:

runmin,runmax,runmeanrun at O(n)
runmean(..., alg="exact")can have worst case speed of O($n^2$) for some small data vectors, but average case is still close to O(n).
runquantileandrunmadrun at O(n*k)
runmed- related R function run at O(n*log(k))

Functions runquantile and runmad are using insertion sort to sort the moving window, but gain speed by remembering results of the previous sort. Since each time the window is moved, only one point changes, all but one points in the window are already sorted. Insertion sort can fix that in O(k) time. Function runquantile when run in single probability mode automatically recognizes probabilities: 0, 1/2, and 1 as special cases and return output from functions: runmin, runmed and runmax respectively. All run* functions are written in C, but runmin, runmax and runmean also have fast R code versions (see argument alg="R"). Those were included for debugging purposes, and as a fallback in hard-to-port situations. See examples. Function EndRule applies one of the five methods (see endrule argument) to process end-points of the input array x. In case of runmean(..., alg="exact") function a special algorithm is used (see references section) to ensure that round-off errors do not accumulate. As a result runmean is more accurate than filter(x, rep(1/k,k)) and runmean(..., alg="C") functions. All of the functions in this section do not work with infinite numbers (NA,NaN,Inf,-Inf) except for runmean(..., alg="exact") which omits them.

References

About quantiles: Hyndman, R. J. and Fan, Y. (1996)Sample quantiles in statistical packages, American Statistician, 50, 361.
About quantiles: Eric W. Weisstein.Quantile. From MathWorld-- A Wolfram Web Resource.http://mathworld.wolfram.com/Quantile.html
About insertion sort used inrunmadandrunquantile: R. Sedgewick (1988):Algorithms. Addison-Wesley (page 99)
About round-off error correction used inrunmean: Shewchuk, JonathanAdaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates,http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps
More on round-off error correction can be found at:http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/393090

Examples

Run this code

# test runmin, runmax and runmed
  k=15; n=200;
  x = rnorm(n,sd=30) + abs(seq(n)-n/4)
  col = c("black", "red", "green", "blue", "magenta", "cyan")
  plot(x, col=col[1], main = "Moving Window Analysis Functions")
  lines(runmin(x,k), col=col[2])
  lines(runmed(x,k), col=col[3])
  lines(runmax(x,k), col=col[4])
  legend(0,.9*n, c("data", "runmin", "runmed", "runmax"), col=col, lty=1 )

  #test runmean and runquantile
  y=runquantile(x, k, probs=c(0, 0.5, 1, 0.25, 0.75), endrule="constant")
  plot(x, col=col[1], main = "Moving Window Quantile")
  lines(runmean(y[,1],k), col=col[2])
  lines(y[,2], col=col[3])
  lines(runmean(y[,3],k), col=col[4])
  lines(y[,4], col=col[5])
  lines(y[,5], col=col[6])
  lab = c("data", "runmean(runquantile(0))", "runquantile(0.5)", 
  "runmean(runquantile(1))", "runquantile(.25)", "runquantile(.75)")
  legend(0,0.9*n, lab, col=col, lty=1 )

  #test runmean and runquantile
  k =25
  m=runmed(x, k)
  y=runmad(x, k, center=m)
  plot(x, col=col[1], main = "Moving Window Analysis Functions")
  lines(m    , col=col[2])
  lines(m-y/2, col=col[3])
  lines(m+y/2, col=col[4])
  lab = c("data", "runmed", "runmed-runmad/2", "runmed+runmad/2")
  legend(0,1.8*n, lab, col=col, lty=1 )

  # numeric comparison between different algorithms
  numeric.test = function (n, k) {
    eps = .Machine$double.eps ^ 0.5
    x = rnorm(n,sd=30) + abs(seq(n)-n/4)
    # numeric comparison : runmean
    a = runmean(x,k)
    b = runmean(x,k, alg="R")
    d = runmean(x,k, alg="exact")
    e = filter(x, rep(1/k,k))
    stopifnot(all(abs(a-b)<eps, na.rm=TRUE));
    stopifnot(all(abs(a-d)<eps, na.rm=TRUE));
    stopifnot(all(abs(a-e)<eps, na.rm=TRUE));
    # numeric comparison : runmin
    a = runmin(x,k, endrule="trim")
    b = runmin(x,k, endrule="trim", alg="R")
    c = apply(embed(x,k), 1, min)
    stopifnot(all(a==b, na.rm=TRUE));
    stopifnot(all(a==c, na.rm=TRUE));
    # numeric comparison : runmax
    a = runmax(x,k, endrule="trim")
    b = runmax(x,k, endrule="trim", alg="R")
    c = apply(embed(x,k), 1, max)
    stopifnot(all(a==b, na.rm=TRUE));
    stopifnot(all(a==c, na.rm=TRUE));
    # numeric comparison : runmad
    a = runmad(x,k, endrule="trim")
    b = apply(embed(x,k), 1, mad)
    stopifnot(all(a==b, na.rm=TRUE));
    # numeric comparison : runquantile
    a = runquantile(x,k, c(0.3, 0.7), endrule="trim")
    b = t(apply(embed(x,k), 1, quantile, probs = c(0.3, 0.7)))
    stopifnot(all(abs(a-b)<eps));
  }
  numeric.test(50, 3) # test different window size vs. vector ...
  numeric.test(50,15) # ... length combinations
  numeric.test(50,49)
  numeric.test(49,49)
  
  # speed comparison
  x=runif(100000); k=991;
  system.time(runmean(x,k))
  system.time(runmean(x,k, alg="R"))
  system.time(runmean(x,k, alg="exact"))
  system.time(filter(x, rep(1/k,k), sides=2)) #the fastest alternative I know
  k=91;
  system.time(runmad(x,k))
  system.time(apply(embed(x,k), 1, mad)) #the fastest alternative I know
  
  # numerical comparison of round-off error handling
  test.runmean = function (x, k) {
    a = k*runmean(x,k, alg="exact")  
    b = k*runmean(x,k, alg="C")  
    d = k*runmean(x,k, alg="R")
    e = k*filter(x, rep(1/k,k))
    f = k* c(NA, NA, apply(embed(x,k), 1, mean), NA, NA)
    x = cbind(x, a, b, d, e, f)
    colnames(x) = c("x","runmean(alg=exact)","runmean(alg=C)",
      "runmean(alg=R)","filter","apply")
    return(x)
  }
  a = rep( c(1, 10,    -10,    -1, 0, 0, 0), 3) # nice-behaving array
  b = rep( c(1, 10^20, -10^20, -1, 0, 0, 0), 3) # round-off error prone array
  d = rep( c(1, 10^20, 10^40, -10^40, -10^20, -1,  0), 3) 
  test.runmean(a, 5) #all runmean algorithms give the same result
  test.runmean(b, 5) #runmean(alg=R) gives wrong result
  test.runmean(d, 5) #only runmean(alg=exact) gives correct result

Run the code above in your browser using DataLab

Last chance! 50% off unlimited learning