caTools (version 1.7)

runmean, runmin, runmax & runquantile: Moving Window Analysis of a Vector

Description

A collection of functions to perform moving window (running, rolling window) analysis of vectors

Usage

runmean(x, k, alg=c("C", "R", "exact"),
         endrule=c("NA", "trim", "keep", "constant", "func"))
  runmin (x, k, alg=c("C", "R"),
         endrule=c("NA", "trim", "keep", "constant", "func"))
  runmax (x, k, alg=c("C", "R"),
         endrule=c("NA", "trim", "keep", "constant", "func"))
  runmad (x, k, center=runmed(x,k,endrule="keep"), constant=1.4826,  
          endrule=c("NA", "trim", "keep", "constant", "func"))
  runquantile(x, k, probs, type=7, 
          endrule=c("NA", "trim", "keep", "constant", "func"))
  EndRule(x, y, k, 
          endrule=c("NA", "trim", "keep", "constant", "func"), Func, ...)

Arguments

x
numeric vector of length n
k
width of moving window; must be an odd integer between three and n
endrule
character string indicating how the values at the beginning and the end, of the data, should be treated. Only first and last k2 values at both ends are affected, where k2 is the half-bandwidth k2 = k %/% 2
alg
an option allowing to choose different algorithms or implementations, if provided. Default is to use of code written in C. Option alg="R" will use slower code written in R. Useful for debugging and allows extensions in the f
center
moving window center used by runmad function defaults to running median (runmed function). Similar to center in mad functio
constant
scale factor used by runmad, such that for Gaussian distribution X, mad(X) is the same as sd(X). Same as constant in
probs
numeric vector of probabilities with values in [0,1] range used by runquantile. For example Probs=c(0,0.5,1) would be equivalent to running runmin, runmed a
type
an integer between 1 and 9 selecting one of the nine quantile algorithms, same as type in quantile function. Another even more readable description of nine ways to calculate quanti
y
numeric vector of length n, which is partially filled output of one of the run functions. Function EndRule will fill the remaining beginning and end sections using method chosen by endrule argument.
Func
Function name that EndRule will use in case of endrule="func".
...
Additional parameters to Func that EndRule will use in case of endrule="func".

Value

  • Functions runmin, runmax, runmean and runmad return a numeric vector of the same length as x. Function runquantile returns a matrix of size [n $\times$ length(probs)]. In addition x contains attribute k with (the 'oddified') k.

concept

  • moving mean
  • rolling mean
  • running mean
  • moving average
  • rolling average
  • running average
  • moving min
  • rolling min
  • running min
  • moving max
  • rolling max
  • running max
  • moving minimum
  • rolling minimum
  • running minimum
  • moving maximum
  • rolling maximum
  • running maximum
  • moving quantile
  • rolling quantile
  • running quantile
  • moving percentile
  • rolling percentile
  • running percentile
  • moving mad
  • rolling mad
  • running mad
  • running window

Details

Apart from the end values, the result of y = runFUN(x, k) is the same as for(j=(1+k2):(n-k2)) y[j]=FUN(x[(j-k2):(j+k2)]), where FUN stands for min, max, mean, mad or quantile functions. The main incentive to write this set of functions was relative slowness of majority of moving window functions available in R and its packages. With exception of runmed, a running window median function, all functions listed in "see also" section are slower than very inefficient apply(embed(x,k),1,FUN) approach. Relative speeds of above functions are as follow:
  • runmin,runmax,runmeanrun at O(n)
  • runmean(..., alg="exact")can have worst case speed of O($n^2$) for some small data vectors, but average case is still close to O(n).
  • runquantileandrunmadrun at O(n*k)
  • runmed- related R function runs at O(n*log(k))
Functions runquantile and runmad are using insertion sort to sort the moving window, but gain speed by remembering results of the previous sort. Since each time the window is moved, only one point changes, all but one points in the window are already sorted. Insertion sort can fix that in O(k) time. Function runquantile when run in single probability mode automatically recognizes probabilities: 0, 1/2, and 1 as special cases and return output from functions: runmin, runmed and runmax respectively. All run* functions are written in C, but runmin, runmax and runmean also have fast R code versions (see argument alg="R"). Those were included for debugging purposes, and as a fallback in hard-to-port situations. See examples. Function EndRule applies one of the five methods (see endrule argument) to process end-points of the input array x. In case of runmean(..., alg="exact") function a special algorithm is used (see references section) to ensure that round-off errors do not accumulate. As a result runmean is more accurate than filter(x, rep(1/k,k)) and runmean(..., alg="C") functions. All of the functions in this section do not work with infinite numbers (NA,NaN,Inf,-Inf) except for runmean(..., alg="exact") which omits them.

References

  • About quantiles: Hyndman, R. J. and Fan, Y. (1996)Sample quantiles in statistical packages, American Statistician, 50, 361.
  • About quantiles: Eric W. Weisstein.Quantile. From MathWorld-- A Wolfram Web Resource.http://mathworld.wolfram.com/Quantile.html
  • About insertion sort used inrunmadandrunquantile: R. Sedgewick (1988):Algorithms. Addison-Wesley (page 99)
  • About round-off error correction used inrunmean: Shewchuk, JonathanAdaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates,http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps
  • More on round-off error correction can be found at:http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/393090

See Also

Links related to each function:

Examples

Run this code
# test runmin, runmax and runmed
  k=15; n=200;
  x = rnorm(n,sd=30) + abs(seq(n)-n/4)
  col = c("black", "red", "green", "blue", "magenta", "cyan")
  plot(x, col=col[1], main = "Moving Window Analysis Functions")
  lines(runmin(x,k), col=col[2])
  lines(runmed(x,k), col=col[3])
  lines(runmax(x,k), col=col[4])
  legend(0,.9*n, c("data", "runmin", "runmed", "runmax"), col=col, lty=1 )

  #test runmean and runquantile
  y=runquantile(x, k, probs=c(0, 0.5, 1, 0.25, 0.75), endrule="constant")
  plot(x, col=col[1], main = "Moving Window Quantile")
  lines(runmean(y[,1],k), col=col[2])
  lines(y[,2], col=col[3])
  lines(runmean(y[,3],k), col=col[4])
  lines(y[,4], col=col[5])
  lines(y[,5], col=col[6])
  lab = c("data", "runmean(runquantile(0))", "runquantile(0.5)", 
  "runmean(runquantile(1))", "runquantile(.25)", "runquantile(.75)")
  legend(0,0.9*n, lab, col=col, lty=1 )

  #test runmean and runquantile
  k =25
  m=runmed(x, k)
  y=runmad(x, k, center=m)
  plot(x, col=col[1], main = "Moving Window Analysis Functions")
  lines(m    , col=col[2])
  lines(m-y/2, col=col[3])
  lines(m+y/2, col=col[4])
  lab = c("data", "runmed", "runmed-runmad/2", "runmed+runmad/2")
  legend(0,1.8*n, lab, col=col, lty=1 )

  # numeric comparison between different algorithms
  numeric.test = function (n, k) {
    eps = .Machine$double.eps ^ 0.5
    x = rnorm(n,sd=30) + abs(seq(n)-n/4)
    # numeric comparison : runmean
    a = runmean(x,k)
    b = runmean(x,k, alg="R")
    d = runmean(x,k, alg="exact")
    e = filter(x, rep(1/k,k))
    stopifnot(all(abs(a-b)<eps, na.rm=TRUE));
    stopifnot(all(abs(a-d)<eps, na.rm=TRUE));
    stopifnot(all(abs(a-e)<eps, na.rm=TRUE));
    # numeric comparison : runmin
    a = runmin(x,k, endrule="trim")
    b = runmin(x,k, endrule="trim", alg="R")
    c = apply(embed(x,k), 1, min)
    stopifnot(all(a==b, na.rm=TRUE));
    stopifnot(all(a==c, na.rm=TRUE));
    # numeric comparison : runmax
    a = runmax(x,k, endrule="trim")
    b = runmax(x,k, endrule="trim", alg="R")
    c = apply(embed(x,k), 1, max)
    stopifnot(all(a==b, na.rm=TRUE));
    stopifnot(all(a==c, na.rm=TRUE));
    # numeric comparison : runmad
    a = runmad(x,k, endrule="trim")
    b = apply(embed(x,k), 1, mad)
    stopifnot(all(a==b, na.rm=TRUE));
    # numeric comparison : runquantile
    a = runquantile(x,k, c(0.3, 0.7), endrule="trim")
    b = t(apply(embed(x,k), 1, quantile, probs = c(0.3, 0.7)))
    stopifnot(all(abs(a-b)<eps));
  }
  numeric.test(50, 3) # test different window size vs. vector ...
  numeric.test(50,15) # ... length combinations
  numeric.test(50,49)
  numeric.test(49,49)
  
  # speed comparison
  x=runif(100000); k=991;
  system.time(runmean(x,k))
  system.time(runmean(x,k, alg="R"))
  system.time(runmean(x,k, alg="exact"))
  system.time(filter(x, rep(1/k,k), sides=2)) #the fastest alternative I know
  k=91;
  system.time(runmad(x,k))
  system.time(apply(embed(x,k), 1, mad)) #the fastest alternative I know
  
  # numerical comparison of round-off error handling
  test.runmean = function (x, k) {
    a = k*runmean(x,k, alg="exact")  
    b = k*runmean(x,k, alg="C")  
    d = k*runmean(x,k, alg="R")
    e = k*filter(x, rep(1/k,k))
    f = k* c(NA, NA, apply(embed(x,k), 1, mean), NA, NA)
    x = cbind(x, a, b, d, e, f)
    colnames(x) = c("x","runmean(alg=exact)","runmean(alg=C)",
      "runmean(alg=R)","filter","apply")
    return(x)
  }
  a = rep( c(1, 10,    -10,    -1, 0, 0, 0), 3) # nice-behaving array
  b = rep( c(1, 10^20, -10^20, -1, 0, 0, 0), 3) # round-off error prone array
  d = rep( c(1, 10^20, 10^40, -10^40, -10^20, -1,  0), 3) 
  test.runmean(a, 5) #all runmean algorithms give the same result
  test.runmean(b, 5) #runmean(alg=R) gives wrong result
  test.runmean(d, 5) #only runmean(alg=exact) gives correct result

Run the code above in your browser using DataLab