roll: Rolling functions

Description

Fast rolling functions to calculate aggregates on a sliding window. Function names and arguments are experimental.

Usage

frollmean(x, n, fill=NA, algo=c("fast", "exact"), align=c("right",
  "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE,
  verbose=getOption("datatable.verbose"))

Arguments

x

vector, list, data.frame or data.table of numeric columns.

n

integer vector, rolling window size(s). For an adaptive rolling function, a list of integer vectors is also accepted.

fill

numeric, value to pad by. Defaults to NA.

algo

character, default "fast". When set to "exact", a slower algorithm is used. It suffers less from floating-point rounding error, performs an extra pass to correct rounding error, and carefully handles all non-finite values. It uses multiple cores where available. See Details for more information.

align

character, defines whether the rolling window covers the preceding rows ("right"), the following rows ("left") or is centered on the current row ("center"). Defaults to "right".

na.rm

logical. Should missing values be removed when calculating each window? Defaults to FALSE. For the handling of other non-finite values, see Details below.

hasNA

logical. If it is known that x contains NA, setting this to TRUE will speed up computation. Defaults to NA.

adaptive

logical, whether an adaptive rolling function should be calculated. Defaults to FALSE. See Details below.

verbose

logical, default getOption("datatable.verbose"). TRUE turns on status and information messages to the console; it also disables parallel processing.
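The effect of align and fill can be pictured with a small base-R sketch. roll_mean_sketch is a hypothetical helper, not a data.table function: it simply recomputes each window with mean() and ignores algo, na.rm and hasNA. For align="center" with an even n it assumes the extra row falls after the current one, which may not match data.table's own convention.

```r
# Hypothetical base-R sketch of how align and fill position each window.
# Not data.table's implementation: every window is recomputed with mean().
roll_mean_sketch = function(x, n, fill = NA, align = c("right", "left", "center")) {
  align = match.arg(align)
  len = length(x)
  ans = rep(as.numeric(fill), len)    # rows without a complete window keep `fill`
  for (i in seq_len(len)) {
    # index of the first row of the window that produces the value for row i
    first = switch(align,
      right  = i - n + 1,             # current row and the n-1 preceding rows
      left   = i,                     # current row and the n-1 following rows
      center = i - (n - 1) %/% 2      # assumption: for even n the extra row comes after i
    )
    last = first + n - 1
    if (first >= 1 && last <= len) ans[i] = mean(x[first:last])
  }
  ans
}

x = 1:6 / 2
roll_mean_sketch(x, 3)                    # NA NA 1.0 1.5 2.0 2.5
roll_mean_sketch(x, 3, align = "left")    # 1.0 1.5 2.0 2.5 NA NA
roll_mean_sketch(x, 3, align = "center")  # NA 1.0 1.5 2.0 2.5 NA
roll_mean_sketch(x, 3, fill = 0)          # 0.0 0.0 1.0 1.5 2.0 2.5
```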

Value

A list except when the input is a vector and length(n)==1 in which case a vector is returned.

Details

froll* functions accept vectors, lists, data.frames or data.tables. For convenience, they always return a list except when the input is a vector and length(n)==1, in which case a vector is returned. This makes rolling functions convenient to use within data.table syntax.
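The return-shape rule can be sketched in base R as follows. froll_shape_sketch is hypothetical and uses stats::filter() for a right-aligned mean; it only mirrors the rule that one result vector is produced per column and window size, and that a bare vector comes back when the input is a single vector with a single n. The order in which it lists results (column by column, then window by window) is an assumption of the sketch.

```r
# Hypothetical sketch of the return-shape rule; froll_shape_sketch is not a
# data.table function. Each (column, window) pair yields one result vector.
froll_shape_sketch = function(x, n) {
  # right-aligned rolling mean of one vector for one window size k
  roll1 = function(v, k) as.numeric(stats::filter(v, rep(1 / k, k), sides = 1))
  is_vec = is.atomic(x) && is.null(dim(x))
  cols = if (is_vec) list(x) else as.list(x)   # a data.frame is a list of columns
  ans = list()
  for (col in cols) for (k in n) ans[[length(ans) + 1L]] = roll1(col, k)
  if (is_vec && length(n) == 1L) ans[[1L]] else ans
}

d = data.frame(V1 = 1:6 / 2, V2 = 3:8 / 4)
is.list(froll_shape_sketch(d$V1, 3))       # FALSE: one vector, one window
length(froll_shape_sketch(d$V1, c(3, 4)))  # 2
length(froll_shape_sketch(d, c(3, 4)))     # 4: two columns times two windows
```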

Argument n accepts multiple values, so rolling functions can be applied with several window sizes at once. If adaptive=TRUE, n must be a list. Each list element must be an integer vector of window sizes with one entry for every observation in each column.
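An adaptive rolling mean can be sketched in base R by giving every row its own window length. adaptive_mean_sketch is a hypothetical helper that handles a single numeric vector with right alignment only; rows whose window would reach before the first observation get NA.

```r
# Hypothetical base-R sketch of an adaptive rolling mean (right-aligned):
# row i averages its own window of n[i] rows ending at row i.
adaptive_mean_sketch = function(x, n) {
  stopifnot(length(n) == length(x))   # one window size per observation
  ans = rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (n[i] >= 1 && n[i] <= i) ans[i] = mean(x[(i - n[i] + 1):i])
  }
  ans
}

x = 1:6 / 2
n = c(1, 2, 3, 3, 3, 3)     # windows grow to 3 first, as an(3, 6) does in the examples
adaptive_mean_sketch(x, n)  # 0.50 0.75 1.00 1.50 2.00 2.50
```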

When algo="fast" is used, any NaN, +Inf or -Inf is treated as NA. Setting algo="exact" makes rolling functions use a compute-intensive algorithm that suffers less from floating-point rounding error and makes an extra pass to correct it. It also handles NaN, +Inf and -Inf consistently with base R.
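The trade-off between the two algorithms can be sketched in base R. roll_mean_running and roll_mean_exact are hypothetical helpers, not data.table's implementation: the first keeps a single running sum, a common way to make a rolling mean linear in length(x); the second recomputes every window and applies a second-pass correction similar to the one base R's mean() uses. The example shows why a running sum cannot treat infinities the way base R does: once Inf leaves the window, Inf - Inf turns the running sum into NaN for every later row.

```r
# Hypothetical base-R sketches contrasting the two strategies; not data.table's C code.

# "fast"-style: one running sum, updated as the window slides
roll_mean_running = function(x, n) {
  ans = rep(NA_real_, length(x))
  s = 0
  for (i in seq_along(x)) {
    s = s + x[i]                    # element entering the window
    if (i > n) s = s - x[i - n]     # element leaving the window
    if (i >= n) ans[i] = s / n
  }
  ans
}

# "exact"-style: recompute every window, then correct rounding error in a second pass
roll_mean_exact = function(x, n) {
  ans = rep(NA_real_, length(x))
  if (length(x) >= n) for (i in n:length(x)) {
    w = x[(i - n + 1):i]
    m = sum(w) / n
    if (is.finite(m)) m = m + sum(w - m) / n  # skip the correction for non-finite means
    ans[i] = m
  }
  ans
}

x = c(1, 2, Inf, 4, 5, 6)
roll_mean_running(x, 2)  # NA 1.5 Inf Inf NaN NaN: Inf - Inf poisons the running sum
roll_mean_exact(x, 2)    # NA 1.5 Inf Inf 4.5 5.5
```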

Adaptive rolling functions are a special case in which each observation has its own rolling window width. Due to the logic of adaptive rolling functions, the following restrictions apply:

- align must be "right".
- if a list of vectors is passed to x, all vectors in the list must have equal length.
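The restrictions above can be expressed as an up-front argument check. check_adaptive_args is a hypothetical helper written for illustration; its error messages are made up and do not reproduce data.table's own.

```r
# Hypothetical argument check mirroring the adaptive restrictions;
# check_adaptive_args is illustrative, not a data.table function.
check_adaptive_args = function(x, n, align = "right") {
  if (align != "right") stop("adaptive rolling functions support only align = \"right\"")
  cols = if (is.list(x)) x else list(x)      # a data.frame is a list of columns
  lens = lengths(cols)
  if (length(unique(lens)) != 1L) stop("all vectors in x must have equal length")
  ns = if (is.list(n)) n else list(n)
  # every window-size vector must line up with the observations row by row
  if (any(lengths(ns) != lens[1L])) stop("each window-size vector needs one entry per observation")
  invisible(TRUE)
}

check_adaptive_args(list(1:3, 4:6), c(1, 2, 3))  # passes silently
```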

When multiple columns or multiple window widths are provided, they are processed in parallel. Because algo="exact" can itself use multiple cores, this may result in nested parallelism; see Examples.

References

Round-off error

Examples


library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
# multiple columns at once
frollmean(d, 3)
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
## three calls above will use multiple cores when available

# partial window using adaptive rolling function
an = function(n, len) c(seq.int(n), rep(n, len-n))
n = an(3, nrow(d))
frollmean(d, n, adaptive=TRUE)

# performance vs exactness
set.seed(108)
x = sample(c(rnorm(1e3, 1e6, 5e5), 5e9, 5e-9))
n = 15
ma = function(x, n, na.rm=FALSE) {
  ans = rep(NA_real_, nx<-length(x))
  for (i in n:nx) ans[i] = mean(x[(i-n+1):i], na.rm=na.rm)
  ans
}
fastma = function(x, n, na.rm) {
  if (!missing(na.rm)) stop("NAs are unsupported, wrongly propagated by cumsum")
  cs = cumsum(x)
  scs = shift(cs, n)
  scs[n] = 0
  as.double((cs-scs)/n)
}
system.time(ans1<-ma(x, n))
system.time(ans2<-fastma(x, n))
system.time(ans3<-frollmean(x, n, algo="exact")) # parallel using openmp again
system.time(ans4<-frollmean(x, n))
anserr = list(
  froll_exact_f = ans4-ans1, # algo="fast" vs naive loop
  froll_exact_t = ans3-ans1, # algo="exact" vs naive loop
  fastma = ans2-ans1         # cumsum-based version vs naive loop
)
errs = sapply(lapply(anserr, abs), sum, na.rm=TRUE)
sapply(errs, format, scientific=FALSE) # total absolute round-off error per method
