roll_lag: Fast rolling grouped lags and differences

Description

Inspired by 'collapse', roll_lag and roll_diff operate similarly to flag and fdiff.

Usage

roll_lag(x, n = 1L, ...)
# S3 method for default
roll_lag(x, n = 1L, g = NULL, fill = NULL, ...)
# S3 method for ts
roll_lag(x, n = 1L, g = NULL, fill = NULL, ...)
# S3 method for zoo
roll_lag(x, n = 1L, g = NULL, fill = NULL, ...)
roll_diff(x, n = 1L, ...)
# S3 method for default
roll_diff(x, n = 1L, g = NULL, fill = NULL, differences = 1L, ...)
# S3 method for ts
roll_diff(x, n = 1L, g = NULL, fill = NULL, differences = 1L, ...)
# S3 method for zoo
roll_diff(x, n = 1L, g = NULL, fill = NULL, differences = 1L, ...)
diff_(
  x,
  n = 1L,
  differences = 1L,
  order = NULL,
  run_lengths = NULL,
  fill = NULL
)

Value

A vector the same length as x.

Arguments

x: A vector or data frame.
n: Lag. This will be recycled to match the length of x and can be negative.
...: Arguments passed on to the appropriate method.
g: Grouping vector. This can be a vector, data frame or GRP object.
fill: Value used to fill the first n elements (or the last abs(n) elements when n is negative, i.e. a lead).
differences: Number of times to recursively apply the differencing algorithm. If length(n) == 1, i.e. the lag is a scalar integer, an optimised method is used which avoids recursion entirely. If length(n) != 1, simple recursion is used.
order: Optionally specify an ordering with which to apply the lags/differences. This is useful, for example, when applying lags chronologically using an unsorted time variable.
run_lengths: Optional integer vector of run lengths that defines the size of each lag run. For example, supplying c(5, 5) applies lags to the first 5 elements and then essentially resets the bounds and applies lags to the next 5 elements as if they were an entirely separate and standalone vector.
This is particularly useful in conjunction with the order argument to perform a by-group lag.
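
The interplay of order and run_lengths can be sketched in base R. This is only an illustration of the semantics described above, not the package's implementation: lag_by_runs() is a hypothetical helper written for this example, and the data are made up.

```r
# Hypothetical helper (not part of timeplyr): lag each run of x independently,
# treating every run as if it were a standalone vector.
lag_by_runs <- function(x, run_lengths, n = 1L, fill = NA) {
  stopifnot(sum(run_lengths) == length(x), n >= 0L)
  # Position of each element within its own run: 1, 2, ..., then restart
  pos <- sequence(run_lengths)
  # Plain lag over the whole vector (indices clamped to stay in bounds)
  out <- x[pmax(seq_along(x) - n, 1L)]
  # The first n elements of every run have no valid predecessor in that run
  out[pos <= n] <- fill
  out
}

# Two runs of 5: the lag restarts at element 6
runs_result <- lag_by_runs(1:10, c(5, 5))

# By-group lag on unsorted groups: sort, lag by runs, then undo the sort
x <- c(10, 20, 30, 40, 50, 60)
g <- c(2L, 1L, 2L, 1L, 2L, 1L)
o <- order(g)                 # makes each group contiguous, ties keep their order
runs <- rle(g[o])$lengths     # size of each contiguous group
grouped_result <- x
grouped_result[o] <- lag_by_runs(x[o], runs)
```

Here runs_result has NA at positions 1 and 6, and grouped_result holds each element's predecessor within its own group, in the original row order.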

Details

While these may not be as fast as the 'collapse' equivalents, they are adequately fast and efficient.
A key difference between roll_lag and flag is that g does not need to be sorted for the result to be correct.
Furthermore, a vector of lags can be supplied for a custom rolling lag.
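
A minimal base R sketch of both points, assuming made-up data. It reproduces the expected results by hand rather than calling the package: the first part shows what a grouped lag on unsorted groups should return, the second what a per-element lag vector means.

```r
# 1. Grouped lag with unsorted groups, computed with base R's ave().
#    A call such as roll_lag(x, g = g) is expected to give the same values.
x <- c(10, 20, 30, 40, 50, 60)
g <- c(2L, 1L, 2L, 1L, 2L, 1L)
grouped_lag <- ave(x, g, FUN = function(v) c(NA, v[-length(v)]))

# 2. A vector of lags: element i is compared with element i - n[i].
#    This lag vector is built by hand for illustration; it grows from 0 to 5
#    and then stays at 5.
y <- 1:10
n <- pmin(seq_along(y) - 1L, 5L)
custom_lag <- y[seq_along(y) - n]   # value each element is paired with
custom_diff <- y - custom_lag       # rolling difference with a partial window
```

With this lag vector the first elements are all compared with y[1], after which the comparison point trails by exactly 5 positions.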

roll_diff() silently returns NA when there is integer overflow. Both roll_lag() and roll_diff() apply recursively to list elements.
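
The overflow case is easy to hit with integers near .Machine$integer.max. The sketch below uses base R's diff() to show the arithmetic involved (base R warns where roll_diff() is documented to stay silent), and one possible workaround, which is an assumption on my part rather than package advice: convert to double before differencing.

```r
# Two integers whose difference exceeds the largest representable integer
big <- c(-1L, .Machine$integer.max)

# Integer subtraction overflows and yields NA (base R also emits a warning)
int_diff <- suppressWarnings(diff(big))

# Doubles can represent the result exactly, so no NA is produced
dbl_diff <- diff(as.numeric(big))
```

Checking anyNA() on the result of an integer difference is a cheap way to detect this when the input itself contains no missing values.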

Examples

library(timeplyr)
# \dontshow{
.n_dt_threads <- data.table::getDTthreads()
.n_collapse_threads <- collapse::get_collapse()$nthreads
data.table::setDTthreads(threads = 1L)
collapse::set_collapse(nthreads = 1L)
# }
x <- 1:10

roll_lag(x) # Lag
roll_lag(x, -1) # Lead
roll_diff(x) # Lag diff
roll_diff(x, -1) # Lead diff

# Using cheapr::lag_sequence()
# Differences lagged at 5, first 5 differences are compared to x[1]
roll_diff(x, cheapr::lag_sequence(length(x), 5, partial = TRUE))

# Like diff() but x/y instead of x-y
quotient <- function(x, n = 1L){
  x / roll_lag(x, n)
}
# People often call this a growth rate
# but it's just a percentage difference
# See ?roll_growth_rate for growth rate calculations
quotient(1:10)
# \dontshow{
data.table::setDTthreads(threads = .n_dt_threads)
collapse::set_collapse(nthreads = .n_collapse_threads)
# }
