runner
an R package for running operations.
About
The package contains standard running functions (also known as windowed, rolling, or cumulative functions) with additional options. runner provides extended functionality such as date windows, handling of missing values, and varying window size. runner also brings rolling streak and rolling which, which go beyond the range of functions already implemented in other R packages.
Installation
Install the package from GitHub or from CRAN.
# devtools::install_github("gogonzo/runner")
install.packages("runner")
Using runner
The main idea of the package is to provide running operations on R vectors. Running functions are those applied to all elements up to the current one, such as cumsum, cummin, etc., which are already implemented in base R. Functions provided in this package work similarly but with extended functionality such as handling NA and custom window size. Most functions provided in the package are based on the same logic:
Fixed window size
k denotes the number of elements in the window. For a fixed window size, k should be a single value.
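The fixed-window idea can be sketched in base R. This is only an illustration of the semantics described above, not the package's implementation; running_window_min() is a hypothetical helper written for this example, and it assumes the window at position i covers elements i-k+1 to i and is simply shorter near the start of the vector.

```r
# Base R sketch of a fixed-size running window (not the package's code).
# running_window_min() is a hypothetical helper used only for illustration.
running_window_min <- function(x, k) {
  vapply(seq_along(x), function(i) {
    start <- max(1, i - k + 1)   # the window is shorter near the start
    min(x[start:i])              # minimum of the last k elements up to i
  }, numeric(1))
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)
fixed_min <- running_window_min(x, k = 3)
fixed_min
#> [1] 3 1 1 1 1 1 2 2
```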
Varying window size
For a varying window size, k should be of the same length as x. Varying window size means that for element [i] the function is calculated on a window of length k[i].
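A varying window can be sketched the same way in base R by indexing k with i. The helper running_sum_varying() is hypothetical and only illustrates the rule stated above; it is not how the package computes its results.

```r
# Base R sketch of a varying window: element i uses a window of length k[i].
# running_sum_varying() is a hypothetical helper, not part of runner.
running_sum_varying <- function(x, k) {
  stopifnot(length(k) == length(x))
  vapply(seq_along(x), function(i) {
    start <- max(1, i - k[i] + 1)   # window length differs for each element
    sum(x[start:i])
  }, numeric(1))
}

varying_sum <- running_sum_varying(c(1, 2, 3, 4, 5), k = c(1, 2, 3, 3, 2))
varying_sum
#> [1] 1 3 6 9 9
```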
Windows depending on date
Sometimes data points in a dataset are not equally spaced (missing weekends, holidays, or other gaps), and thus the window size should vary to keep the expected time frame. If one specifies the idx argument, then running functions are applied on windows depending on date. idx should be the same length as x and of class Date or integer. Including idx can be combined with a varying window size; then k denotes the number of periods in the window, different for each data point.
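A date-dependent window can be sketched in base R as follows. The helper date_window_sum() is hypothetical, and the window rule it uses (all elements whose idx lies in the half-open interval from idx[i] - k, exclusive, to idx[i], inclusive) is an assumption inferred from the example outputs in this document rather than a statement about the package's internals.

```r
# Base R sketch of a date window (assumed rule: idx in (idx[i] - k, idx[i]]).
# date_window_sum() is a hypothetical helper, not part of runner.
date_window_sum <- function(x, k, idx) {
  vapply(seq_along(x), function(i) {
    in_window <- idx > idx[i] - k & idx <= idx[i]
    sum(x[in_window])
  }, numeric(1))
}

# Weekdays only: Friday 2020-01-03 is followed by Monday 2020-01-06.
idx <- as.Date(c("2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"))
x <- c(10, 20, 30, 40)
three_day_sum <- date_window_sum(x, k = 3, idx = idx)
three_day_sum
#> [1] 10 30 30 70
```

On Monday the three-day window reaches back only to Saturday, so Friday's value is excluded even though it is the previous element of the vector.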
Other options
na_rm = TRUE: handles missing values and is equivalent to na.rm.
na_pad = TRUE: if the window size exceeds the number of available elements, then the first k-1 elements are filled with NA.
which: used with running index, selects which value to return ('first' or 'last').
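The effect of na_rm and na_pad can be illustrated with a small base R sketch for a fixed window. running_sum_opts() is a hypothetical helper that mimics the option names described above; it is not the package's code and makes no claim about how na_pad behaves with a varying k.

```r
# Base R sketch of the na_rm and na_pad options for a fixed window size.
# running_sum_opts() is a hypothetical helper, not part of runner.
running_sum_opts <- function(x, k, na_rm = TRUE, na_pad = FALSE) {
  vapply(seq_along(x), function(i) {
    if (na_pad && i < k) return(NA_real_)   # incomplete window is padded with NA
    start <- max(1, i - k + 1)
    sum(x[start:i], na.rm = na_rm)          # na_rm is passed on as na.rm
  }, numeric(1))
}

x <- c(1, 2, NA, 4, 5)
s_default <- running_sum_opts(x, k = 2)                  # 1 3 2 4 9
s_keep_na <- running_sum_opts(x, k = 2, na_rm = FALSE)   # 1 3 NA NA 9
s_padded  <- running_sum_opts(x, k = 3, na_pad = TRUE)   # NA NA 3 6 9
```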
Running aggregations (mean|sum|min|max)_run
runner provides basic aggregation methods calculated within running windows. The example below shows how some functions behave for different argument setups. Let's take a look at the 8th element of a vector on which min_run is calculated. The first setup uses default values, so the algorithm looks for the minimum value among all elements up to and including the current one (i=8). By default, missing values are removed before the calculation by the argument na_rm=TRUE, and the window is not specified. The default is equivalent to base::cummin with the additional option to ignore NA values. In the second example, within the window k=5, the lowest value is -3. In the last example the minimum is not available due to the presence of NA. The graphical example is reproduced in the code below.
library(runner)
library(magrittr)
x <- c(1,-5,1,-3,NA,NA,NA,1,-1,NA,-2,3)
k <- c(4,5,2,5,4,4,2,2,4,4,3,1)
idx <- c(1,3,4,6,7,10,13,16,19,21,23,26)
a0 <- cummin(x)
a1 <- min_run(x, na_rm = TRUE)
a2 <- min_run(x, k=5, na_rm = TRUE)
a3 <- min_run(x, na_rm = FALSE)
a4 <- min_run(x, k=k, na_rm = TRUE, na_pad = TRUE)
a5 <- min_run(x, k=5, idx=idx)
data.frame(idx, x, a0, a1, a2, a3, a4, a5)
#>    idx  x a0 a1 a2 a3 a4  a5
#> 1    1  1  1  1  1  1 NA   1
#> 2    3 -5 -5 -5 -5 -5 NA  -5
#> 3    4  1 -5 -5 -5 -5 NA  -5
#> 4    6 -3 -5 -5 -5 -5 -5  -5
#> 5    7 NA NA -5 -5 NA -5  -5
#> 6   10 NA NA -5 -5 NA -3  -3
#> 7   13 NA NA -5 -3 NA NA Inf
#> 8   16  1 NA -5 -3 NA  1   1
#> 9   19 -1 NA -5 -1 NA -1  -1
#> 10  21 NA NA -5 -1 NA -1  -1
#> 11  23 -2 NA -5 -2 NA -2  -2
#> 12  26  3 NA -5 -2 NA  3  -2
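The three values described for the 8th element can be checked by hand with base R alone. This is a worked check of row 8 of columns a1, a2 and a3 above, using plain indexing instead of min_run.

```r
# Worked check of the 8th element using base R only.
x <- c(1, -5, 1, -3, NA, NA, NA, 1, -1, NA, -2, 3)
i <- 8

full_window  <- min(x[1:i], na.rm = TRUE)         # all elements up to i, NA ignored
window_of_5  <- min(x[(i - 4):i], na.rm = TRUE)   # last 5 elements, NA ignored
keep_missing <- min(x[1:i])                       # NA is kept, so the result is NA

c(full_window, window_of_5, keep_missing)
#> [1] -5 -3 NA
```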
fill_run
Function used to replace NA with the previous non-NA element. To understand how fill_run works, take a look at the example below. x is the input vector, and the remaining columns show NA replaced by fill_run with different option setups (the default, run_for_first=TRUE, and only_within=TRUE respectively). By default, fill_run replaces all NA that are preceded by any value. If NA appears at the beginning of the vector, it is not replaced. But if the user specifies run_for_first=TRUE, initial empty values are replaced by the next non-empty value. Option only_within=TRUE means that NA values are replaced only if they are surrounded by a pair of identical values. Windows are not available in this function.
x <- c(NA, NA, "b","b","a",NA,NA,"a","b",NA,"a","b")
data.frame(x,
           f1 = fill_run(x),
           f2 = fill_run(x, run_for_first = TRUE),
           f3 = fill_run(x, only_within = TRUE))
#>       x   f1 f2   f3
#> 1  <NA> <NA>  b <NA>
#> 2  <NA> <NA>  b <NA>
#> 3     b    b  b    b
#> 4     b    b  b    b
#> 5     a    a  a    a
#> 6  <NA>    a  a    a
#> 7  <NA>    a  a    a
#> 8     a    a  a    a
#> 9     b    b  b    b
#> 10 <NA>    b  b <NA>
#> 11    a    a  a    a
#> 12    b    b  b    b
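The default behavior (replace NA with the previous non-NA element, leave leading NA untouched) can be sketched with a simple loop in base R. fill_forward() is a hypothetical helper covering only the default case; it does not implement run_for_first or only_within.

```r
# Base R sketch of the default fill: carry the last non-NA value forward.
# fill_forward() is a hypothetical helper, not part of runner.
fill_forward <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]   # leading NA stays NA: nothing precedes it
  }
  x
}

x <- c(NA, NA, "b", "b", "a", NA, NA, "a", "b", NA, "a", "b")
filled <- fill_forward(x)
filled
#>  [1] NA  NA  "b" "b" "a" "a" "a" "a" "b" "b" "a" "b"
```

The result matches the f1 column of the example above.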
Running streak
To count consecutive elements in a specified window, one can use streak_run. The example below illustrates how the streak is calculated with three different option setups; compare the results for the 9th element of the input vector x. s0 uses the default configuration, with a full window and na_rm=TRUE. s1 counts within a k=3 window with na_rm=FALSE, so the count resets on NA. s2 counts within a k=4 window and continues the streak after NA.
x <- c("A","B","A","A","B","B","B",NA,"B","A","B")
data.frame(
  x,
  s0 = streak_run(x),
  s1 = streak_run(x, na_rm = FALSE, k = 3),
  s2 = streak_run(x, k = 4))
#>       x s0 s1 s2
#> 1     A  1  1  1
#> 2     B  1  1  1
#> 3     A  1  1  1
#> 4     A  2  2  2
#> 5     B  1  1  1
#> 6     B  2  2  2
#> 7     B  3  3  3
#> 8  <NA>  3 NA  3
#> 9     B  4  1  3
#> 10    A  1  1  1
#> 11    B  1  1  1
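For a vector without missing values and without a window limit, the streak count can be reproduced in base R with rle() and sequence(). This is only a sketch of the counting idea; it does not cover the k or na_rm options, and rle() treats every NA as its own run.

```r
# Base R sketch of a full-window streak count for a vector without NA.
x <- c("A", "B", "A", "A", "B", "B", "B", "A")

run_lengths <- rle(x)$lengths      # lengths of consecutive runs: 1 1 2 3 1
streak <- sequence(run_lengths)    # count 1..n within each run
streak
#> [1] 1 1 1 2 1 2 3 1
```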
Running which
To obtain the index of an element satisfying some condition in a window, one can use running which. Functions from this group return the index at which TRUE appeared at or before the n-th element of a vector. In the example below, whicht_run returns the index of the element containing a TRUE value. If na_rm=TRUE is specified, a missing value is treated as FALSE and is ignored while searching for TRUE. When the user sets na_rm=FALSE, as in the second example, the function returns NA whenever TRUE appears after a missing value within the window, because it is impossible to be certain which is first (a missing value is an element of unknown value; it could be TRUE or FALSE).
x <- c(T,T,T,F,NA,T,F,NA,T,F,T,F)
data.frame(
  x,
  s0 = whicht_run(x, which = "first"),
  s1 = whicht_run(x, na_rm = FALSE, k = 5, which = "first"),
  s2 = whicht_run(x, k = 5, which = "first"))
#>        x s0 s1 s2
#> 1   TRUE  1  1  1
#> 2   TRUE  1  1  1
#> 3   TRUE  1  1  1
#> 4  FALSE  1  1  1
#> 5     NA  1  1  1
#> 6   TRUE  1  2  2
#> 7  FALSE  1  3  3
#> 8     NA  1 NA  6
#> 9   TRUE  1 NA  6
#> 10 FALSE  1  6  6
#> 11  TRUE  1 NA  9
#> 12 FALSE  1 NA  9
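The case with a window and missing values ignored can be sketched in base R with which(), which skips NA by itself. running_which_first() is a hypothetical helper for the 'first' variant only; it does not implement the na_rm=FALSE behavior.

```r
# Base R sketch: index of the first TRUE inside a window of k elements,
# with NA ignored. running_which_first() is a hypothetical helper.
running_which_first <- function(x, k) {
  k <- as.integer(k)
  vapply(seq_along(x), function(i) {
    start <- max(1L, i - k + 1L)
    hits <- which(x[start:i])   # positions of TRUE inside the window, NA skipped
    if (length(hits) == 0L) NA_integer_ else start + hits[1L] - 1L
  }, integer(1))
}

x <- c(TRUE, TRUE, TRUE, FALSE, NA, TRUE, FALSE, NA, TRUE, FALSE, TRUE, FALSE)
first_true <- running_which_first(x, k = 5)
first_true
#>  [1] 1 1 1 1 1 2 3 6 6 6 9 9
```

The result matches the s2 column of the example above.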
Unique elements in window
One can use unique_run to create a list of unique elements within a specified window size.
x2 <- sample( letters[1:3], 6, replace=TRUE)
x2
#> [1] "c" "c" "c" "c" "c" "c"
unique_run( x=x2, k = 3 )
#> [[1]]
#> [1] "c"
#>
#> [[2]]
#> [1] "c"
#>
#> [[3]]
#> [1] "c"
#>
#> [[4]]
#> [1] "c"
#>
#> [[5]]
#> [1] "c"
#>
#> [[6]]
#> [1] "c"
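Because the random sample above happens to contain a single letter, a deterministic input shows the idea more clearly. The sketch below uses base R only and keeps elements in order of first appearance within each window; whether unique_run returns them in the same order is an assumption not confirmed by this document.

```r
# Base R sketch of unique elements within a running window of 3 elements.
x2 <- c("a", "b", "b", "c", "a", "a")

unique_windows <- lapply(seq_along(x2), function(i) {
  start <- max(1, i - 3 + 1)   # window of up to 3 elements ending at i
  unique(x2[start:i])
})

unique_windows[[3]]
#> [1] "a" "b"
unique_windows[[5]]
#> [1] "b" "c" "a"
```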
Apply custom function
Own runner
One can apply a custom function with the runner function, which applies it in the same way as the other built-in functions. Below is an example of using base::mean inside the runner function.
x <- runif(15)
k <- sample(1:15, 15, replace = TRUE)
idx <- cumsum(sample(c(1,2,3,4), 15, replace=T))
# simple call
simple_mean <- runner(x = x, k = 4, f = mean)
# additional arguments for mean
trimmed_mean <- runner(x = x, k = 4, f = function(x) mean(x, trim = 0.05))
# varying window size
varying_window <- runner(x = x, k = k, f = function(x) mean(x, trim = 0.05))
# date windows
date_windows <- runner(x = x, k = k, idx = idx, f = function(x) mean(x, trim = 0.05))
data.frame(x, k, idx, simple_mean, trimmed_mean, varying_window, date_windows)
#>            x  k idx simple_mean trimmed_mean varying_window date_windows
#> 1  0.9045468 14   1   0.9045468    0.9045468      0.9045468    0.9045468
#> 2  0.3912330  4   5   0.6478899    0.6478899      0.6478899    0.3912330
#> 3  0.1205684  7   9   0.4721161    0.4721161      0.4721161    0.2559007
#> 4  0.7689856  7  10   0.5463335    0.5463335      0.5463335    0.4269290
#> 5  0.8541224  8  13   0.5337274    0.5337274      0.6078912    0.5812255
#> 6  0.8253508  6  14   0.6422568    0.6422568      0.6441345    0.6422568
#> 7  0.4720555  1  18   0.7301286    0.7301286      0.4720555    0.4720555
#> 8  0.8347993  3  21   0.7465820    0.7465820      0.7107352    0.8347993
#> 9  0.6623278  6  23   0.6986334    0.6986334      0.7362736    0.6563942
#> 10 0.6543098  1  26   0.6558731    0.6558731      0.6543098    0.6543098
#> 11 0.1847041  2  28   0.5840353    0.5840353      0.4195069    0.1847041
#> 12 0.1445083 13  31   0.4114625    0.4114625      0.5681260    0.4961299
#> 13 0.5277289  7  34   0.3778128    0.3778128      0.4972048    0.2856471
#> 14 0.7697875 12  35   0.4066822    0.4066822      0.5682707    0.4562077
#> 15 0.9685401  5  39   0.6026412    0.6026412      0.5190538    0.8691638
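The idea of passing an arbitrary function to a running window can be sketched in base R with a small higher-order helper. apply_running() is hypothetical and handles only a fixed window and functions returning a single number; it illustrates the concept and is not the runner function itself.

```r
# Base R sketch: apply any summary function f over a fixed running window.
# apply_running() is a hypothetical helper, not part of runner.
apply_running <- function(x, k, f) {
  vapply(seq_along(x), function(i) {
    start <- max(1, i - k + 1)
    f(x[start:i])   # f receives the window and must return one number
  }, numeric(1))
}

x <- c(2, 4, 6, 8, 100)
running_mean   <- apply_running(x, k = 4, f = mean)
running_median <- apply_running(x, k = 4, f = median)

running_mean
#> [1]  2.0  3.0  4.0  5.0 29.5
running_median
#> [1] 2 3 4 5 7
```

Swapping mean for median changes only the last value here, which shows how the choice of f affects a window containing an outlier.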
Creating windows
The window_run function creates a list of windows. Because runner provides limited functionality, one can create a running-window list which can be further processed by the user to obtain the desired statistic (e.g. window sum). x is a vector to be 'run on' and k is the length of the window. In this example the window length varies as specified by k. Provide a single value to obtain a constant window size.
set.seed(11)
window_run(x = 1:5, k = c(1,2,3,3,2) )
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 1 2
#>
#> [[3]]
#> [1] 1 2 3
#>
#> [[4]]
#> [1] 2 3 4
#>
#> [[5]]
#> [1] 4 5
Such windows can be used in further calculations with any R function. The example below shows how to obtain a running sum over a varying window length (specified by k).
window_run(x = 1:5, k = c(1,2,3,3,2)) %>%
  lapply(sum) %>%
  unlist
#> [1] 1 3 6 9 9
One can also specify the window based on a date or other numeric index. To do this, the dates should be passed via the idx argument, and k should be an integer denoting the date-window span.
window_run( x = 1:5, k = 3, idx = c(1,2,5,6,7) )
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 1 2
#>
#> [[3]]
#> [1] 3
#>
#> [[4]]
#> [1] 3 4
#>
#> [[5]]
#> [1] 3 4 5