sparklyr.flint (version 0.2.2)

asof_join: Temporal join

Description

Perform a left-outer join on two `TimeSeriesRDD`s based on inexact timestamp matches.

Usage

asof_join(
  left,
  right,
  tol = "0ms",
  direction = c(">=", "

Arguments

left

The left `TimeSeriesRDD`

right

The right `TimeSeriesRDD`

tol

A character vector specifying a time duration (e.g., "0ns", "5ms", "5s", "1d", etc.) as the tolerance for the absolute difference in timestamp values between each record from `left` and its matching record from `right`. By default, `tol` is "0ms" (a zero tolerance), which means a record from `left` will only be matched with a record from `right` if both have exactly the same timestamp.

direction

Specifies the temporal direction of the join; must be one of ">=", "<=", or "<". In each case, a record from `left` with timestamp `tl` is joined with at most one record from `right` with timestamp `tr`:

If direction is ">=", the match is the record from `right` having the largest (most recent) timestamp `tr` such that `tl >= tr` and `tl - tr <= tol` (or equivalently, `0 <= tl - tr <= tol`).

If direction is "<=", the match is the record from `right` having the smallest (earliest) timestamp `tr` such that `tl <= tr` and `tr - tl <= tol` (or equivalently, `0 <= tr - tl <= tol`).

If direction is "<", the match is the record from `right` having the smallest (earliest) timestamp `tr` such that `tr > tl` and `tr - tl <= tol` (or equivalently, `0 < tr - tl <= tol`).
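The three matching rules can be checked on plain numeric vectors without a Spark connection. The sketch below is only an illustration of the rules as stated above: `match_asof()` is a hypothetical base-R helper, not part of sparklyr.flint, and it assumes timestamps and the tolerance are numbers expressed in the same unit.

```r
# Hypothetical helper (not part of sparklyr.flint): for each left timestamp,
# return the right timestamp selected by the rules above, or NA if none fits.
# `tl`, `tr`, and `tol` are assumed to be plain numbers in the same time unit.
match_asof <- function(tl, tr, tol = 0, direction = c(">=", "<=", "<")) {
  direction <- match.arg(direction)
  vapply(tl, function(t) {
    # Right timestamps that satisfy both the direction and the tolerance
    candidates <- switch(direction,
      ">=" = tr[tr <= t & t - tr <= tol],
      "<=" = tr[tr >= t & tr - t <= tol],
      "<"  = tr[tr > t & tr - t <= tol]
    )
    if (length(candidates) == 0) return(NA_real_)
    # ">=" looks backwards and takes the latest; the others take the earliest
    if (direction == ">=") max(candidates) else min(candidates)
  }, numeric(1))
}

left_ts  <- c(1, 5, 10)
right_ts <- c(0, 4, 5, 12)

match_asof(left_ts, right_ts, tol = 2, direction = ">=")  # 0 5 NA
match_asof(left_ts, right_ts, tol = 2, direction = "<=")  # NA 5 12
match_asof(left_ts, right_ts, tol = 2, direction = "<")   # NA NA 12
```

The left timestamp 5 shows the difference between "<=" and "<": an exact timestamp match counts for "<=" but is excluded by "<". An `NA` stands for a left record that has no match within the tolerance.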

key_columns

Columns to be used as the matching key among records from `left` and `right`: if non-empty, then in addition to the matching criteria imposed by timestamps, a record from `left` will match a record from `right` only if they also have equal values in all key columns.
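To see how a key column narrows the set of candidates, consider the following base-R sketch. It is not sparklyr.flint code: `match_with_key()` is a hypothetical helper, the data frames and column names (`t`, `id`, `u`, `v`) are made up for illustration, and it handles only a single key column with the ">=" direction.

```r
# Hypothetical base-R illustration (not sparklyr.flint code): ">=" matching
# with a tolerance, restricted to right rows that share the left row's key.
left  <- data.frame(t = c(5, 5), id = c("a", "b"), u = c(1, 2),
                    stringsAsFactors = FALSE)
right <- data.frame(t = c(4, 5), id = c("a", "b"), v = c(10, 20),
                    stringsAsFactors = FALSE)

match_with_key <- function(left, right, tol, key) {
  vapply(seq_len(nrow(left)), function(i) {
    same_key <- right[[key]] == left[[key]][i]
    in_range <- right$t <= left$t[i] & left$t[i] - right$t <= tol
    hits <- which(same_key & in_range)
    if (length(hits) == 0) return(NA_real_)
    # Among the eligible rows, take the one with the most recent timestamp
    right$v[hits[which.max(right$t[hits])]]
  }, numeric(1))
}

matched_v <- match_with_key(left, right, tol = 1, key = "id")
matched_v  # 10 20
```

Without the key, the first left row (timestamp 5) would pair with the right row at timestamp 5 (`v = 20`); with `id` as the key it pairs with the earlier right row that has the same `id` (`v = 10`).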

left_prefix

A string to prepend to all columns from `left` after the join (usually for disambiguation purposes if `left` and `right` contain overlapping column names).

right_prefix

A string to prepend to all columns from `right` after the join (usually for disambiguation purposes if `left` and `right` contain overlapping column names).

See Also

Other Temporal join functions: asof_future_left_join(), asof_left_join()

Examples

# NOT RUN {
library(sparklyr)
library(sparklyr.flint)

sc <- try_spark_connect(master = "local")
if (!is.null(sc)) {
  ts_1 <- copy_to(sc, tibble::tibble(t = seq(10), u = seq(10))) %>%
    from_sdf(is_sorted = TRUE, time_unit = "SECONDS", time_column = "t")
  ts_2 <- copy_to(sc, tibble::tibble(t = seq(10) + 1, v = seq(10) + 1L)) %>%
    from_sdf(is_sorted = TRUE, time_unit = "SECONDS", time_column = "t")
  future_left_join_ts <- asof_join(ts_1, ts_2, tol = "1s", direction = "<=")
} else {
  message("Unable to establish a Spark connection!")
}

# }