pulse_summarise: Summarise PULSE heartbeat rate estimates over new time windows

Description

Take the output from PULSE() and summarise hz estimates over new user-defined time windows using FUN (a summary function). In effect, this procedure reduces the number of data points available over time.

Note that the output of pulse_summarise() can be inspected with pulse_plot() but not pulse_plot_raw().

Usage

pulse_summarise(
  heart_rates,
  FUN = stats::median,
  span_mins = 10,
  min_data_points = 2
)

Value

A tibble similar to the one provided as input, but with fewer columns and rows. Among the columns now absent is the data column (raw data is no longer available).

IMPORTANT NOTE: Despite retaining the same names, several columns in the output now provide slightly different information, because they are recalculated for each summarizing window:

time: the first time stamp of the summarizing window.
n: the number of valid original windows used by the summary function.
sd: the standard deviation of all heartbeat rate estimates within each summarizing window (and not the standard deviation of the intervals between each identified wave peak, as was the case in heart_rates).
ci: the confidence interval of the new value for hz.

Arguments

heart_rates: the output from PULSE(), pulse_heart(), pulse_doublecheck() or pulse_choose_keep().
FUN: a custom function, defaults to median; note that FUN must take a vector of numeric values and output a single numeric value.
span_mins: integer, in mins, defaults to 10; expresses the width of the new summary windows.
min_data_points: numeric, defaults to 2; the minimum number of data points required in each new summarizing window. Windows covering fewer data points are discarded. If set to 0 (zero), no window is ever discarded.
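
The way these arguments interact can be sketched in base R. This is only an illustration, not the package's implementation: the toy data, the helper name summarise_sketch and the choice to count windows from the first time stamp are all assumptions made for the example.

```r
# Toy data: one hypothetical hz estimate per minute for 12 minutes
toy <- data.frame(
  time = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + 60 * (0:11),
  hz   = c(1.0, 1.1, 0.9, 1.0, 3.0, 1.2, 1.1, 1.3, 1.2, 1.2, 1.4, 1.5)
)

# Hypothetical helper mimicking the roles of FUN, span_mins and min_data_points
summarise_sketch <- function(x, FUN = stats::median, span_mins = 10,
                             min_data_points = 2) {
  # Assign each row to a window of span_mins, counted from the first time stamp
  elapsed_mins <- as.numeric(difftime(x$time, min(x$time), units = "mins"))
  window_id <- floor(elapsed_mins / span_mins)

  out <- do.call(rbind, lapply(split(x, window_id), function(g) {
    data.frame(
      time = min(g$time),  # first time stamp found in the window
      n    = nrow(g),      # number of estimates that fell in the window
      hz   = FUN(g$hz)     # FUN reduces a numeric vector to a single value
    )
  }))
  rownames(out) <- NULL

  # Discard windows holding fewer than min_data_points estimates
  out[out$n >= min_data_points, , drop = FALSE]
}

res <- summarise_sketch(toy, span_mins = 5)
res
```

With span_mins = 5 the twelve toy estimates collapse into three windows holding 5, 5 and 2 estimates; raising min_data_points to 3 would drop the last one.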

Details

The PULSE multi-channel system captures data continuously. When processing those data, users should aim to obtain estimates of heartbeat frequency at a rate that conforms to their system's natural temporal variability, or risk oversampling (which has important statistical implications and must be avoided or explicitly handled).

With this in mind, users can follow two strategies:

If, for example, users are targeting 1 data point every 5 mins...

If the raw data is of good quality (i.e., minimal noise, signal wave with large amplitude), users can opt for a relatively narrow split_window (e.g., by setting window_width_secs in PULSE() or pulse_split() to 30 secs) and sample split_windows only every 5 mins with window_shift_secs = 300. Data is then processed in 5-min steps in which 30 secs of data are used and four and a half mins are skipped, yielding the target of 1 data point every 5 mins. This greatly speeds up processing (fewer and smaller windows to work on), and the final output immediately has the desired sampling frequency. However, if any of the split_windows actually analysed contains a gap in the data or coincides with an occasional drop in signal quality, the corresponding estimates of heartbeat rate will reflect that lack of quality (even if better data is present in the four and a half mins being skipped). This strategy is usually used at the beginning to assess the dataset; depending on the results, the more time-consuming strategy described next may have to be used instead.
If sufficient computing power is available and/or the raw data can't be guaranteed to be of high quality from beginning to end, users can opt for a strategy that scans the entire dataset without skipping any data. This can be achieved by setting window_width_secs and window_shift_secs in PULSE() (or pulse_split()) to the same low value. In this case, if both parameters are set to 30 secs, processing will take significantly longer and each 5 mins of data will result in 10 data points. Then, pulse_summarise() can be used with span_mins = 5 to summarise the data points back to the target sampling frequency. More importantly, if the right summary function is used, this strategy can greatly reduce the negative impact of spurious bad readings. For example, setting FUN = median reduces the contribution of values of hz that deviate from the center ("wrong" values) to the final heartbeat estimate for a given time window. Thus, if the computational penalty is bearable, this more robust strategy can prove useful.
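
The arithmetic behind the two strategies, and the reason a median copes well with an occasional bad reading, can be checked with a few lines of base R. The hz values below are invented for illustration, and the variable names are hypothetical; nothing here calls the package itself.

```r
# Target: 1 data point every 5 mins
target_secs <- 5 * 60

# Strategy 1: analyse 30 secs, then jump ahead 300 secs
width_1 <- 30
shift_1 <- 300
points_per_target_1 <- target_secs / shift_1  # estimates per 5 mins
skipped_secs_1 <- shift_1 - width_1           # data ignored between windows

# Strategy 2: contiguous 30-sec windows, nothing skipped
width_2 <- 30
shift_2 <- 30
points_per_target_2 <- target_secs / shift_2
skipped_secs_2 <- shift_2 - width_2

c(points_per_target_1, points_per_target_2)
c(skipped_secs_1, skipped_secs_2)

# Ten hypothetical hz estimates from one 5-min span, one of them spurious
hz <- c(1.02, 0.98, 1.01, 1.00, 2.90, 0.99, 1.03, 1.00, 0.97, 1.01)
mean(hz)    # pulled upwards by the single outlier
median(hz)  # stays close to the bulk of the estimates
```

In the second strategy the ten estimates per 5-min span are what pulse_summarise() would then reduce to one value, and the choice of FUN decides how much a single bad window can distort it.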

Examples

## Begin prepare data ----
paths <- pulse_example()
heart_rates <- PULSE(
  paths,
  discard_channels = c(paste0("c0", c(1:7, 9)), "c10"),
  show_progress = FALSE
  )
## End prepare data ----

# Summarise heartbeat estimates (1 data point every 5 mins)
nrow(heart_rates) # == 13
summarised_5mins <- pulse_summarise(heart_rates, span_mins = 5)
nrow(summarised_5mins) # == 3
summarised_5mins

# using a custom function
pulse_summarise(heart_rates, span_mins = 5, FUN = function(x) quantile(x, 0.2))

# normalized data is supported automatically
pulse_summarise(pulse_normalize(heart_rates))

# Note that visualizing the output from 'pulse_summarise()' with
#  'pulse_plot()' may result in many warnings
pulse_plot(summarised_5mins)
# > There were 44 warnings (use warnings() to see them)

# That happens when the value chosen for 'span_mins' is such
#  that the output from 'pulse_summarise()' doesn't contain
#  enough data points for the smoothing curve to be computed
# To avoid the warnings, do one of the following:

# reduce 'span_mins' to still get enough data points
pulse_plot(pulse_summarise(heart_rates, span_mins = 2, min_data_points = 0))

# or disable the smoothing curve
pulse_plot(summarised_5mins, smooth = FALSE)