pulse_split: (`STEP 2`) Split `pulse_data` across sequential time windows

Description

step 1 -- pulse_read()
-->> step 2 -- pulse_split() <<--
step 3 -- pulse_optimize()
step 4 -- pulse_heart()
step 5 -- pulse_doublecheck()
step 6 -- pulse_choose_keep()

After all raw PULSE data has been imported, the dataset must be split across sequential time windows.

pulse_split() takes the output from a call to pulse_read() and splits data across user-defined time windows. The output of pulse_split() can be immediately passed to pulse_heart(), or first optimized with pulse_optimize() and only then passed to pulse_heart() (highly recommended).

Usage

pulse_split(
  pulse_data,
  window_width_secs = 30,
  window_shift_secs = 60,
  min_data_points = 0.8,
  subset = 0,
  subset_seed = NULL,
  subset_reindex = FALSE,
  msg = TRUE
)

Value

A tibble with three columns. Column $i stores the order of each time window. Column $smoothed is a logical vector flagging smoothed data (at this point defaulting to FALSE; if pulse_optimize() is used later, values can change to TRUE). Column $data is a list with all the valid time windows (i.e., those complying with min_data_points), each window being a subset of pulse_data: a tibble with at least 2 columns (time + one or more channels) containing PULSE data with timestamps within that time window.

Arguments

pulse_data: the output from a call to pulse_read().
window_width_secs: numeric, in seconds, defaults to 30; the width of the time windows over which heart rate frequency will be computed.
window_shift_secs: numeric, in seconds, defaults to 60; by how much each subsequent window is shifted from the preceding one.
min_data_points: numeric, defaults to 0.8; decimal from 0 to 1, used as a threshold to discard incomplete windows where data is missing (e.g., if the sampling frequency is 20 and window_width_secs = 30, each window should include 600 data points, and so if min_data_points = 0.8, windows with fewer than 600 * 0.8 = 480 data points will be rejected).
subset: numerical, defaults to 0; the number of time windows to keep from the entire dataset (or the number of entries to reject if set to a negative value); smaller subsets make the entire processing quicker and facilitate the execution of trial runs to optimize parameter selection before processing the entire dataset.
subset_seed: numerical, defaults to NULL; only used if subset is different from 0; subset_seed controls the seed used when extracting a subset of the available data; if set to NULL, a random seed is selected, resulting in rows being selected randomly; alternatively, the user can set a specific seed in order to always select the same rows (important when the goal is to compare the impact of different parameter combinations using the exact same data points).
subset_reindex: logical, defaults to FALSE; only used if subset is different from 0; after extracting a subset of the available data, should rows be re-indexed (i.e., .$i made fully sequential); re-indexed rows make using pulse_plot_raw() easier, but row identity no longer matches row identity before subsetting.
msg: logical, defaults to TRUE; should non-crucial messages (but not errors) be shown (mostly for use from within the wrapper function PULSE(), where it is set to FALSE to avoid the repetition of identical messages).
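
The role of a fixed seed can be illustrated with base R alone. The sketch below is only an analogy for what subset_seed and subset_reindex are described as doing; window_i and pick_windows() are hypothetical stand-ins, and nothing here should be read as how pulse_split() is implemented internally.

```r
# Stand-in for the $i column of a split dataset with 20 windows (hypothetical)
window_i <- 1:20

# Hypothetical helper: draw n window indices after fixing the seed
pick_windows <- function(seed, n = 5) {
  set.seed(seed)
  sort(sample(window_i, n))
}

first_run  <- pick_windows(seed = 42)
second_run <- pick_windows(seed = 42)

# Same seed, same windows: parameter combinations can be compared on equal data
same_rows <- identical(first_run, second_run)

# Re-indexing replaces the original indices with a fully sequential series,
# so the link to the original window positions is lost
reindexed <- seq_along(first_run)
```

With a fixed seed both runs return the same five indices, whereas leaving the seed unset would be expected to give a different selection on each call.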

Window width and shift

A good starting point for window_width_secs is to set it to between 30 and 60 seconds.

As a rule of thumb, use lower values for data collected from animals with naturally faster heart rates and/or that have been subjected to treatments conducive to even faster heart rates (e.g., thermal performance ramps). In such cases, lower values will result in higher temporal resolution, which may be crucial if experimental conditions are changing rapidly. Conversely, experiments using animals with naturally slower heart rates and/or subjected to treatments that may cause heart rates to stabilize or even slow (e.g., control or cold treatments) may require the use of higher values for window_width_secs, as the resulting windows should encompass no fewer than 5-7 heartbeat cycles.
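
The 5-7 cycle guideline translates into a minimum window width once an approximate heart rate is known. The following is a minimal sketch of that arithmetic; the heart rates are hypothetical values chosen for illustration, not measurements.

```r
# Hypothetical expected heart rates, in Hz (beats per second)
heart_rate_hz <- c(slow = 0.2, moderate = 0.5, fast = 2)

# Upper end of the 5-7 cycle guideline
min_cycles <- 7

# Shortest window (in seconds) that still spans min_cycles heartbeats
min_width_secs <- min_cycles / heart_rate_hz
print(min_width_secs)
```

Under these assumptions, a slow heart rate of 0.2 Hz calls for a window of about 35 seconds, while a fast heart rate of 2 Hz would be covered by a window of only a few seconds.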

As for window_shift_secs, set it to a value:

smaller than window_width_secs if overlap between windows is desired (not usually recommended) (if window_width_secs = 30 and window_shift_secs = 15, the first 3 windows will go from [0, 30), [15, 45) and [30, 60))
equal to window_width_secs to process all data available (if window_width_secs = 30 and window_shift_secs = 30, the first 3 windows will go from [0, 30), [30, 60) and [60, 90))
larger than window_width_secs to skip data (ideal for speeding up the processing of large datasets) (if window_width_secs = 30 and window_shift_secs = 60, the first 3 windows will go from [0, 30), [60, 90) and [120, 150))
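
The three cases above can be reproduced with a few lines of base R. This is only a sketch of the interval arithmetic; window_bounds() is a hypothetical helper and not a function of the package.

```r
# Hypothetical helper: start and end (in seconds from the first timestamp)
# of the first n windows, read as half-open intervals [start, end)
window_bounds <- function(width_secs, shift_secs, n = 3) {
  start <- seq(from = 0, by = shift_secs, length.out = n)
  data.frame(start = start, end = start + width_secs)
}

overlapping <- window_bounds(width_secs = 30, shift_secs = 15)
contiguous  <- window_bounds(width_secs = 30, shift_secs = 30)
skipping    <- window_bounds(width_secs = 30, shift_secs = 60)

print(overlapping)  # windows share half of their data with the next one
print(contiguous)   # every data point falls in exactly one window
print(skipping)     # 30 seconds of data are skipped between windows
```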

Also consider that lower values for the window_... parameters may lead to oversampling and a cascade of statistical issues, the resolution of which may end up negating any advantage gained.

Handling gaps in the dataset

min_data_points shouldn't be set too low, otherwise only nearly empty windows will be rejected.
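
To see why a low threshold filters very little, the rejection rule described for min_data_points can be sketched with plain arithmetic. The sampling frequency and the per-window point counts below are hypothetical, and the comparison is an illustration of the documented rule rather than the package's internal code.

```r
sampling_freq_hz  <- 20   # assumed sampling frequency
window_width_secs <- 30

# Number of data points in a complete window
expected_points <- sampling_freq_hz * window_width_secs

# Hypothetical point counts: a complete, a slightly gappy and a mostly empty window
n_points <- c(600, 500, 120)

# Windows with fewer points than the threshold are rejected
keep_strict  <- n_points >= expected_points * 0.8   # threshold of 480 points
keep_lenient <- n_points >= expected_points * 0.1   # threshold of 60 points

print(keep_strict)
print(keep_lenient)
```

With min_data_points = 0.8 the mostly empty window is discarded, whereas with 0.1 it is kept even though four fifths of its data are missing.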

Examples

## Begin prepare data ----
pulse_data_sub <- pulse_data
pulse_data_sub$data <- pulse_data_sub$data[,1:5]
## End prepare data ----

pulse_split(pulse_data_sub)