pulse_optimize: (`STEP 3`) Optimize PULSE data through interpolation and smoothing

Description

step 1 -- pulse_read()
step 2 -- pulse_split()
-->> step 3 -- pulse_optimize() <<--
step 4 -- pulse_heart()
step 5 -- pulse_doublecheck()
step 6 -- pulse_choose_keep()

IMPORTANT NOTE: pulse_optimize() can be skipped, but doing so is strongly discouraged.

The performance of the algorithm used by the downstream function pulse_heart() to detect heartbeat wave crests depends significantly on (i) there being enough data points around each crest and (ii) the data not being too noisy. pulse_optimize() first applies pulse_interpolate() and then pulse_smooth() to reshape the data, improving the likelihood that pulse_heart() will successfully estimate the underlying heartbeat rates.

INTERPOLATION is highly recommended because tests on real data have shown that a sampling frequency of at least 40 Hz is crucial to ensure wave crests can be discerned even when the underlying heartbeat rate is high (i.e., above 2-3 Hz). Since the PULSE multi-channel system is not designed to capture data at such high rates (partly because doing so would generate unnecessarily large files), pulse_interpolate() is used instead to artificially increase the temporal resolution of the data by linearly interpolating to the target frequency. It is important to note that this process DOES NOT ALTER the shape of the heartbeat wave; it only introduces intermediate data points. The only downsides of very high values of interpolation_freq are a proportional increase in computing time and output size, with only marginal improvements in the performance of pulse_heart(); no artefacts are expected.
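To make the idea concrete, here is a minimal sketch of linear interpolation for a single channel using base R's stats::approx(). It is not the package's pulse_interpolate(): the helper name interpolate_channel() is hypothetical, and it assumes a plain numeric time vector in seconds. It shows that the original samples are kept (so the wave shape is unchanged) and that the number of points grows in proportion to the target frequency.

```r
# Hypothetical helper (not part of the package): linearly interpolate one
# channel sampled at times `t` (in seconds) to `target_freq` Hz.
interpolate_channel <- function(t, y, target_freq = 40) {
  if (target_freq == 0) {
    # A target of 0 means "no interpolation", mirroring interpolation_freq = 0
    return(data.frame(time = t, value = y))
  }
  # Regular grid spanning the original time range at the target frequency
  new_t <- seq(min(t), max(t), by = 1 / target_freq)
  # approx() joins consecutive samples with straight lines
  out <- stats::approx(x = t, y = y, xout = new_t, method = "linear")
  data.frame(time = out$x, value = out$y)
}

# Example: a 2 Hz "heartbeat" sampled at 10 Hz for 4 seconds
t_raw <- seq(0, 4, by = 1 / 10)
y_raw <- sin(2 * pi * 2 * t_raw)
interp <- interpolate_channel(t_raw, y_raw, target_freq = 40)

# 4 s at 40 Hz gives 161 points instead of the original 41
nrow(interp)
```

Every fourth interpolated point coincides with an original sample, so the values there are unchanged; the points in between simply lie on straight lines joining neighbouring samples.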

SMOOTHING should be experimented with when pulse_heart() produces too many heartbeat rate estimates that are clearly incorrect. In such situations, pulse_smooth() applies a smoothing filter (a normal Kernel Regression Smoother) to the data to remove high-frequency noise and render a more sinusoidal wave, which is easier for pulse_heart() to handle. Unlike with interpolation_freq, users should exercise caution when setting bandwidth and generally opt for lower values: above a certain bandwidth, the smoothed pulse data becomes completely unrelated to the original data, and the heartbeat rates subsequently computed with pulse_heart() may be wrong. Always double-check the data after applying stronger smoothing. Nonetheless, with the default bandwidth, smoothing incurs no penalty and hardly changes the data, so there is little reason to turn it off.
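The risk of large bandwidths can be illustrated with stats::ksmooth() and its normal kernel, the function the bandwidth argument documentation points to. This is a sketch under assumptions, not the package's pulse_smooth(): the helper smooth_channel() is hypothetical, and the 1 Hz test wave and noise level are made up for illustration. Note that ksmooth() scales its kernels so their quartiles sit at ±0.25 * bandwidth, with time measured in the same units as x (seconds here).

```r
# Hypothetical helper (not part of the package): smooth one channel with a
# normal-kernel regression smoother, evaluated at the original time points.
smooth_channel <- function(time_s, y, bandwidth = 0.2) {
  if (bandwidth == 0) return(y)  # bandwidth = 0 means "no smoothing"
  fit <- stats::ksmooth(x = time_s, y = y, kernel = "normal",
                        bandwidth = bandwidth, x.points = time_s)
  fit$y
}

# Example: a 1 Hz wave sampled at 40 Hz with added high-frequency noise
set.seed(1)
time_s <- seq(0, 10, by = 1 / 40)
clean <- sin(2 * pi * 1 * time_s)
noisy <- clean + rnorm(length(time_s), sd = 0.2)

light <- smooth_channel(time_s, noisy, bandwidth = 0.2)
heavy <- smooth_channel(time_s, noisy, bandwidth = 1.5)

# Mean absolute distance from the noise-free wave
mean(abs(noisy - clean))  # the added noise
mean(abs(light - clean))  # smaller: most noise removed, wave largely kept
mean(abs(heavy - clean))  # largest: the wave itself is flattened away
```

With a bandwidth of 1.5 s the kernel spans more than one full cycle of the 1 Hz wave, so positive and negative half-cycles cancel out and the result no longer resembles the original signal. This is the failure mode the text warns about.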

Usage

pulse_optimize(
  pulse_data_split,
  interpolation_freq = 40,
  bandwidth = 0.2,
  raw_v_smoothed = FALSE,
  multi
)

Value

A tibble with the same structure as the input data (three columns: $i, $smoothed and $data), but with the values in column $smoothed switched to TRUE if smoothing was applied and the contents of column $data modified according to the parameters supplied. If raw_v_smoothed is FALSE, the returned tibble has the same number of rows as the input tibble. If raw_v_smoothed is TRUE, the returned tibble has twice as many rows as the input tibble: half not smoothed (i.e., only interpolation applied), the other half smoothed (i.e., interpolation and smoothing applied), with the order indexes in $i duplicated. Downstream functions process both types of output automatically.
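The row layout for raw_v_smoothed = TRUE can be pictured with the hypothetical sketch below. It is not the package's code: a base R data frame with a list column stands in for the tibble, the helper stack_raw_and_smoothed() is made up for illustration, and the toy per-window data frames are placeholders for the real contents of $data.

```r
# Hypothetical sketch (not the package's code): stack the interpolation-only
# windows on top of the smoothed windows, duplicating the indexes in $i.
stack_raw_and_smoothed <- function(i, raw_data, smoothed_data) {
  stopifnot(length(raw_data) == length(i), length(smoothed_data) == length(i))
  out <- data.frame(i = rep(i, 2),
                    smoothed = rep(c(FALSE, TRUE), each = length(i)))
  # A list assigned with `$<-` becomes a list column, one element per row
  out$data <- c(raw_data, smoothed_data)
  out
}

# Two toy windows (placeholder values, not real PULSE data)
raw  <- list(data.frame(v = c(1, 3)), data.frame(v = c(2, 4)))
smth <- list(data.frame(v = c(2, 2)), data.frame(v = c(3, 3)))
out <- stack_raw_and_smoothed(i = 1:2, raw_data = raw, smoothed_data = smth)

# i: 1 2 1 2; smoothed: FALSE FALSE TRUE TRUE
out[, c("i", "smoothed")]
```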

Arguments

pulse_data_split: the output from a call to pulse_split()
interpolation_freq: numeric, defaults to 40; the frequency (in Hz) to which PULSE data should be interpolated. Can be set to 0 (zero) or any value equal to or greater than 40 (the default). If set to zero, no interpolation is performed.
bandwidth: numeric, defaults to 0.2; the bandwidth for the Kernel Regression Smoother. If equal to 0 (zero), no smoothing is applied. Normally kept low (0.1 - 0.3) so that only very high-frequency noise is removed, but can be pushed up to 1 or above (especially when the heartbeat rate is expected to be slow, as is typical of oysters; double-check the resulting data). See ?ksmooth for additional details.
raw_v_smoothed: logical, defaults to FALSE; if FALSE, the output includes a single set of data, obtained after applying interpolation and smoothing according to the values set. If TRUE, the output includes two sets: one after applying only interpolation (i.e., "raw") and the other after applying both interpolation and smoothing (i.e., "smoothed"); see Value for how they are arranged.
multi: logical; was the data generated by a multi-channel system (TRUE) or a one-channel system (FALSE)?
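The documented constraints on these arguments can be summarised as a small validation routine. This is a hypothetical sketch, not the package's own checks: the name check_optimize_args() is made up, and rejecting negative bandwidths is an assumption (the text only defines 0 as "no smoothing").

```r
# Hypothetical argument check (not the package's code) mirroring the rules
# documented above for interpolation_freq, bandwidth and raw_v_smoothed.
check_optimize_args <- function(interpolation_freq = 40, bandwidth = 0.2,
                                raw_v_smoothed = FALSE) {
  stopifnot(is.numeric(interpolation_freq), length(interpolation_freq) == 1)
  if (interpolation_freq != 0 && interpolation_freq < 40) {
    stop("interpolation_freq must be 0 (no interpolation) or >= 40 Hz")
  }
  # Assumption: negative bandwidths are treated as invalid
  stopifnot(is.numeric(bandwidth), length(bandwidth) == 1, bandwidth >= 0)
  stopifnot(is.logical(raw_v_smoothed), length(raw_v_smoothed) == 1)
  # Report which of the two steps would run
  list(interpolate = interpolation_freq > 0, smooth = bandwidth > 0)
}

check_optimize_args(40, 0.2)  # both steps run
check_optimize_args(0, 0)     # neither step runs
try(check_optimize_args(20))  # error: 20 Hz is below the documented minimum
```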

Raw v smoothed

When raw_v_smoothed is TRUE, two heartbeat rate estimates are produced for each data point: one based on the raw data (interpolation only) and another after smoothing has also been applied. The cost is an increase in processing time. The benefit is a better ability to estimate heartbeat rates when the heart is beating faster, since smoothing the data may then become counterproductive. pulse_choose_keep() then decides which of the two estimates to retain for each data point, based on user-defined parameters.

Examples


## Begin prepare data ----
pulse_data_sub <- pulse_data
pulse_data_sub$data <- pulse_data_sub$data[,1:5]
pulse_data_split <- pulse_split(pulse_data_sub)
## End prepare data ----

# Optimize data by interpolating to 40 Hz and applying a slight smoothing
pulse_optimize(pulse_data_split, 40, 0.2, multi = pulse_data$multi)