heartbeatr (version 1.0.0)

PULSE_by_chunks: Process PULSE data file by file (STEPS 1-6)

Description

This function runs PULSE() file by file instead of attempting to read all files at once. This is required when datasets are too large (more than 20-30 files), as otherwise the system may become stuck due to the amount of data that must be held in memory. Because the results of processing each hourly file in the dataset are saved to a job_folder, PULSE_by_chunks() has the added benefit of allowing the entire job to be stopped and resumed, so processing can continue from where it left off even if a crash occurs.

Usage

PULSE_by_chunks(
  folder,
  allow_dir_create = FALSE,
  chunks = 2,
  bind_data = TRUE,
  window_width_secs = 30,
  window_shift_secs = 60,
  min_data_points = 0.8,
  interpolation_freq = 40,
  bandwidth = 0.2,
  doublecheck = TRUE,
  lim_n = 3,
  lim_sd = 0.75,
  raw_v_smoothed = TRUE,
  correct = TRUE,
  discard_channels = NULL,
  keep_raw_data = TRUE,
  show_progress = TRUE
)

Value

A tibble with nrows = (number of channels) * (number of windows in pulse_data_split) and 13 columns:

  • i, the order of each time window

  • smoothed, logical flagging smoothed data

  • id, PULSE channel IDs

  • time, time at the center of each time window

  • data, a list of tibbles with raw PULSE data for each combination of channel and window, with columns time, val and peak (TRUE in rows corresponding to wave peaks)

  • hz, heartbeat rate estimate (in Hz)

  • n, number of wave peaks identified

  • sd, standard deviation of the intervals between wave peaks

  • ci, confidence interval (hz ± ci)

  • keep, logical indicating whether data points meet N and SD criteria

  • d_r, ratio of consecutive asymmetric peaks

  • d_f, logical flagging data points where heart beat frequency is likely double the real value
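As a minimal sketch of how the columns above might be used downstream, the following converts hz to beats per minute and retains only the windows flagged by keep. The data frame is a toy stand-in with made-up values (a plain data.frame rather than a tibble, to avoid extra dependencies), not real output from PULSE_by_chunks().

```r
# Toy stand-in for the returned table (values are invented for illustration).
heart_rates <- data.frame(
  id   = c("ch_1", "ch_1", "ch_2"),
  hz   = c(0.5, 1.2, 0.8),
  ci   = c(0.05, 0.40, 0.10),
  keep = c(TRUE, FALSE, TRUE)
)

# 1 Hz corresponds to 60 beats per minute.
heart_rates$bpm <- heart_rates$hz * 60

# Retain only the windows that met the N and SD criteria.
reliable <- heart_rates[heart_rates$keep, ]
```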

Arguments

folder

the path to a folder where several PULSE files are stored

allow_dir_create

logical, defaults to FALSE. Only when set to TRUE does PULSE_by_chunks() actually do anything. This is to force the user to accept that a job_folder will be created inside the folder supplied; without this folder PULSE_by_chunks() cannot operate. It is STRONGLY advised to maintain a copy of the dataset being processed to avoid any inadvertent data loss. By setting allow_dir_create to TRUE the user is taking responsibility for the management of their files.

chunks

numeric, defaults to 2. Corresponds to the number of files processed at once during each iteration of the processing loop; higher numbers result in a quicker and more efficient operation, but the value shouldn't be set too high, as otherwise the system may become overwhelmed (which is exactly what PULSE_by_chunks() is designed to avoid).
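The general pattern of chunked, resumable processing can be sketched in base R as follows. This is an illustration of the idea only, not the package's implementation: the file names, the job_demo folder, the .rds outputs and the placeholder "processing" step are all assumptions made for the example.

```r
# Stand-in dataset: five empty files with hypothetical names in a temp folder.
folder <- file.path(tempdir(), "pulse_demo")
dir.create(folder, showWarnings = FALSE)
files <- file.path(folder, sprintf("file_%02d.csv", 1:5))
for (f in files) writeLines("time,val", f)

chunks <- 2

# Group consecutive files into sets of `chunks` files each.
groups <- split(files, ceiling(seq_along(files) / chunks))

# Hypothetical output folder for the per-chunk results.
job_folder <- file.path(folder, "job_demo")
dir.create(job_folder, showWarnings = FALSE)

for (g in seq_along(groups)) {
  out <- file.path(job_folder, sprintf("chunk_%03d.rds", g))
  # Already processed in an earlier run: skip, so a stopped job can resume.
  if (file.exists(out)) next
  # Placeholder for the real processing of the files in this chunk.
  result <- data.frame(file = basename(groups[[g]]))
  saveRDS(result, out)
}
```

Because each chunk's result is written to disk before the next chunk starts, rerunning the loop after an interruption only processes the chunks that are still missing.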

bind_data

logical, defaults to TRUE. If set to TRUE, after processing all chunks, PULSE_by_chunks() will try to read all files in the job_folder and return a single unified tibble with all data. Please be aware that there's a possibility that if the dataset is very large, the machine may become overwhelmed and crash due to lack of memory (still, all files stored in the job_folder will remain intact, and code may be written to analyze data also in chunks). If set to FALSE, PULSE_by_chunks() will return nothing after completing the processing of all files in the dataset, and the user must instead manually handle the reading and collating of all processed data in the job_folder.
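When bind_data = FALSE, the processed files have to be read back by the user. The sketch below shows one memory-friendly way to do that: read one file at a time and keep only a small summary of each. It ASSUMES the per-chunk results are .rds files, which is not confirmed here; the folder, file names and columns are created by the example itself, so check the contents of your own job_folder before adapting it.

```r
# Stand-in job folder with two per-chunk results saved as .rds files
# (ASSUMPTION: the real file names and format may differ).
job_folder <- file.path(tempdir(), "job_folder_demo")
dir.create(job_folder, showWarnings = FALSE)
saveRDS(data.frame(id = "ch_1", hz = c(0.5, 0.6), keep = c(TRUE, TRUE)),
        file.path(job_folder, "part_1.rds"))
saveRDS(data.frame(id = "ch_1", hz = c(0.7, 2.0), keep = c(TRUE, FALSE)),
        file.path(job_folder, "part_2.rds"))

parts <- list.files(job_folder, pattern = "\\.rds$", full.names = TRUE)

# Read one file at a time and keep only a per-file summary in memory.
summaries <- lapply(parts, function(p) {
  x <- readRDS(p)
  data.frame(file = basename(p), mean_hz = mean(x$hz[x$keep]))
})
summary_tbl <- do.call(rbind, summaries)
```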

window_width_secs

numeric, in seconds, defaults to 30; the width of the time windows over which heart rate frequency will be computed.

window_shift_secs

numeric, in seconds, defaults to 60; by how much each subsequent window is shifted from the preceding one.
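To see how window_width_secs and window_shift_secs interact, the following sketch lays out windows over a single hourly file. It assumes the first window starts at the beginning of the file, which may not match how the package aligns its windows.

```r
window_width_secs <- 30
window_shift_secs <- 60
file_length_secs  <- 3600  # one hourly file

# Start of each window, in seconds from the start of the file
# (ASSUMPTION: the first window starts at 0).
starts  <- seq(0, file_length_secs - window_width_secs, by = window_shift_secs)

# The `time` column refers to the center of each window.
centers <- starts + window_width_secs / 2

n_windows <- length(starts)
```

Under this layout, the defaults give 60 non-overlapping windows per hour, with a 30-second gap between the end of one window and the start of the next.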

min_data_points

numeric, defaults to 0.8; decimal from 0 to 1, used as a threshold to discard incomplete windows where data is missing (e.g., if the sampling frequency is 20 and window_width_secs = 30, each window should include 600 data points, and so if min_data_points = 0.8, windows with less than 600 * 0.8 = 480 data points will be rejected).
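The arithmetic in the example above can be checked directly. The vector points_in_window holds made-up counts used only to show which windows would pass.

```r
sampling_freq     <- 20   # Hz, as in the example above
window_width_secs <- 30
min_data_points   <- 0.8

expected  <- sampling_freq * window_width_secs  # data points in a complete window
threshold <- expected * min_data_points         # minimum required to keep a window

# Invented counts of data points found in three windows.
points_in_window <- c(600, 510, 479)

# Windows with fewer points than the threshold are rejected.
accepted <- points_in_window >= threshold
```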

interpolation_freq

numeric, defaults to 40; value expressing the frequency (in Hz) to which PULSE data should be interpolated. Can be set to 0 (zero) or any value equal to or greater than 40 (the default). If set to zero, no interpolation is performed.
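As an illustration of what interpolating to a fixed frequency means, the sketch below resamples an irregularly sampled synthetic signal onto a regular 40 Hz grid. It uses linear interpolation via approx() purely as an assumption; the method used internally by the package may differ.

```r
set.seed(1)

# Irregularly sampled toy signal at roughly 20 Hz (synthetic data).
t_raw <- sort(runif(200, min = 0, max = 10))
v_raw <- sin(2 * pi * 1 * t_raw)

interpolation_freq <- 40  # Hz

# Regular time grid spanning the observed range, one point every 1/40 s.
t_new <- seq(min(t_raw), max(t_raw), by = 1 / interpolation_freq)

# Linear interpolation onto the regular grid (ASSUMPTION: illustrative method).
interp <- approx(t_raw, v_raw, xout = t_new)
```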

bandwidth

numeric, defaults to 0.2; the bandwidth for the Kernel Regression Smoother. If equal to 0 (zero) no smoothing is applied. Normally kept low (0.1 - 0.3) so that only very high frequency noise is removed, but can be pushed up all the way to 1 or above (especially when the heartbeat rate is expected to be slow, as is typical of oysters, but double check the resulting data). Type ?ksmooth for additional info.
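The effect of a low bandwidth can be explored with stats::ksmooth() on synthetic data. The signal, the noise level and the choice of kernel = "normal" are assumptions made for this sketch and are not taken from the package.

```r
set.seed(42)

freq  <- 40                                  # Hz
t     <- seq(0, 10, by = 1 / freq)
clean <- sin(2 * pi * 1 * t)                 # synthetic 1 Hz "heartbeat"
noisy <- clean + rnorm(length(t), sd = 0.3)  # add high-frequency noise

# Kernel regression smoother evaluated at the original time points
# (ASSUMPTION: normal kernel, chosen for illustration).
smoothed <- ksmooth(t, noisy, kernel = "normal", bandwidth = 0.2, x.points = t)

# Mean squared error against the clean signal, before and after smoothing.
mse_noisy    <- mean((noisy - clean)^2)
mse_smoothed <- mean((smoothed$y - clean)^2)
```

Comparing mse_noisy with mse_smoothed for a few bandwidth values is a quick way to judge how much smoothing a given signal tolerates before the waveform itself is flattened.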

doublecheck

logical, defaults to TRUE; should pulse_doublecheck() be used? (it is rare, but there are instances when it should be disabled).

lim_n

numeric, defaults to 3; minimum number of peaks detected in each time window for it to be considered a "keep".

lim_sd

numeric, defaults to 0.75; maximum value for the sd of the time intervals between each peak detected for it to be considered a "keep".
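Taken together, lim_n and lim_sd define the keep flag. A minimal sketch, assuming both limits are inclusive (the exact comparison used by the package is not stated here) and using invented values for n and sd:

```r
lim_n  <- 3
lim_sd <- 0.75

# Invented per-window results: number of peaks and sd of the peak intervals.
windows <- data.frame(
  n  = c(2, 5, 6),
  sd = c(0.10, 0.20, 0.90)
)

# A window is a "keep" only if it has enough peaks AND the intervals
# between them are regular enough (ASSUMPTION: inclusive limits).
windows$keep <- windows$n >= lim_n & windows$sd <= lim_sd
```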

raw_v_smoothed

logical, defaults to TRUE; indicates whether or not to also compute heart rates before applying smoothing; this will increase the quality of the output but also double the processing time.

correct

logical, defaults to TRUE; if FALSE, data points with hz values likely double the real value are flagged BUT NOT CORRECTED. If TRUE, hz (as well as data, n, sd and ci) are corrected accordingly. Note that the correction is not reversible!

discard_channels

character vector, defaults to NULL; the names of channels to be discarded from the analysis. discard_channels is forced to lowercase, but other than that, the exact names must be provided. Discarding unused channels can greatly speed up the workflow!

keep_raw_data

logical, defaults to TRUE; if set to FALSE, $data is set to FALSE (i.e., raw data is discarded), dramatically reducing the amount of disk space required to store the final output (usually by two orders of magnitude). HOWEVER, note that it won't be possible to use pulse_plot_raw() anymore!

show_progress

logical, defaults to TRUE (as shown in Usage). If set to TRUE, progress messages will be provided.

See Also

  • PULSE() for all the relevant information about the processing of PULSE data

Examples

##
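A minimal sketch of a typical call, using only arguments listed under Usage. The folder path is hypothetical and must be replaced with a real folder of PULSE files; the call is guarded so that nothing runs unless the package is installed and the folder exists.

```r
# Hypothetical path: replace with a folder that holds your PULSE files.
folder <- file.path("path", "to", "pulse_files")

if (requireNamespace("heartbeatr", quietly = TRUE) && dir.exists(folder)) {
  heart_rates <- heartbeatr::PULSE_by_chunks(
    folder,
    allow_dir_create = TRUE,  # accept that a job_folder is created inside `folder`
    chunks           = 2,     # number of files processed in each cycle
    bind_data        = TRUE   # return a single table once all chunks are done
  )

  # Inspect the first rows of the result.
  print(head(heart_rates))
}
```

If the job is interrupted, running the same call again on the same folder should allow processing to resume, as explained in the Description.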
