whatifbandit (version 0.3.0)

check_data: Checking for Valid Input Data

Description

Helper to validate_inputs(). This function accepts the data and checks whether it has unique IDs and whether the period length is valid.

Usage

check_data(
  data,
  data_cols,
  assignment_method,
  period_length,
  time_unit,
  perfect_assignment
)

Value

Throws an error if the data does not meet the specifications of the trial based on user settings.
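To make the behaviour concrete, here is a minimal base R sketch of the two checks named in the Description. It is not the whatifbandit implementation: the function name sketch_check() is hypothetical, and treating "valid" as "a single positive number" is an assumption.

```r
# Hypothetical stand-in for the checks described above; not package code.
sketch_check <- function(data, id_col, period_length) {
  ids <- data[[id_col]]

  # Unique-ID check: anyDuplicated() returns 0 when there are no repeats.
  if (anyDuplicated(ids) > 0) {
    stop("`", id_col, "` must contain unique IDs.", call. = FALSE)
  }

  # Period-length check (assumed rule: one positive, non-missing number).
  if (!is.numeric(period_length) || length(period_length) != 1 ||
      is.na(period_length) || period_length <= 0) {
    stop("`period_length` must be a single positive number.", call. = FALSE)
  }

  invisible(TRUE)
}

ok  <- data.frame(id = 1:3)
dup <- data.frame(id = c(1, 1, 2))

sketch_check(ok, "id", period_length = 10)  # passes silently

# Capture the error message instead of halting the script.
res <- tryCatch(
  sketch_check(dup, "id", period_length = 10),
  error = function(e) conditionMessage(e)
)
res
```

Like the function documented here, the sketch signals problems through errors only, so a successful check produces no visible output.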

Arguments

data

A data.frame, data.table, or tibble containing input data from the trial. This should be the results of a traditional Randomized Controlled Trial (RCT). Any data.frames will be converted to tibbles internally.

data_cols

A named character vector containing the names of columns in data as strings:

  • id_col: Column in data; contains unique ID as a key.

  • success_col: Column in data; binary successes from the original experiment.

  • condition_col: Column in data; original treatment condition for each observation.

  • date_col: Column in data; contains original date of event/trial. Only necessary when assignment_method = "date". Must be of type Date, not a character string.

  • month_col: Column in data; contains month of treatment. Only necessary when time_unit = "month" and periods should follow the calendar months directly instead of month-based time intervals. This column can be a string or factor variable with the month names, or numeric with the month number. It can be created from your date_col via lubridate::month(data[[date_col]]) or format(data[[date_col]], "%m").

  • success_date_col: Column in data; contains original dates each success occurred. Only necessary when perfect_assignment = FALSE. Must be of type Date, not a character string.

  • assignment_date_col: Column in data; contains original dates treatments were assigned to observations. Only necessary when perfect_assignment = FALSE. Used to simulate imperfect information on the part of researchers conducting an adaptive trial. Must be of type Date, not a character string.
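As an illustration of the data_cols format, the following base R sketch builds a toy dataset and the matching named character vector. The data frame, its column names, and its values are all made up for this example; only the element names (id_col, success_col, and so on) come from the list above.

```r
# Toy trial data; every column name and value here is hypothetical.
trial <- data.frame(
  id        = 1:6,
  success   = c(1, 0, 0, 1, 1, 0),
  condition = c("A", "B", "A", "B", "A", "B"),
  date      = as.Date(c("2024-07-26", "2024-07-30", "2024-08-03",
                        "2024-08-15", "2024-09-01", "2024-09-10"))
)

# Derive a month column from the Date column, as suggested for month_col.
trial$month <- format(trial$date, "%m")

# Names are the roles expected by the function; values are column names in `trial`.
data_cols <- c(
  id_col        = "id",
  success_col   = "success",
  condition_col = "condition",
  date_col      = "date",
  month_col     = "month"
)

# Quick sanity checks before handing the data on.
all(data_cols %in% names(trial))                    # every named column exists
inherits(trial[[data_cols[["date_col"]]]], "Date")  # date column is a true Date
```

Note that format(..., "%m") yields zero-padded character values such as "07", which fits the "string" option described for month_col.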

assignment_method

A character string; one of "date", "batch", or "individual", to define the assignment into treatment waves. When using "batch" or "individual", ensure your dataset is already arranged in the order in which observations should be considered, so that groups are assigned correctly. For "date", observations will be considered in chronological order. "individual" assignment can be computationally intensive for larger datasets.
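For "batch" or "individual" assignment, one way to arrange the data beforehand is to sort it by whichever variable defines the intended order. The sketch below uses a hypothetical enrolled column on made-up data; the right ordering variable depends on your trial.

```r
# Made-up data that arrives out of order; `enrolled` is a hypothetical column.
trial <- data.frame(
  id       = c(3, 1, 2),
  enrolled = as.Date(c("2024-03-02", "2024-01-15", "2024-02-01"))
)

# Sort rows into the order in which observations should be considered.
trial <- trial[order(trial$enrolled), , drop = FALSE]
rownames(trial) <- NULL

trial$id
```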

period_length

A numeric value of length 1; represents the length of each treatment period. If assignment method is "date", this length refers to the number of units specified in time_unit (e.g., if "day", 10 would be 10 days). If assignment method is "batch", this refers to the number of people in each batch.
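The two meanings of period_length can be sketched in base R as follows. This is an illustration of the idea, not the package's internal logic; in particular, anchoring date-based periods at the earliest date in the data is an assumption.

```r
# Batch assignment: period_length is the number of people per batch.
n <- 7
period_length <- 3
batch_period <- ceiling(seq_len(n) / period_length)
batch_period  # rows 1-3 in period 1, rows 4-6 in period 2, row 7 in period 3

# Date assignment with time_unit = "day": period_length counts days.
dates <- as.Date(c("2024-01-01", "2024-01-05", "2024-01-11", "2024-01-25"))
day_length <- 10
days_since_start <- as.numeric(dates - min(dates))  # assumed anchor: first date
date_period <- floor(days_since_start / day_length) + 1
date_period
```

With a batch size of 3 the last batch is smaller than the others whenever the number of rows is not a multiple of period_length.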

time_unit

A character string specifying the unit of time for assigning periods when assignment_method is "date". Acceptable values are "day", "week", or "month". "month" does not require an additional column with the months of each observation, but it can accept a separate month_col. If month_col is specified, the periods follow the calendar months strictly; when it is not specified, months are simply used as the time interval. For example, if a dataset has dates starting on July 26th, then under month-based assignment with a specified month_col the dates July 26th and August 3rd would be in different periods. If month_col were not specified, they would be in the same period, because the dates are less than one month apart.
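The July 26th example can be reproduced with a short base R sketch. The year 2024 and the third date are arbitrary additions for illustration, and the period numbering shown here is an assumption about how such periods could be computed, not the package's own code.

```r
dates <- as.Date(c("2024-07-26", "2024-08-03", "2024-08-27"))

# With a month column: periods follow calendar months strictly.
month_label <- format(dates, "%m")
calendar_period <- match(month_label, unique(month_label))
calendar_period  # July 26 and August 3 fall in different periods

# Without a month column: months act as an interval measured from the first date.
breaks <- seq(min(dates), by = "month", length.out = 3)  # Jul 26, Aug 26, Sep 26
interval_period <- findInterval(as.numeric(dates), as.numeric(breaks))
interval_period  # July 26 and August 3 share a period; August 27 starts the next
```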

perfect_assignment

Logical; if TRUE, assumes perfect information for treatment assignment (i.e., all outcomes are observed regardless of the date). If FALSE, hides outcomes not yet theoretically observed, based on the dates treatments would have been assigned for each wave. This is useful when simulating batch-based assignment where treatments were assigned on a given day whether or not all the information from a prior batch was available and you have exact dates treatments were assigned.
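A rough sketch of what hiding "not yet observed" outcomes could look like when perfect_assignment = FALSE is shown below. The data, the column names, the single wave date, and the choice to represent hidden outcomes as NA are all assumptions made for illustration; the package may handle them differently.

```r
# Made-up observations from an earlier wave.
obs <- data.frame(
  id           = 1:4,
  success      = c(1, 1, 0, 1),
  success_date = as.Date(c("2024-01-05", "2024-01-20", NA, "2024-01-09"))
)

# Hypothetical date on which the next wave's treatments were assigned.
wave_assignment_date <- as.Date("2024-01-10")

# A success counts as observed only if it happened on or before that date.
observed <- !is.na(obs$success_date) & obs$success_date <= wave_assignment_date

# Hide successes that had not yet occurred (NA is an assumed representation).
obs$success_visible <- ifelse(obs$success == 1 & !observed, NA, obs$success)
obs$success_visible
```

Here the success for id 2 occurs after the assignment date, so it is the only outcome hidden from the simulated researchers.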