multiple_mab_simulation: Run Multiple Multi-Arm-Bandit Trials with Inference in Parallel

Description

Performs multiple Multi-Arm Bandit trials using the same simulation and inference backend as single_mab_simulation(). This makes it easy to run many trials under the same settings to gauge how much the procedure varies across random states. Parallel processing is supported through the future and furrr packages.

Usage

multiple_mab_simulation(
  data,
  assignment_method,
  algorithm,
  prior_periods,
  perfect_assignment,
  whole_experiment,
  blocking,
  data_cols,
  times,
  seeds,
  control_augment = 0,
  random_assign_prop = 0,
  ndraws = 5000,
  control_condition = NULL,
  time_unit = NULL,
  period_length = NULL,
  block_cols = NULL,
  verbose = FALSE,
  check_args = TRUE,
  keep_data = FALSE
)

Value

An object of class multiple.mab, containing:

final_data_nest: A tibble or data.table containing the nested tibbles/data.tables from each trial. Only provided when keep_data is TRUE.
bandits: A tibble or data.table containing the UCB1 values or Thompson sampling posterior distributions for each period. Wide format: each row is a period and each column is a treatment. Each row holds the calculation made after that period's outcomes were imputed, so the table is shifted by one period: row 2 holds the calculations made in period 3, which reflect the outcomes of period 2's new assignments.
assignment_probs: A tibble or data.table containing the probability of being assigned each treatment arm at a given period. Wide format: each row is a period and each column is a treatment. Unlike the bandits table, these rows are not shifted.
estimates: A tibble or data.table containing the AIPW (Augmented Inverse Probability Weighting) treatment effect estimates and variances, along with traditional sample means and variances, for each treatment arm. Long format: treatment arm and estimate type are columns, alongside the mean and variance.
assignment_quantities: A tibble or data.table containing the number of units assigned to each treatment for each simulation in the set of repeated simulations.
settings: A named list of the configuration settings used in the trial.
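Because the bandits table is shifted by one period relative to assignment_probs, lining the two up takes an explicit offset. The sketch below uses small made-up tables with hypothetical arm columns A and B (real column names depend on your treatment labels) and assumes, following the description of bandits, that row k of bandits informs the assignment probabilities of period k + 1.

```r
# Toy tables standing in for the bandits and assignment_probs of one trial;
# the values and the arm columns A and B are hypothetical.
bandits <- data.frame(A = c(0.55, 0.60, 0.62), B = c(0.45, 0.40, 0.38))
assignment_probs <- data.frame(
  A = c(0.50, 0.70, 0.80, 0.85),
  B = c(0.50, 0.30, 0.20, 0.15)
)

# Row k of bandits is computed after period k's outcomes are imputed and
# drives period k + 1, so pair it with row k + 1 of assignment_probs.
n <- min(nrow(bandits), nrow(assignment_probs) - 1)
aligned <- data.frame(
  period = seq_len(n) + 1,
  bandit_A = bandits$A[seq_len(n)],
  prob_A = assignment_probs$A[seq_len(n) + 1]
)
print(aligned)
```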

Arguments

data

A data.frame, data.table, or tibble containing input data from the trial. This should be the results of a traditional Randomized Controlled Trial (RCT). Any data.frames will be converted to tibbles internally.

assignment_method

A character string; one of "date", "batch", or "individual", defining how observations are grouped into treatment waves. When using "batch" or "individual", make sure the dataset is already sorted in the order in which observations should be considered, so that groups are assigned correctly. For "date", observations are considered in chronological order. "individual" assignment can be computationally intensive for larger datasets.

algorithm

A character string specifying the MAB algorithm to use, either "thompson" or "ucb1". The algorithm governs the adaptive assignment process. Mathematical details on both algorithms can be found in Kuleshov and Precup 2014 and Slivkins 2024.
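For intuition about "ucb1", the sketch below computes the common textbook UCB1 index: each arm's observed success rate plus an exploration bonus of sqrt(2 * log(total trials) / arm trials). The helper ucb1_scores() is hypothetical and is not this package's internal implementation; the counts are made up.

```r
# Hypothetical helper: textbook UCB1 index for binary outcomes.
ucb1_scores <- function(successes, trials) {
  stopifnot(length(successes) == length(trials), all(trials > 0))
  total <- sum(trials)
  # Observed success rate plus an exploration bonus that shrinks with trials.
  successes / trials + sqrt(2 * log(total) / trials)
}

scores <- ucb1_scores(successes = c(A = 12, B = 20), trials = c(A = 40, B = 50))
scores
names(which.max(scores))  # arm with the highest upper confidence bound
```

Arm B has both the higher observed rate (0.40 vs 0.30) and the higher index here; with fewer trials on A, its larger bonus could still let it win in other data.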

prior_periods

A numeric value of length 1, or the character string "All"; the number of previous periods to use in the treatment assignment model. This controls whether the bandit is stationary or non-stationary. A non-stationary bandit assumes the true probability of success for each treatment changes over time, so not all prior data should be used when making decisions, because older data may be "out of date". "All" keeps every prior period, which corresponds to the stationary case.
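A minimal sketch of the windowing idea, assuming a hypothetical `period` column and that "previous periods" means periods strictly before the current one. The helper window_history() is illustrative only.

```r
# Hypothetical helper: keep rows from the most recent `prior_periods` periods
# before `current_period`; "All" keeps every earlier period.
window_history <- function(history, current_period, prior_periods) {
  past <- history[history$period < current_period, , drop = FALSE]
  if (identical(prior_periods, "All")) {
    return(past)
  }
  past[past$period >= current_period - prior_periods, , drop = FALSE]
}

history <- data.frame(
  period = rep(1:5, each = 2),
  success = c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1)
)
nrow(window_history(history, current_period = 6, prior_periods = 2))     # periods 4 and 5
nrow(window_history(history, current_period = 6, prior_periods = "All")) # periods 1 to 5
```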

perfect_assignment

Logical; if TRUE, assumes perfect information for treatment assignment (i.e., all outcomes are observed regardless of the date). If FALSE, hides outcomes that would not yet have been observed, based on the dates treatments would have been assigned for each wave. This is useful for simulating batch-based assignment in which treatments were assigned on a given day whether or not all information from the prior batch was available, provided you have the exact dates treatments were assigned.
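The sketch below illustrates the idea of hiding not-yet-observed outcomes when perfect_assignment = FALSE. It assumes, as one possible convention, that a success recorded after the wave's assignment date is treated as a failure at that point; the helper observed_success() and the dates are hypothetical, and the package may handle unobserved outcomes differently.

```r
# Hypothetical helper: when assigning a wave on `assign_date`, count a success
# only if it occurred on or before that date; later successes are hidden (0).
observed_success <- function(success, success_date, assign_date) {
  ifelse(!is.na(success_date) & success_date <= assign_date, success, 0)
}

past <- data.frame(
  success = c(1, 1, 0, 1),
  success_date = as.Date(c("2024-01-03", "2024-01-12", NA, "2024-01-09"))
)
# The success dated 2024-01-12 is not yet visible on 2024-01-10.
observed_success(past$success, past$success_date, assign_date = as.Date("2024-01-10"))
```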

whole_experiment

Logical; if TRUE, uses all past experimental data for imputing outcomes. If FALSE, uses only data available up to the current period. With large datasets or many periods, setting this to FALSE can be more computationally intensive, though it is not a significant contributor to total run time.

blocking

Logical; whether or not to use treatment blocking. Blocking ensures a sufficiently even distribution of treatment conditions across blocks. For example, blocking by gender means the randomized assignment should split treatments evenly not just across the whole sample (for 4 arms, 25-25-25-25) but also within each block, so that 25% of men and 25% of women receive each treatment.
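A minimal sketch of balanced assignment within blocks, mirroring the gender example. The helper block_assign() is hypothetical and only illustrates the balancing idea, not how this package implements blocking.

```r
# Hypothetical helper: balanced random assignment within each block.
block_assign <- function(blocks, arms) {
  out <- character(length(blocks))
  for (b in unique(blocks)) {
    idx <- which(blocks == b)
    # Cycle through the arms to cover the block, then shuffle within it.
    out[idx] <- sample(rep_len(arms, length(idx)))
  }
  out
}

set.seed(1)
gender <- rep(c("man", "woman"), times = c(8, 12))
assigned <- block_assign(gender, arms = c("A", "B", "C", "D"))
table(gender, assigned)  # each arm gets 2 men and 3 women
```

With several block_cols, one option is to combine them into a single block label first, for example with interaction(df$gender, df$region, drop = TRUE).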

data_cols

A named character vector containing the names of columns in data as strings:

id_col: Column in data; contains unique ID as a key.
success_col: Column in data; binary successes from the original experiment.
condition_col: Column in data; original treatment condition for each observation.
date_col: Column in data; contains the original date of the event/trial. Only necessary when assignment_method = "date". Must be of type Date, not a character string.
month_col: Column in data; contains the month of treatment. Only necessary when time_unit = "month" and periods should follow calendar months rather than month-length intervals. This column can be a string/factor variable with the month names, or numeric with the month number. It can easily be created from your date_col via lubridate::month(data[[date_col]]) or format(data[[date_col]], "%m").
success_date_col: Column in data; contains original dates each success occurred. Only necessary when perfect_assignment = FALSE. Must be of type Date, not a character string.
assignment_date_col: Column in data; contains original dates treatments were assigned to observations. Only necessary when perfect_assignment = FALSE. Used to simulate imperfect information on the part of researchers conducting an adaptive trial. Must be of type Date, not a character string.
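A sketch of a data_cols vector that uses every entry, built against a small made-up dataset. All column names here are hypothetical, and reusing one column for both date_col and assignment_date_col is an assumption made only to keep the example short.

```r
# Illustrative data with hypothetical column names; date columns are Date.
trial <- data.frame(
  id = 1:3,
  converted = c(1, 0, 1),
  arm = c("control", "letter", "call"),
  sent = as.Date(c("2024-07-26", "2024-08-03", "2024-08-20")),
  converted_on = as.Date(c("2024-08-01", NA, "2024-08-25"))
)
# Calendar month of each observation, derived from the date column.
trial$month <- format(trial$sent, "%m")

data_cols <- c(
  id_col = "id",
  success_col = "converted",
  condition_col = "arm",
  date_col = "sent",
  month_col = "month",
  success_date_col = "converted_on",
  assignment_date_col = "sent"
)
# Every referenced column must exist in the data.
stopifnot(all(data_cols %in% names(trial)))
```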

times

A numeric value of length 1, the number of simulations to conduct.

seeds

An integer vector of length times containing valid seeds that define the random state for each trial.

control_augment

A numeric value ranging from 0 to 1; the proportion of each wave guaranteed to receive the control treatment given by control_condition. Default is 0. It is not recommended to use this in conjunction with random_assign_prop.

random_assign_prop

A numeric value ranging from 0 to 1; the proportion of each wave assigned new treatments randomly, with 1 - random_assign_prop assigned through the bandit procedure. For example, if this is set to 0.1, then in each wave 10% of observations are randomly assigned a new treatment, while the remaining 90% are assigned according to the UCB1 or Thompson result. It is not recommended to use this in conjunction with control_augment. If batches are small and the number of rows to randomize is calculated to be less than 1, a probability sampling approach is used instead: each row in the batch has probability random_assign_prop of being selected for random assignment. Otherwise, the number is rounded to a whole number and that many rows are selected for random assignment.
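The two selection regimes described for random_assign_prop can be sketched as follows. The helper pick_random_rows() is hypothetical; it follows the stated rule but the package's exact rounding and sampling calls may differ.

```r
# Hypothetical helper mirroring the rule described for random_assign_prop.
pick_random_rows <- function(batch_size, random_assign_prop) {
  expected <- batch_size * random_assign_prop
  if (expected < 1) {
    # Small batch: each row is selected independently with probability p.
    which(runif(batch_size) < random_assign_prop)
  } else {
    # Otherwise select a fixed, rounded number of rows without replacement.
    sample.int(batch_size, size = round(expected))
  }
}

set.seed(42)
length(pick_random_rows(40, 0.1))  # 40 * 0.1 = 4 rows
pick_random_rows(5, 0.1)           # 0.5 < 1, so the number of rows varies
```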

ndraws

A numeric value; when direct calculation of the Thompson sampling probabilities fails, draws from a simulated posterior are used to approximate them. This is the number of draws to use. The default is 5000 to match the default of bandit::best_binomial_bandit_sim(), but it may need to be raised or lowered depending on performance and accuracy concerns.
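The simulation fallback can be illustrated with a Monte Carlo estimate of each arm's probability of being best. This sketch assumes independent Beta(1, 1) priors on each arm's success rate; the helper thompson_probs() is hypothetical and is neither the package's code nor the bandit package's.

```r
# Hypothetical helper: P(arm is best) from `ndraws` Beta posterior draws.
thompson_probs <- function(successes, trials, ndraws = 5000) {
  k <- length(successes)
  # One column of posterior draws per arm (Beta(1 + s, 1 + n - s)).
  draws <- vapply(
    seq_len(k),
    function(i) rbeta(ndraws, 1 + successes[i], 1 + trials[i] - successes[i]),
    numeric(ndraws)
  )
  # Share of draws in which each arm has the largest sampled success rate.
  best <- max.col(draws, ties.method = "random")
  probs <- tabulate(best, nbins = k) / ndraws
  names(probs) <- names(successes)
  probs
}

set.seed(2024)
p <- thompson_probs(successes = c(A = 12, B = 20), trials = c(A = 40, B = 50))
p
```

Raising ndraws reduces Monte Carlo noise in these probabilities at the cost of more computation, which is the trade-off the argument controls.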

control_condition

Value of the control condition. Only necessary when control_augment is greater than 0. Internally this value is coerced to a string, so it should be passed as a string, or a type that can easily be converted to a string.

time_unit

A character string specifying the unit of time for assigning periods when assignment_method is "date". Acceptable values are "day", "week", or "month". "month" does not require an additional column with the month of each observation, but it can accept a separate month_col. If month_col is specified, periods follow calendar months strictly; otherwise, one month is simply used as the length of each interval. For example, if a dataset starts on July 26th, then under month-based assignment with month_col specified, July 26th and August 3rd fall in different periods. Without month_col they fall in the same period, because the dates are less than one month apart.
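The July 26th / August 3rd example can be reproduced in base R. This is a sketch of the two period definitions under stated assumptions (calendar periods keyed by year and month, intervals measured from the first date), not the package's internal code.

```r
dates <- as.Date(c("2024-07-26", "2024-08-03", "2024-08-27"))

# With a month_col: periods follow calendar months
# ("%Y-%m" labels sort chronologically, so factor codes are in order).
calendar_period <- as.integer(factor(format(dates, "%Y-%m")))

# Without a month_col: one-month intervals measured from the first date
# (breaks on July 26, August 26, September 26, ...).
breaks <- seq(min(dates), by = "month", length.out = 12)
interval_period <- findInterval(dates, breaks)

data.frame(dates, calendar_period, interval_period)
```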

period_length

A numeric value of length 1; the length of each treatment period. If assignment_method is "date", this is the number of time_unit units per period (e.g., with "day", 10 means 10 days). If assignment_method is "batch", this is the number of observations in each batch.
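A sketch of how period_length can translate into period numbers for "batch" and for "date" with time_unit = "day". Both helpers are hypothetical, and measuring day windows from the earliest date is an assumption. With 50 rows and period_length = 25, as in the examples, batch assignment yields two periods.

```r
# Batch assignment: consecutive blocks of `period_length` rows.
batch_period <- function(n, period_length) {
  ceiling(seq_len(n) / period_length)
}

# Date assignment with time_unit = "day": windows of `period_length` days
# counted from the earliest date.
day_period <- function(dates, period_length) {
  as.integer(floor(as.numeric(dates - min(dates)) / period_length)) + 1L
}

table(batch_period(50, 25))  # two batches of 25
day_period(as.Date(c("2024-01-01", "2024-01-09", "2024-01-11", "2024-01-25")), 10)
```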

block_cols

A character vector of variables to block by. This vector should not be named.

verbose

Logical; Toggles progress bar from furrr::future_map() and other intermediate messages.

check_args

Logical; whether or not to robustly check that arguments are valid. Default is TRUE; changing it is not recommended.

keep_data

Logical; Whether or not to keep the final data from each trial. Recommended FALSE.

Details

Note that if data.table has not already been attached when this function is called, it will be attached when furrr::future_map() runs, and a message may print. This does not mean that data.table will be used if you pass a tibble or data.frame.

Implementation

This function simulates multiple adaptive Multi-Arm Bandit trials using experimental data from a traditional randomized experiment. It follows the same core procedure as single_mab_simulation() (see the Details section there for a description) but conducts more than one simulation. This allows researchers to gauge the variance of the simulation procedure itself and use it to form an empirical sampling distribution of the AIPW estimates, instead of relying on asymptotic normality [Hadad et al. (2021)] for inference.
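To show what an empirical sampling distribution buys you, the sketch below forms a percentile interval from per-trial estimates and sets it beside a normal interval built from the same values. The estimates are synthetic stand-ins drawn with rnorm(); in practice they would be one arm's AIPW estimates taken from the estimates table, whose exact column names are not assumed here.

```r
# Synthetic stand-in for one arm's AIPW estimates across 500 trials.
set.seed(7)
aipw_draws <- rnorm(500, mean = 0.42, sd = 0.03)

# Empirical 95% interval from the simulated sampling distribution.
empirical_ci <- quantile(aipw_draws, probs = c(0.025, 0.975))

# Normal-approximation interval from the same values, for comparison.
normal_ci <- mean(aipw_draws) + c(-1, 1) * qnorm(0.975) * sd(aipw_draws)

rbind(empirical = unname(empirical_ci), normal = normal_ci)
```

With real simulation output the two intervals can diverge when the estimates are skewed, which is when the empirical version is most useful.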

The settings specified here have the same meaning as in single_mab_simulation(), apart from the additional parameters times and seeds, which set the number of trials and the random seeds that make them reproducible. Note that seeds only accepts integer values, so they must be declared or coerced as valid integers; passing doubles (even ones that are mathematically whole numbers) results in an error. It is recommended to generate the values with sample.int() after setting a known seed. Additionally, it is highly recommended to set keep_data to FALSE, because keeping the final data from every trial makes memory use grow with the number of trials. This can cause significant performance issues, especially if your system must swap to disk once memory is full.
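The integer requirement for seeds trips up many R users, because whole-number literals such as 101 are stored as doubles. The base R sketch below shows reproducible integer seeds from sample.int() and explicit coercion of doubles.

```r
set.seed(2025)                  # makes the generated seeds reproducible
seeds <- sample.int(10000, 5)   # sample.int() returns an integer vector
is.integer(seeds)               # TRUE

doubles <- c(101, 202, 303)     # numeric literals are doubles in R
is.integer(doubles)             # FALSE, even though the values are whole
seeds2 <- as.integer(doubles)   # explicit coercion to integer
# Equivalent alternative: the L suffix, c(101L, 202L, 303L)
```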

Parallel Processing

The function supports parallel processing via the future and furrr packages. When conducting a large number of simulations, parallelization can improve performance if sufficient system resources are available. Parallel processing must be enabled explicitly by the user through future::plan(). Windows users should set the plan to "multisession", while Linux and macOS users can use "multicore" or "multisession". Users running in a High Performance Computing (HPC) environment are encouraged to use future.batchtools with the backend for their HPC scheduler. Parallel processing is not guaranteed to work on all systems and may require additional setup or debugging; for any issues, consult the documentation of the packages above.
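A sketch of choosing a plan by operating system and restoring the previous plan afterwards. The helper choose_plan() is hypothetical; future::plan() returns the previously active plan invisibly, so it can be saved and reapplied.

```r
# Hypothetical helper: pick a future strategy following the guidance above.
# Note: future does not support "multicore" when R runs inside RStudio.
choose_plan <- function() {
  if (.Platform$OS.type == "windows") "multisession" else "multicore"
}

if (requireNamespace("future", quietly = TRUE)) {
  # Save the previous plan so it can be restored afterwards.
  oplan <- future::plan(choose_plan(), workers = 2)
  # ... call multiple_mab_simulation() here ...
  future::plan(oplan)
}
```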

References

Bengtsson, Henrik. 2025. "Future: Unified Parallel and Distributed Processing in R for Everyone." https://cran.r-project.org/package=future.

Bengtsson, Henrik. 2025. "Future.Batchtools: A Future API for Parallel and Distributed Processing Using ‘Batchtools.’" https://cran.r-project.org/package=future.batchtools.

Hadad, Vitor, David A. Hirshberg, Ruohan Zhan, Stefan Wager, and Susan Athey. 2021. "Confidence Intervals for Policy Evaluation in Adaptive Experiments." Proceedings of the National Academy of Sciences of the United States of America 118 (15): e2014602118. https://doi.org/10.1073/pnas.2014602118.

Kuleshov, Volodymyr, and Doina Precup. 2014. "Algorithms for Multi-Armed Bandit Problems." arXiv. https://doi.org/10.48550/arXiv.1402.6028.

Lotze, Thomas, and Markus Loecher. 2022. "Bandit: Functions for Simple A/B Split Test and Multi-Armed Bandit Analysis." https://cran.r-project.org/package=bandit.

Offer-Westort, Molly, Alexander Coppock, and Donald P. Green. 2021. "Adaptive Experimental Design: Prospects and Applications in Political Science." American Journal of Political Science 65 (4): 826–44. https://doi.org/10.1111/ajps.12597.

Slivkins, Aleksandrs. 2024. "Introduction to Multi-Armed Bandits." arXiv. https://doi.org/10.48550/arXiv.1904.07272.

Vaughan, Davis, Matt Dancho, and RStudio. 2022. "Furrr: Apply Mapping Functions in Parallel Using Futures." https://cran.r-project.org/package=furrr.

Examples


# multiple_mab_simulation() is a useful tool for running multiple trials
# using the same configuration settings in different random states
data(tanf)
tanf <- tanf[1:50, ]

# The seeds passed must be integers, so it is highly recommended to generate
# them beforehand with `sample.int()`, after setting a known seed
set.seed(123)
seeds <- sample.int(10000, 5)

## Sequential Execution
x <- multiple_mab_simulation(
  data = tanf,
  assignment_method = "Batch",
  period_length = 25,
  whole_experiment = TRUE,
  blocking = FALSE,
  perfect_assignment = TRUE,
  algorithm = "Thompson",
  prior_periods = "All",
  control_augment = 0,
  data_cols = c(
    condition_col = "condition",
    id_col = "ic_case_id",
    success_col = "success"
  ),
  verbose = FALSE, times = 5, seeds = seeds, keep_data = FALSE
)
print(x)

## Parallel execution using future:
## Check the future and furrr documentation for more details on possible options
if (requireNamespace("future", quietly = TRUE)) {
  # Set a parallel plan with two background R sessions
  future::plan("multisession", workers = 2)
  y <- multiple_mab_simulation(
    data = tanf,
    assignment_method = "Batch",
    period_length = 25,
    whole_experiment = TRUE,
    blocking = FALSE,
    perfect_assignment = TRUE,
    algorithm = "Thompson",
    prior_periods = "All",
    control_augment = 0,
    data_cols = c(
      condition_col = "condition",
      id_col = "ic_case_id",
      success_col = "success"
    ),
    verbose = FALSE, times = 5, seeds = seeds, keep_data = TRUE
  )
  print(y)
  # Always set the plan back to sequential to close the background processes
  future::plan("sequential")
}
