whatifbandit (version 0.3.0)

get_bandit: Calculate Multi-Arm Bandit Decision Based on Algorithm

Description

Calculates the best treatment for a given period using either a UCB1 or Thompson sampling algorithm. Thompson sampling uses bandit::best_binomial_bandit() from the bandit package, and UCB1 values are calculated with the standard formula given in Kuleshov and Precup (2014).
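As an illustrative sketch (not the package's internal code), the UCB1 formula in Kuleshov and Precup (2014) scores each arm by its observed success rate plus an exploration bonus that shrinks as the arm accumulates observations:

```r
# UCB1 value per arm: mean_j + sqrt(2 * log(n_total) / n_j),
# where n_total is the total number of observations across all arms
# and n_j is the number of observations for arm j.
ucb1_values <- function(successes, observations) {
  n_total <- sum(observations)
  means <- successes / observations
  means + sqrt(2 * log(n_total) / observations)
}

# Three arms: the under-sampled second arm gets the largest bonus.
ucb1_values(successes = c(10, 4, 7), observations = c(40, 10, 20))
```

The arm with the highest UCB1 value is the one selected for the period.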

Usage

get_bandit(
  past_results,
  algorithm,
  conditions,
  current_period,
  control_augment = 0,
  ndraws
)

Value

A list of length 2 containing:

  • bandit: Bandit object, either a named numeric vector of Thompson sampling probabilities or a tibble/data.table of UCB1 values.

  • assignment_probabilities: Named numeric vector with a value for each condition containing the probability of being assigned that treatment.

Arguments

past_results

A tibble/data.table summarising prior periods (successes, number of observations, and success rates), created by get_past_results().

algorithm

A character string specifying the MAB algorithm to use. Options are "thompson" or "ucb1". The chosen algorithm defines the adaptive assignment process. Mathematical details on these algorithms can be found in Kuleshov and Precup (2014) and Slivkins (2024).

conditions

A character vector of treatment condition names used in the trial.

current_period

Numeric value of length 1; current period of the adaptive trial simulation.

control_augment

A numeric value ranging from 0 to 1; proportion of each wave guaranteed to receive the "Control" treatment. Default is 0. It is not recommended to use this in conjunction with random_assign_prop.

ndraws

A numeric value; the number of posterior draws used when the direct Thompson sampling calculation fails. In that case, draws from a simulated posterior are used to approximate the Thompson sampling probabilities. The default is 5000, matching the default of bandit::best_binomial_bandit_sim(), but it can be raised or lowered depending on accuracy and performance needs.

Details

The Thompson assignment_probabilities are the same as the bandit vector except when control_augment or random_assign_prop are greater than 0, as these arguments will alter the probabilities of assignment.

Thompson sampling is calculated with the bandit package, but the direct calculation can fail with errors or numeric overflow. When that happens, a simulation-based method from the same package is used instead to estimate the posterior distribution, and a warning is issued. ndraws specifies the number of iterations for the simulation-based method; the default is 5000.
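The simulation-based fallback can be sketched in base R. This is an illustration, not the package's code; it assumes uniform Beta(1, 1) priors, matching the defaults of bandit::best_binomial_bandit_sim():

```r
# Approximate Thompson sampling probabilities: draw from each arm's
# Beta posterior, then count how often each arm has the highest draw.
thompson_probs_sim <- function(successes, observations, ndraws = 5000) {
  draws <- sapply(seq_along(successes), function(j) {
    rbeta(ndraws, 1 + successes[j], 1 + observations[j] - successes[j])
  })                            # ndraws x n_arms matrix of posterior draws
  best <- max.col(draws)        # index of the winning arm in each draw
  tabulate(best, nbins = length(successes)) / ndraws
}

set.seed(1)
thompson_probs_sim(successes = c(10, 4, 7), observations = c(40, 10, 20))
```

The returned vector is a valid probability distribution over the arms, so it can be used directly for probability-matched assignment.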

The UCB1 algorithm selects exactly one treatment in each period, with no probability matching, so assignment_probabilities always has one element equal to 1 and the rest equal to 0, unless control_augment or random_assign_prop is greater than 0, in which case the probabilities of assignment are altered. For example, if the original vector is (0, 0, 1) with the first element as control and control_augment = 0.2, the new vector is (0.2, 0, 0.8). If instead the third element were the control group, the vector would be unchanged because it already meets the control-group threshold.
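One plausible reading of this augmentation rule can be sketched as follows (an illustration, not the package's internal code): if the control arm is already at or above the threshold the vector is returned unchanged; otherwise the whole vector is scaled down and the control share is topped up, so the result still sums to 1:

```r
# 'probs' is a named assignment-probability vector with a "Control" entry.
# Guarantee the control arm at least 'control_augment' probability.
augment_control <- function(probs, control_augment = 0) {
  if (probs["Control"] >= control_augment) return(probs)  # already meets threshold
  probs <- (1 - control_augment) * probs                  # shrink all arms
  probs["Control"] <- probs["Control"] + control_augment  # top up control
  probs
}

p <- c(Control = 0, B = 0, C = 1)
augment_control(p, control_augment = 0.2)  # Control = 0.2, B = 0, C = 0.8
```

This reproduces the example above: (0, 0, 1) becomes (0.2, 0, 0.8) when the first element is control, and a vector whose control entry already meets the threshold is left untouched.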

References

Kuleshov, Volodymyr, and Doina Precup. 2014. "Algorithms for Multi-Armed Bandit Problems." arXiv. doi:10.48550/arXiv.1402.6028.

Lotze, Thomas, and Markus Loecher. 2022. "bandit: Functions for Simple A/B Split Test and Multi-Armed Bandit Analysis." https://cran.r-project.org/package=bandit.