Calculates the best treatment for a given period using either the UCB1 or Thompson sampling algorithm. Thompson sampling is performed with bandit::best_binomial_bandit() from the bandit package, and UCB1 values are calculated using the standard formula given in Kuleshov and Precup (2014).
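For reference, the UCB1 index described in Kuleshov and Precup (2014) takes the standard form (shown here for orientation; the exact internal implementation may differ):

```latex
\mathrm{UCB1}_j = \bar{x}_j + \sqrt{\frac{2 \ln n}{n_j}}
```

where \(\bar{x}_j\) is the observed success rate of treatment \(j\), \(n_j\) is the number of times treatment \(j\) has been assigned, and \(n\) is the total number of observations so far.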
Usage:

get_bandit(
  past_results,
  algorithm,
  conditions,
  current_period,
  control_augment = 0,
  ndraws
)

Value:

A list of length 2 containing:
bandit: Bandit object, either a named numeric vector of Thompson sampling probabilities or a
tibble/data.table of UCB1 values.
assignment_probabilities: Named numeric vector with a value for each condition
containing the probability of being assigned that treatment.
Arguments:

past_results: A tibble/data.table containing a summary of prior periods, with successes, number of observations, and success rates, created by get_past_results().
algorithm: A character string specifying the MAB algorithm to use; options are "thompson" or "ucb1". The algorithm defines the adaptive assignment process. Mathematical details on these algorithms can be found in Kuleshov and Precup (2014) and Slivkins (2024).
current_period: A numeric value of length 1; the current period of the adaptive trial simulation.
control_augment: A numeric value ranging from 0 to 1; the proportion of each wave guaranteed to receive the "Control" treatment. Default is 0. Using this in conjunction with random_assign_prop is not recommended.
ndraws: A numeric value; when the direct Thompson sampling calculation fails, draws from a simulated posterior are used to approximate the Thompson sampling probabilities. This is the number of simulations to use. The default is 5000, matching the default of bandit::best_binomial_bandit_sim(), but it may need to be raised or lowered depending on performance and accuracy concerns.
Details:

The Thompson assignment_probabilities are the same as the bandit vector except when control_augment or random_assign_prop is greater than 0, as these arguments alter the probabilities of assignment.
Thompson sampling is calculated using the bandit package, but the direct calculation can result in errors or overflow. If this occurs, a simulation-based method from the same package is used instead to estimate the posterior distribution, and a warning is presented. ndraws specifies the number of iterations for the simulation-based method; the default value is 5000.
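The simulation-based fallback can be sketched in base R. This is an illustrative approximation of what a Beta-posterior simulation does (as in bandit::best_binomial_bandit_sim()), not the package's actual code, and thompson_probs is a hypothetical name:

```r
# Approximate Thompson sampling probabilities by drawing from each arm's
# Beta posterior and counting how often each arm produces the largest draw.
thompson_probs <- function(successes, trials, ndraws = 5000) {
  set.seed(1)  # reproducible sketch only
  k <- length(successes)
  # Beta(1 + successes, 1 + failures) posterior per arm (uniform prior)
  draws <- sapply(seq_len(k), function(j) {
    rbeta(ndraws, 1 + successes[j], 1 + trials[j] - successes[j])
  })
  # Share of draws in which each arm was best = its assignment probability
  best <- max.col(draws)
  tabulate(best, nbins = k) / ndraws
}

p <- thompson_probs(successes = c(12, 20, 15), trials = c(50, 50, 50))
# p sums to 1; the arm with the highest observed success rate gets the
# largest assignment probability
```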
The UCB1 algorithm selects only one treatment at each period, with no probability matching, so assignment_probabilities will always have one element equal to 1 and the rest equal to 0, unless control_augment or random_assign_prop is greater than 0, which will alter the probabilities of assignment. For example, if the original vector is (0, 0, 1) and control_augment = 0.2, the new vector is (0.2, 0, 0.8), assuming the first element is the control. If instead the third element were the control group, the resulting vector would be unchanged because it already meets the control group threshold.
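The UCB1 selection and the control_augment adjustment above can be sketched in base R. ucb1_values and augment_control are hypothetical names for illustration, not functions from this package:

```r
# UCB1 index per arm: success rate plus an exploration bonus
ucb1_values <- function(successes, trials) {
  total_n <- sum(trials)
  successes / trials + sqrt(2 * log(total_n) / trials)
}

# One-hot assignment: the arm with the highest UCB1 value gets probability 1
vals  <- ucb1_values(successes = c(12, 20, 15), trials = c(50, 50, 50))
probs <- as.numeric(seq_along(vals) == which.max(vals))

# Reserve control_augment for the control arm and rescale the rest,
# unless the control already meets the threshold
augment_control <- function(probs, control_augment, control_idx = 1) {
  if (probs[control_idx] >= control_augment) return(probs)  # already satisfied
  out <- probs * (1 - control_augment) / sum(probs[-control_idx])
  out[control_idx] <- control_augment
  out
}

# Replicating the example above: c(0, 0, 1) with control_augment = 0.2
augment_control(c(0, 0, 1), 0.2)     # first element is control -> c(0.2, 0, 0.8)
augment_control(c(0, 0, 1), 0.2, 3)  # third element is control -> unchanged
```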
References:

Kuleshov, Volodymyr, and Doina Precup. 2014. "Algorithms for Multi-Armed Bandit Problems." arXiv. doi:10.48550/arXiv.1402.6028.

Lotze, Thomas, and Markus Loecher. 2022. "bandit: Functions for Simple A/B Split Test and Multi-Armed Bandit Analysis." https://cran.r-project.org/package=bandit.