RPM_Binomial_Bandit computes, for each arm of a binomial multi-armed bandit, the randomized probability matching (RPM) probability that the arm is the best one. It is a close cousin of Thompson sampling.
RPM_Binomial_Bandit(
  Success,
  Trials,
  Alpha = 1L,
  Beta = 1L,
  SubDivisions = 1000L
)
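Success: Vector of success counts, one element per arm.
Trials: Vector of trial counts, one element per arm.
Alpha: Prior parameter for successes (the first shape parameter of the Beta prior). Default is 1L.
Beta: Prior parameter for failures (the second shape parameter of the Beta prior). Default is 1L.
SubDivisions: Maximum number of subintervals used in the numerical integration. The default in stats::integrate is 100L; this function raises it to 1000L.
Value: Vector with the probability, for each arm, that it is the best arm compared to all other arms.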
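To make the returned value concrete, here is a minimal sketch of how such probabilities can be computed with stats::integrate. It assumes each arm has an independent Beta(Success + Alpha, Trials - Success + Beta) posterior. The helper name rpm_sketch, the example counts, and this posterior form are assumptions for illustration, not the package's confirmed internals.

```r
# Hypothetical helper (not part of the package): probability that each arm has
# the highest success rate, assuming independent
# Beta(Success + Alpha, Trials - Success + Beta) posteriors.
rpm_sketch <- function(Success, Trials, Alpha = 1, Beta = 1, SubDivisions = 1000L) {
  k <- length(Success)
  a <- Success + Alpha
  b <- Trials - Success + Beta
  probs <- numeric(k)
  for (i in seq_len(k)) {
    integrand <- function(z) {
      # Posterior density of arm i at z, times the probability that
      # every other arm's success rate is below z
      r <- stats::dbeta(z, a[i], b[i])
      for (j in seq_len(k)[-i]) {
        r <- r * stats::pbeta(z, a[j], b[j])
      }
      r
    }
    probs[i] <- stats::integrate(integrand, 0, 1, subdivisions = SubDivisions)$value
  }
  probs
}

# Made-up counts for three arms
p <- rpm_sketch(Success = c(20, 30, 25), Trials = c(100, 100, 100))
print(round(p, 3))
```

The entries of p should sum to approximately 1, up to integration error, and the arm with the highest observed success rate receives the largest probability in this example.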
Other Reinforcement Learning:
RL_Initialize(), RL_ML_Update(), RL_Update()
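The relationship to Thompson sampling mentioned in the description can be illustrated with a small Monte Carlo sketch. It draws from each arm's Beta posterior (assumed here to be Beta(Success + Alpha, Trials - Success + Beta)) and counts how often each arm wins. The data and variable names are made up for illustration and are not taken from the package.

```r
# Monte Carlo estimate of the probability that each arm is best.
# Illustrative data and an assumed posterior form, not the package's code.
set.seed(42)
Success <- c(20, 30, 25)
Trials  <- c(100, 100, 100)
Alpha <- 1
Beta <- 1
n_draws <- 20000

# One column per arm: draws from each arm's assumed Beta posterior
draws <- sapply(seq_along(Success), function(i) {
  stats::rbeta(n_draws, Success[i] + Alpha, Trials[i] - Success[i] + Beta)
})

# Fraction of draws in which each arm had the largest sampled success rate
best <- max.col(draws, ties.method = "first")
mc_probs <- tabulate(best, nbins = length(Success)) / n_draws
print(mc_probs)

# A Thompson sampling step plays the arm that wins a single posterior draw,
# so each arm is selected with the probability estimated above.
next_arm <- which.max(draws[1, ])
```

Randomized probability matching computes these selection probabilities explicitly, while Thompson sampling realizes them implicitly by acting on one posterior draw at a time.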