Dynamic programming of the optimal One-Armed Bernoulli Bandits process
dynOAB(N, al)
dynOAB returns the matrix of maximal predicted rewards; dynOAB2 returns the optimal predicted reward.
N: number of trials.
al: the known probability of reward on arm A.
Shelemyahu Zacks
simOAB
dynOAB(10, 0.5)
dynOAB2(10, 0.5)
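To make the dynamic programming idea concrete, here is a minimal sketch of backward induction for a one-armed Bernoulli bandit. It is not the package's implementation and is not claimed to reproduce the output of dynOAB or dynOAB2. The function name `oab_value` is hypothetical, and the sketch assumes that the second arm has an unknown success probability with a uniform prior, and that once play switches to arm A it stays there for the remaining trials.

```r
# Hypothetical helper, NOT part of the package: optimal expected total reward
# for a one-armed Bernoulli bandit, computed by backward induction.
# Assumptions (not taken from the documentation above):
#   - arm A pays 1 with known probability `al`;
#   - the other arm pays 1 with unknown probability p, with a uniform prior;
#   - switching to arm A is permanent.
oab_value <- function(N, al) {
  stopifnot(N >= 1, al >= 0, al <= 1)
  # V[x + 1] is the value after n trials on the unknown arm with x successes.
  # At stage n = N no trials remain, so the value is 0 for every x in 0..N.
  V <- rep(0, N + 1)
  for (n in (N - 1):0) {
    x <- 0:n
    p_hat <- (x + 1) / (n + 2)  # posterior mean of p under the uniform prior
    # Play the unknown arm once more, then continue optimally.
    continue_val <- p_hat * (1 + V[x + 2]) + (1 - p_hat) * V[x + 1]
    # Switch to arm A for all N - n remaining trials.
    switch_val <- (N - n) * al
    V <- pmax(switch_val, continue_val)
  }
  V[1]  # value at the start: n = 0, x = 0
}
```

Under these assumptions, `oab_value(N, 0)` reduces to N/2 (always play the unknown arm) and `oab_value(N, 1)` reduces to N (switch to arm A immediately), which gives two quick sanity checks.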