Experimental data from any Multi-Armed Bandit (MAB)-like task.
data [data.frame]
| subid | block | trial | object_1 | object_2 | object_3 | object_4 | reward_1 | reward_2 | reward_3 | reward_4 | action |
|-------|-------|-------|----------|----------|----------|----------|----------|----------|----------|----------|--------|
| 1     | 1     | 1     | A        | B        | C        | D        | 20       | 0        | 60       | 40       | A      |
| 1     | 1     | 2     | A        | B        | C        | D        | 20       | 40       | 60       | 80       | B      |
| 1     | 1     | 3     | A        | B        | C        | D        | 20       | 0        | 60       | 40       | C      |
| 1     | 1     | 4     | A        | B        | C        | D        | 20       | 40       | 60       | 80       | D      |
| ...   | ...   | ...   | ...      | ...      | ...      | ...      | ...      | ...      | ...      | ...      | ...    |
Each row must contain all the information relevant to a single trial of the decision-making task (e.g., a multi-armed bandit), as well as the feedback received on that trial.
In this type of paradigm, the rewards associated with the possible actions must be written explicitly in the table for every trial (i.e., the tabular case; see Sutton & Barto, 2018, Chapter 2).
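As an illustration, a data.frame in this format can be assembled directly in base R. The sketch below reproduces the four example trials shown in the table above; the column names follow the table, and the reward values are illustrative only.

```r
# Minimal sketch (base R): build the example trials from the table above
# as a data.frame in the required tabular format.
data <- data.frame(
  subid    = rep(1, 4),              # participant identifier
  block    = rep(1, 4),              # block number
  trial    = 1:4,                    # trial number within the block
  object_1 = rep("A", 4),            # options available on each trial
  object_2 = rep("B", 4),
  object_3 = rep("C", 4),
  object_4 = rep("D", 4),
  reward_1 = c(20, 20, 20, 20),      # reward associated with each option
  reward_2 = c(0, 40, 0, 40),
  reward_3 = c(60, 60, 60, 60),
  reward_4 = c(40, 80, 40, 80),
  action   = c("A", "B", "C", "D")   # option chosen on that trial
)

head(data)
```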
Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press.