Implements \(\varepsilon\)-greedy action selection. With probability \(\varepsilon\), the agent explores the environment by selecting an action uniformly at random. Otherwise, with probability \(1-\varepsilon\), the agent exploits its current knowledge by choosing the action with the highest estimated value in the current state.
selectEpsilonGreedyAction(Q, state, epsilon)
Q: State-action table of type hash.
state: The current state.
epsilon: Exploration rate between 0 and 1.
Character value defining the next action.
Sutton and Barto (1998). "Reinforcement Learning: An Introduction", MIT Press, Cambridge, MA.
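The selection rule described above can be illustrated with a minimal sketch. This is not the package's implementation: the function name epsilon_greedy_sketch is hypothetical, and it assumes the state-action values are stored in a plain numeric matrix (rows are states, columns are actions) rather than in a table of type hash. Ties between equally valued actions are resolved in favour of the first column, which is also an assumption.

```r
# Hypothetical stand-in for the documented function; not the package's code.
# Assumption: Q is a numeric matrix with state names as row names and
# action names as column names, instead of a table of type hash.
epsilon_greedy_sketch <- function(Q, state, epsilon) {
  stopifnot(epsilon >= 0, epsilon <= 1)
  actions <- colnames(Q)
  if (runif(1) < epsilon) {
    # Explore: pick any action uniformly at random.
    sample(actions, 1)
  } else {
    # Exploit: pick the action with the highest estimated value.
    # which.max() returns the first maximum, so ties go to the earliest column.
    actions[which.max(Q[state, ])]
  }
}

# Toy value table with two states and two actions (illustrative numbers only).
Q <- matrix(c(0.1, 0.9,
              0.5, 0.2),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("s1", "s2"), c("left", "right")))

set.seed(1)
epsilon_greedy_sketch(Q, "s1", epsilon = 0)    # never explores, so always greedy
epsilon_greedy_sketch(Q, "s2", epsilon = 0.1)  # greedy in most calls
```

Setting epsilon to 0 makes the choice purely greedy, while setting it to 1 makes every choice random, so intermediate values trade off exploration against exploitation.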