
pomdp (version 1.0.2)

simulate_MDP: Simulate Trajectories in an MDP

Description

Simulate trajectories through an MDP. The start state for each trajectory is randomly chosen using the specified probability distribution. At each epoch, an action is chosen from an epsilon-greedy policy and the state is then updated using the model's transition probabilities.
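
For intuition, the following minimal sketch (not the package's internal code) illustrates the epsilon-greedy action choice described above, assuming a vector of policy actions indexed by state and a vector of available actions:

# illustrative only: with probability epsilon pick a random action,
# otherwise follow the policy's action for the current state
choose_action <- function(policy_actions, state, actions, epsilon) {
  if (runif(1) < epsilon) sample(actions, 1) else policy_actions[[state]]
}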

Usage

simulate_MDP(
  model,
  n = 100,
  start = NULL,
  horizon = NULL,
  visited_states = FALSE,
  epsilon = NULL,
  verbose = FALSE
)

Arguments

model

an MDP model.

n

number of trajectories.

start

probability distribution over the states for choosing the starting states for the trajectories. Defaults to "uniform".

horizon

number of epochs for the simulation. If NULL then the horizon for the model is used.

visited_states

logical; should all states visited along the trajectories be returned? If FALSE, only the final state of each trajectory is returned.

epsilon

the probability of choosing a random action under the epsilon-greedy policy. The default is 0 for solved models and 1 for unsolved models.

verbose

logical; report the simulation parameters that were used.

Value

A vector of state ids (for the final epoch only, or for all visited states if visited_states = TRUE). Attributes containing action counts and rewards for each trajectory may be available.

See Also

Other MDP: MDP(), solve_MDP()

Examples

data(Maze)

# solve the MDP with no discounting
sol <- solve_MDP(Maze, discount = 1)
sol
policy(sol)

## Example 1: simulate 10 trajectories; only the final state of each trajectory is returned
sim <- simulate_MDP(sol, n = 10, horizon = 10, verbose = TRUE)
head(sim)

# additional data is available as attributes
names(attributes(sim))
attr(sim, "avg_reward")
colMeans(attr(sim, "action"))

## Example 2: simulate trajectories that always start in state s_1
sim <- simulate_MDP(sol, n = 100, start = "s_1", horizon = 10)
sim

# the average reward is an estimate of the utility of state s_1 under the optimal policy:
policy(sol)[[1]][1,]
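
## A further sketch: for an unsolved model, epsilon defaults to 1
## (purely random actions); visited_states = TRUE returns all states
## visited along each trajectory instead of only the final states.
sim <- simulate_MDP(Maze, n = 10, horizon = 10, visited_states = TRUE)
head(sim)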

