
pomdp (version 1.0.2)

simulate_POMDP: Simulate Trajectories in a POMDP

Description

Simulate trajectories through a POMDP. The start state of each trajectory is drawn at random from the specified belief. At each epoch, the current belief is used to choose an action via an epsilon-greedy policy, and the belief is then updated using the sampled observation.
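To make the mechanics concrete, here is a minimal, hand-rolled sketch of a single simulation step for the Tiger problem with a purely greedy action choice (epsilon = 0). It assumes the accessors policy(), transition_matrix(), observation_matrix(), and update_belief() accept the arguments shown; simulate_POMDP() performs all of this internally.

library("pomdp")
data("Tiger")
sol <- solve_POMDP(Tiger)

b <- c(.5, .5)                          # current belief over the two states
s <- sample(length(b), 1, prob = b)     # draw the hidden start state

# greedy action: maximize the alpha-vector value at the current belief
pol   <- policy(sol)[[1]]               # alpha vectors plus an action column
alpha <- as.matrix(pol[, seq_along(b)])
a     <- as.character(pol$action[which.max(alpha %*% b)])

# sample the next hidden state and an observation, then update the belief
Tm <- transition_matrix(sol)
Om <- observation_matrix(sol)
s  <- sample(ncol(Tm[[a]]), 1, prob = Tm[[a]][s, ])
o  <- sample(ncol(Om[[a]]), 1, prob = Om[[a]][s, ])
b  <- update_belief(sol, belief = b, action = a, observation = o)
b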

Usage

simulate_POMDP(
  model,
  n = 100,
  belief = NULL,
  horizon = NULL,
  visited_beliefs = FALSE,
  epsilon = NULL,
  digits = 7,
  verbose = FALSE
)

Arguments

model

a POMDP model.

n

number of trajectories.

belief

probability distribution over the states used to choose the start state of each trajectory. Defaults to the start belief specified in the model or "uniform".

horizon

number of epochs for the simulation. If NULL then the horizon for the model is used.

visited_beliefs

logical; should all belief points visited along the trajectories be returned? If FALSE, only the belief at the final epoch of each trajectory is returned.

epsilon

the probability of choosing a random action under the epsilon-greedy policy (see the sketch after this list). The default is 0 for solved models (always follow the policy) and 1 for unsolved models (always act randomly).

digits

number of digits used to round the returned belief points.

verbose

logical; report the parameters used in the simulation?
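For illustration, a minimal sketch of the epsilon-greedy rule referenced above (choose_action is a hypothetical helper, not part of the package):

choose_action <- function(actions, greedy_action, epsilon) {
  if (runif(1) < epsilon)
    sample(actions, 1)   # explore: pick a uniformly random action
  else
    greedy_action        # exploit: follow the policy
}

choose_action(c("listen", "open-left", "open-right"), "listen", epsilon = 0.1)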

Value

A matrix with belief points as rows (either the belief at the final epoch of each trajectory or, if visited_beliefs = TRUE, all visited belief points). Additional information, such as per-trajectory action counts and rewards, may be attached as attributes.

See Also

Other POMDP: POMDP(), plot_belief_space(), sample_belief_space(), solve_POMDP(), solve_SARSOP(), transition_matrix(), update_belief(), write_POMDP()

Examples

data(Tiger)

# solve the POMDP for 5 epochs and no discounting
sol <- solve_POMDP(Tiger, horizon = 5, discount = 1, method = "enum")
sol
policy(sol)

## Example 1: simulate 100 trajectories; only the final belief state of each is returned
sim <- simulate_POMDP(sol, n = 100, verbose = TRUE)
head(sim)

# plot the final belief states; look at the average reward and how often each action was used.
plot_belief_space(sol, sample = sim)

# additional data is available as attributes
names(attributes(sim))
attr(sim, "avg_reward")
colMeans(attr(sim, "action"))


## Example 2: look at all belief states visited along the trajectories, starting from a given initial belief.
sim <- simulate_POMDP(sol, n = 100, belief = c(.5, .5), visited_beliefs = TRUE)

# plot with added density
plot_belief_space(sol, sample = sim, ylim = c(0,5), jitter = 1)
lines(density(sim[, 1], bw = .02)); axis(2); title(ylab = "Density")


## Example 3: simulate trajectories for an unsolved POMDP, which uses an epsilon of 1
# (i.e., all actions are chosen uniformly at random)
sim <- simulate_POMDP(Tiger, n = 100, horizon = 5, visited_beliefs = TRUE)
plot_belief_space(sol, sample = sim, ylim = c(0,6))
lines(density(sim[, 1], bw = .05)); axis(2); title(ylab = "Density")
