
pomdp (version 1.0.2)

simulate_POMDP: Simulate Trajectories in a POMDP

Description

Simulate trajectories through a POMDP. The start state of each trajectory is drawn at random from the specified belief. At each epoch, the current belief is used to choose an action via an epsilon-greedy policy, and the belief is then updated using the sampled observation.
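To make the mechanics concrete, here is a minimal, hand-rolled sketch of a single simulation step for the Tiger problem with a purely greedy action choice (epsilon = 0). It assumes the accessors policy(), transition_matrix(), observation_matrix(), and update_belief() accept the arguments shown; simulate_POMDP() performs all of this internally.

library("pomdp")
data("Tiger")
sol <- solve_POMDP(Tiger)

b <- c(.5, .5)                          # current belief over the two states
s <- sample(length(b), 1, prob = b)     # draw the hidden start state

# greedy action: maximize the alpha-vector value at the current belief
pol   <- policy(sol)[[1]]               # alpha vectors plus an action column
alpha <- as.matrix(pol[, seq_along(b)])
a     <- as.character(pol$action[which.max(alpha %*% b)])

# sample the next hidden state and an observation, then update the belief
Tm <- transition_matrix(sol)
Om <- observation_matrix(sol)
s  <- sample(ncol(Tm[[a]]), 1, prob = Tm[[a]][s, ])
o  <- sample(ncol(Om[[a]]), 1, prob = Om[[a]][s, ])
b  <- update_belief(sol, belief = b, action = a, observation = o)
b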

Usage

simulate_POMDP(
  model,
  n = 100,
  belief = NULL,
  horizon = NULL,
  visited_beliefs = FALSE,
  epsilon = NULL,
  digits = 7,
  verbose = FALSE
)

Arguments

model

a POMDP model.

n

number of trajectories.

belief

probability distribution over the states used to choose the start state of each trajectory. Defaults to the start belief specified in the model or "uniform".

horizon

number of epochs for the simulation. If NULL then the horizon for the model is used.

visited_beliefs

logical; should all belief points visited along the trajectories be returned? If FALSE, only the belief at the final epoch of each trajectory is returned.

epsilon

the probability of choosing a random action under the epsilon-greedy policy (see the sketch after this list). The default is 0 for solved models (always follow the policy) and 1 for unsolved models (always act randomly).

digits

number of digits used to round the returned belief points.

verbose

logical; report the parameters used in the simulation?
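For illustration, a minimal sketch of the epsilon-greedy rule referenced above (choose_action is a hypothetical helper, not part of the package):

choose_action <- function(actions, greedy_action, epsilon) {
  if (runif(1) < epsilon)
    sample(actions, 1)   # explore: pick a uniformly random action
  else
    greedy_action        # exploit: follow the policy
}

choose_action(c("listen", "open-left", "open-right"), "listen", epsilon = 0.1)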

Value

A matrix with belief points as rows (either the belief at the final epoch of each trajectory or, if visited_beliefs = TRUE, all visited belief points). Additional information, such as per-trajectory action counts and rewards, may be attached as attributes.

See Also

Other POMDP: POMDP(), plot_belief_space(), sample_belief_space(), solve_POMDP(), solve_SARSOP(), transition_matrix(), update_belief(), write_POMDP()

Examples

data(Tiger)

# solve the POMDP for 5 epochs and no discounting
sol <- solve_POMDP(Tiger, horizon = 5, discount = 1, method = "enum")
sol
policy(sol)

## Example 1: simulate 100 trajectories; only the final belief state of each is returned
sim <- simulate_POMDP(sol, n = 100, verbose = TRUE)
head(sim)

# plot the final belief states; look at the average reward and how often each action was used.
plot_belief_space(sol, sample = sim)

# additional data is available as attributes
names(attributes(sim))
attr(sim, "avg_reward")
colMeans(attr(sim, "action"))


## Example 2: look at all belief states visited along the trajectories, starting from a given initial belief.
sim <- simulate_POMDP(sol, n = 100, belief = c(.5, .5), visited_beliefs = TRUE)

# plot with added density
plot_belief_space(sol, sample = sim, ylim = c(0,5), jitter = 1)
lines(density(sim[, 1], bw = .02)); axis(2); title(ylab = "Density")


## Example 3: simulate trajectories for an unsolved POMDP, which uses an epsilon of 1
# (i.e., all actions are chosen uniformly at random)
sim <- simulate_POMDP(Tiger, n = 100, horizon = 5, visited_beliefs = TRUE)
plot_belief_space(sol, sample = sim, ylim = c(0,6))
lines(density(sim[, 1], bw = .05)); axis(2); title(ylab = "Density")
