sarsop (version 0.6.16)

compute_policy

Description

Derive the policy function corresponding to a set of alpha vectors.

Usage

compute_policy(
  alpha,
  transition,
  observation,
  reward,
  state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
  a_0 = 1
)

Value

A data frame giving, for each possible belief state, the optimal policy (choice of action) and the corresponding value of that action.
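As a rough illustration of how a policy can be read off a set of alpha vectors: the value of a belief is the largest dot product between the belief and any alpha vector, and the action is the one attached to the maximizing vector. The sketch below is not the package's implementation. The helper `best_action`, the `vectors` matrix (one column per alpha vector), and the `actions` index vector are all hypothetical names chosen for this example, and the numbers are made up.

```r
# Hypothetical helper (not part of sarsop): choose the best alpha vector
# for a single belief over states.
#   belief  : numeric vector of length n_s, summing to 1
#   vectors : n_s x n_vectors matrix, one alpha vector per column (assumed layout)
#   actions : integer vector, action index attached to each column (assumed)
best_action <- function(belief, vectors, actions) {
  values <- as.vector(belief %*% vectors)  # expected value under each alpha vector
  best <- which.max(values)                # index of the maximizing vector
  list(action = actions[best], value = values[best])
}

# Toy inputs: 2 states, 3 alpha vectors, each tied to a different action.
vectors <- cbind(c(1, 0), c(0, 2), c(0.8, 0.8))
actions <- c(1L, 2L, 3L)

res_uniform <- best_action(c(0.5, 0.5), vectors, actions)
res_certain <- best_action(c(1, 0), vectors, actions)
```

Repeating this for every belief of interest gives one action and one value per belief, which is the shape of result described above.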

Arguments

alpha

The matrix of alpha vectors returned by sarsop

transition

Transition matrix, dimension n_s x n_s x n_a

observation

Observation matrix, dimension n_s x n_z x n_a

reward

Reward matrix, dimension n_s x n_a

state_prior

Initial belief state. Optional; defaults to a uniform distribution over states.

a_0

Previous action. The belief about the current state depends not only on the observation, but also on the prior belief about the state and on the action taken after that prior was formed.
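The dependence on prior belief, action, and observation can be sketched as a standard Bayesian belief update. This is a minimal sketch, not the package's code. It assumes `transition[s, s', a]` holds the probability of moving from state s to s' under action a, and `observation[s', z, a]` holds the probability of observing z in state s' after action a. The helper name `update_belief` and the toy arrays are hypothetical.

```r
# Hypothetical helper (not part of sarsop): one Bayesian belief update.
#   prior       : belief over states before acting (length n_s)
#   transition  : n_s x n_s x n_a array, indexed [s, s', a] (assumed convention)
#   observation : n_s x n_z x n_a array, indexed [s', z, a] (assumed convention)
#   z, a        : index of the observation received and of the action taken
update_belief <- function(prior, transition, observation, z, a) {
  predicted <- as.vector(prior %*% transition[, , a])  # belief after the action
  unnormalized <- predicted * observation[, z, a]      # weight by likelihood of z
  total <- sum(unnormalized)
  if (total == 0) stop("observation has zero probability under this belief")
  unnormalized / total                                 # normalize to sum to 1
}

# Toy model: 2 states, 2 observations, 1 action (array() fills column-major).
transition  <- array(c(0.9, 0.2, 0.1, 0.8), dim = c(2, 2, 1))
observation <- array(c(0.7, 0.4, 0.3, 0.6), dim = c(2, 2, 1))

posterior <- update_belief(c(0.5, 0.5), transition, observation, z = 1, a = 1)
```

With the same observation, a different prior or a different previous action would generally produce a different posterior, which is why `state_prior` and `a_0` are both arguments.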

Examples

m <- fisheries_matrices()
## Takes > 5s
if (assert_has_appl()) {
  alpha <- sarsop(m$transition, m$observation, m$reward, 0.95, precision = 10)
  compute_policy(alpha, m$transition, m$observation, m$reward)
}