compute_policy

Derive the corresponding policy function from the alpha vectors
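As a rough illustration of what "deriving a policy from alpha vectors" means, the sketch below picks, for a given belief, the alpha vector with the largest dot product and returns the action attached to it. This is the standard textbook construction written in plain Python, not the package's own code; the names `best_action`, `alpha_vectors` and `alpha_actions` are hypothetical, and the two-state numbers are made up for the example.

```python
def best_action(belief, alpha_vectors, alpha_actions):
    """Return (action, value) for the alpha vector maximizing alpha . belief.

    belief        -- list of state probabilities (assumed to sum to 1)
    alpha_vectors -- list of alpha vectors, each a list with one entry per state
    alpha_actions -- action label associated with each alpha vector
    """
    best_value = float("-inf")
    chosen = None
    for vec, action in zip(alpha_vectors, alpha_actions):
        # Expected value of following this alpha vector's plan from `belief`.
        value = sum(a * b for a, b in zip(vec, belief))
        if value > best_value:
            best_value, chosen = value, action
    return chosen, best_value


# Hypothetical two-state problem with one alpha vector per action.
alpha_vectors = [[1.0, 0.0], [0.0, 2.0]]
alpha_actions = [0, 1]

action_a, value_a = best_action([0.9, 0.1], alpha_vectors, alpha_actions)
action_b, value_b = best_action([0.2, 0.8], alpha_vectors, alpha_actions)
```

Evaluating this rule once per possible observation, after updating the belief for that observation, gives a policy indexed by observation.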

A toolkit for Partially Observed Markov Decision Processes (POMDPs). Provides
bindings to C++ libraries implementing the SARSOP algorithm (Successive Approximations
of the Reachable Space under Optimal Policies), described in Kurniawati et al. (2008),
<doi:10.15607/RSS.2008.IV.009>. The package also provides a high-level interface
for generating, solving, and simulating POMDP problems and their solutions.

Carl Boettiger

sarsop

Approximate POMDP Planning Software

Jeroen Ooms

Milad Memarzadeh

Hanna Kurniawati

David Hsu

Wee Sun Lee

Yanzhu Du

Xan Huang

Trey Smith

Tony Cassandra

Lee Thomason

Carl Kindman

Le Trong Dao

Amit Jain

Rong Nan

Ulrich Drepper

 Free Software Foundation

Tyge Lovset

Yves Berquin

Benjamin Grüdelbach

 RSA Data Security, Inc.

compute_policy function

<dl><dt>alpha</dt>
<dd>Matrix of alpha vectors returned by <code>sarsop</code>.</dd>
<dt>transition</dt>
<dd>Transition matrix, dimension n_s x n_s x n_a (states x states x actions).</dd>
<dt>observation</dt>
<dd>Observation matrix, dimension n_s x n_z x n_a (states x observations x actions).</dd>
<dt>reward</dt>
<dd>Reward matrix, dimension n_s x n_a (states x actions).</dd>
<dt>state_prior</dt>
<dd>Initial belief state. Optional; defaults to a uniform distribution
over states.</dd>
<dt>a_0</dt>
<dd>Previous action. The belief over states depends not only on the observation, but also on the prior belief and on the action that was taken before the observation was made.</dd></dl>
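The dependence of the belief on the prior, the previous action, and the observation can be sketched with the usual Bayesian belief update. The code below is a minimal plain-Python illustration, not the package's implementation. It assumes `transition[s][s2][a]` is the probability of moving from state `s` to `s2` under action `a`, and `observation[s2][z][a]` is the probability of observing `z` in state `s2` after action `a`; the function name `update_belief` and the example numbers are hypothetical.

```python
def update_belief(prior, transition, observation, action, obs):
    """Bayes update: b'(s2) is proportional to O[s2][obs][a] * sum_s T[s][s2][a] * b(s)."""
    n_states = len(prior)
    # Predict step: push the prior belief through the transition model.
    predicted = [
        sum(prior[s] * transition[s][s2][action] for s in range(n_states))
        for s2 in range(n_states)
    ]
    # Correct step: weight each state by the likelihood of the observation.
    unnormalized = [
        observation[s2][obs][action] * predicted[s2] for s2 in range(n_states)
    ]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under this belief and action")
    return [u / total for u in unnormalized]


# Hypothetical problem: 2 states, 2 observations, 1 action (index 0).
# States never change; each observation matches the true state with probability 0.8.
transition = [[[1.0], [0.0]], [[0.0], [1.0]]]
observation = [[[0.8], [0.2]], [[0.2], [0.8]]]
state_prior = [0.5, 0.5]  # uniform prior, mirroring the documented default

posterior = update_belief(state_prior, transition, observation, action=0, obs=0)
```

With a uniform prior and a static state, the posterior here simply mirrors the observation likelihoods; changing `action` or `state_prior` changes the result, which is why both appear as arguments.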

