
pomdp (version 0.9.2)

POMDP: Define a POMDP Problem

Description

Defines all the elements of a POMDP problem including the discount rate, the set of states, the set of actions, the set of observations, the transition probabilities, the observation probabilities, and rewards.

Usage

POMDP(discount, states, actions, observations, transition_prob, 
      observation_prob, reward, start = "uniform", max = TRUE, name = NA)

R_(action, start.state, end.state, observation, value)
O_(action, end.state, observation, probability)
T_(action, start.state, end.state, probability)

Arguments

discount

numeric; discount rate between 0 and 1.

states

a character vector specifying the names of the states.

actions

a character vector specifying the names of the available actions.

observations

a character vector specifying the names of the observations.

transition_prob

Specifies the transition probabilities between states. Options are:

  • a data frame with 4 columns, where the columns specify action, start-state, end-state and the probability, respectively. The first 3 columns can be either character (the name of the action or state) or integer indices. Use rbind() with helper function T_() to create this data frame.

  • a named list of \(m\) (number of actions) matrices. Each matrix is square of size \(n \times n\), where \(n\) is the number of states. The name of each matrix is the action it applies to. Instead of a matrix, the strings "identity" or "uniform" can also be specified. (See the sketch after this list.)
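
For illustration, a minimal sketch of both formats, using the state and action names of the Tiger problem from the Examples below (this assumes the pomdp package is attached so that T_() is available):

library("pomdp")

# data frame form: one T_() row per (action, start-state, end-state) entry
trans_df <- rbind(
  T_("listen",    "tiger-left",  "tiger-left",  1),
  T_("listen",    "tiger-right", "tiger-right", 1),
  T_("open-left", "tiger-left",  "tiger-left",  0.5),
  T_("open-left", "tiger-left",  "tiger-right", 0.5))
  # remaining rows analogous

# equivalent named list form: one n x n matrix (or "identity"/"uniform") per action
trans_list <- list(
  "listen"     = "identity",
  "open-left"  = "uniform",
  "open-right" = "uniform")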

observation_prob

Specifies the probability that a state produces an observation. Options are:

  • a data frame with 4 columns, where the columns specify action, end-state, observation and the probability, respectively. The first 3 columns can be either character (the name of the action, state, or observation), integer indices, or "*" to indicate that the observation probability applies to all actions or states. Use rbind() with helper function O_() to create this data frame.

  • a named list of \(m\) matrices, where \(m\) is the number of actions. Each matrix is of size \(n \times o\), where \(n\) is the number of states and \(o\) is the number of observations. The name of each matrix is the action it applies to. Instead of a matrix, the strings "identity" or "uniform" can also be specified. (See the sketch after this list.)
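
For illustration, a sketch of the data frame form built with O_(), matching the "listen" observation matrix used in the Examples below (assumes the pomdp package is attached):

# "*" makes an entry apply to all end states / observations
obs_df <- rbind(
  O_("listen", "tiger-left",  "tiger-left",  0.85),
  O_("listen", "tiger-left",  "tiger-right", 0.15),
  O_("listen", "tiger-right", "tiger-left",  0.15),
  O_("listen", "tiger-right", "tiger-right", 0.85),
  O_("open-left",  "*", "*", 0.5),
  O_("open-right", "*", "*", 0.5))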

reward

Specifies the rewards dependent on action, states and observations. Options are:

  • a data frame with 5 columns, where the columns specify action, start.state, end.state, observation and the reward, respectively. The first 4 columns can be either character (the names of the action, states, or observation), integer indices, or "*" to indicate that the reward applies to all transitions. Use rbind() with helper function R_() to create this data frame.

  • a named list of \(m\) lists, where \(m\) is the number of actions (the names should be the actions). Each list contains \(n\) named matrices, where each matrix is of size \(n \times o\), with \(n\) the number of states and \(o\) the number of observations. The names of these matrices should be the names of the states. (See the sketch after this list.)
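
For illustration, a sketch of the list-of-lists form for the Tiger problem rewards that are defined with R_() in the Examples below; it assumes the matrix names correspond to the start states and that rows and columns index end states and observations:

# reward_list[[action]][[start.state]] is an n x o matrix; the Tiger rewards
# depend only on action and start state, so the matrices are constant
reward_list <- list(
  "listen" = list(
    "tiger-left"  = matrix(  -1, nrow = 2, ncol = 2),
    "tiger-right" = matrix(  -1, nrow = 2, ncol = 2)),
  "open-left" = list(
    "tiger-left"  = matrix(-100, nrow = 2, ncol = 2),
    "tiger-right" = matrix(  10, nrow = 2, ncol = 2)),
  "open-right" = list(
    "tiger-left"  = matrix(  10, nrow = 2, ncol = 2),
    "tiger-right" = matrix(-100, nrow = 2, ncol = 2)))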

start

Specifies the initial probabilities for each state (i.e., the initial belief state), used to find the initial node in the policy graph and to calculate the total expected reward. The default initial belief state is a uniform distribution over all states. Setting start = NULL means no initial belief state is used. Options to specify start are (a sketch follows the list):

  • a probability distribution over the \(n\) states, i.e., a vector of \(n\) probabilities that add up to \(1\).

  • the string "uniform" for a uniform distribution over all states.

  • an integer in the range \(1\) to \(n\) to specify a single starting state, or

  • a string specifying the name of a single starting state.

  • a vector of strings, specifying a subset of states with a uniform start distribution. If the first element of the vector is "-", then the following subset of states is excluded from the set of start states.
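
For illustration, values that could be passed as the start argument for the two Tiger states (a sketch):

start <- "uniform"              # uniform distribution over all states (default)
start <- c(0.5, 0.5)            # explicit probability vector
start <- 1                      # a single starting state, by index
start <- "tiger-left"           # a single starting state, by name
start <- c("-", "tiger-right")  # uniform over all states except tiger-right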

max

logical; is this a maximization problem (maximize the reward) or a minimization problem (minimize the cost specified in reward)?

name

a string to identify the POMDP problem.

action, start.state, end.state, observation, probability, value

Values used in the helper functions O_(), R_(), and T_() to create an entry for observation_prob, reward, or transition_prob above, respectively.
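
Each helper builds one entry (a row) of the corresponding data frame, in the column order shown under Usage, so entries can be stacked with rbind(). A small sketch (assumes the pomdp package is attached):

T_("listen", "tiger-left", "tiger-left", 1)      # transition_prob entry
O_("listen", "tiger-left", "tiger-left", 0.85)   # observation_prob entry
R_("listen", "*", "*", "*", -1)                  # reward entry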

Value

The function returns an object of class POMDP, which is a list with an element called model containing a list with the model specification. solve_POMDP reads the object and adds a list element called solution.
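
A sketch of how these elements can be accessed, assuming a POMDP object called TigerProblem as created in the Examples below:

names(TigerProblem$model)          # the model specification supplied above
sol <- solve_POMDP(TigerProblem)   # adds the list element called solution
names(sol)                         # now also contains "solution"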

Details

POMDP problems can be solved using solve_POMDP. Details about the available specifications can be found in [1].

References

[1] For further details on how the POMDP solver used in this R package works, see: http://www.pomdp.org

See Also

solve_POMDP

Examples

## The Tiger Problem

TigerProblem <- POMDP(
  name = "Tiger Problem",
  
  discount = 0.75,
  
  states = c("tiger-left" , "tiger-right"),
  actions = c("listen", "open-left", "open-right"),
  observations = c("tiger-left", "tiger-right"),
  
  start = "uniform",
  
  transition_prob = list(
    "listen" =     "identity", 
    "open-left" =  "uniform", 
    "open-right" = "uniform"),

  observation_prob = list(
    "listen" = rbind(c(0.85, 0.15), 
                     c(0.15, 0.85)),
    "open-left" =  "uniform",
    "open-right" = "uniform"),
  
  # the R_() helper expects: action, start.state, end.state, observation, value
  reward = rbind(
    R_("listen",     "*",           "*", "*", -1  ),
    R_("open-left",  "tiger-left",  "*", "*", -100),
    R_("open-left",  "tiger-right", "*", "*", 10  ),
    R_("open-right", "tiger-left",  "*", "*", 10  ),
    R_("open-right", "tiger-right", "*", "*", -100)
  )
)
  
TigerProblem

# inspect the model specification (a list element of the POMDP object)
TigerProblem$model
