pomdp (version 0.99.0)

MDP: Define an MDP Problem

Description

Defines all the elements of an MDP problem and formulates them as a POMDP where all states are observable.

Usage

MDP(
  states,
  actions,
  transition_prob,
  reward,
  discount = 0.9,
  horizon = Inf,
  terminal_values = 0,
  start = "uniform",
  max = TRUE,
  name = NA
)

Arguments

states

a character vector specifying the names of the states.

actions

a character vector specifying the names of the available actions.

transition_prob

Specifies the transition probabilities between states; a brief sketch of the accepted formats follows this argument list.

reward

Specifies the rewards, which depend on the action, the start and end states, and the observation; a brief sketch follows this argument list.

discount

numeric; discount rate between 0 and 1.

horizon

numeric; number of epochs. Inf specifies an infinite horizon.

terminal_values

a vector with the terminal values for each state.

start

Specifies in which state the MDP starts.

max

logical; is this a maximization problem (maximize reward) or a minimization (minimize cost specified in reward)?

name

a string to identify the MDP problem.
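
The accepted formats for transition_prob and reward follow the conventions of POMDP. As a brief sketch (the action and state names are illustrative only, and the pomdp package is assumed to be loaded), transition probabilities can be given per action either as a keyword or as an explicit matrix, and rewards can be assembled with the R_() helper also used in the Examples below:

library(pomdp)

# transition probabilities: one list entry per action, either a keyword
# ("uniform", "identity") or an explicit states x states matrix
trans <- list(
  "a1" = "identity",
  "a2" = matrix(c(0.5, 0.5,
                  0.5, 0.5), nrow = 2, byrow = TRUE)  # same as "uniform"
)

# rewards: one R_() row per pattern of action, start.state, end.state,
# observation and value; omitted fields act as wildcards, as in
# R_("do-nothing", v = 0) in the Examples below
rew <- rbind(
  R_("a1", "s1", v =  1),
  R_("a2",       v = -1)
)

These objects could then be passed as transition_prob = trans and reward = rew.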

Value

The function returns an object of class POMDP, which is a list with an element called model containing a list with the model specification. solve_POMDP reads the object and adds a list element called solution.

Details

The MDP is formulated as a POMDP where all states are completely observable. This is achieved by defining one observation per state with identity observation probability matrices.

More details on specifying the parameters can be found in the documentation for POMDP.
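
For illustration only (a sketch using the two state names from the Examples below), the implied observation probabilities form an identity matrix, i.e., each state emits its own observation with probability 1:

# rows are states, columns are observations; the diagonal of 1s means the
# current state is always observed with certainty
obs_prob <- diag(2)
dimnames(obs_prob) <- list(state       = c("tiger-left", "tiger-right"),
                           observation = c("tiger-left", "tiger-right"))
obs_prob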

See Also

POMDP, solve_POMDP.

Examples

# NOT RUN {
## Michael's Sleepy Tiger Problem is an MDP with perfect observability

Tiger_MDP <- MDP(
  name = "Michael's Sleepy Tiger Problem",
  discount = 1,
  
  states = c("tiger-left", "tiger-right"),
  actions = c("open-left", "open-right", "do-nothing"),
  start = "tiger-left",
  
  transition_prob = list(
    "open-left" =  "uniform", 
    "open-right" = "uniform",
    "do-nothing" = "identity"),

  # the R_() reward helper expects: action, start.state, end.state, observation, value
  reward = rbind(
    R_("open-left",  "tiger-left",  v = -100),
    R_("open-left",  "tiger-right", v =   10),
    R_("open-right", "tiger-left",  v =   10),
    R_("open-right", "tiger-right", v = -100),
    R_("do-nothing",                v =    0)
  )
)
  
Tiger_MDP

# do 5 epochs with no discounting
s <- solve_POMDP(Tiger_MDP, method = "enum", horizon = 5)
s

# value function and policy
plot_value_function(s)
policy(s)
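
# per the Value section above, the returned object is a list with an element
# called model, and solve_POMDP() adds an element called solution
str(Tiger_MDP$model, max.level = 1)
names(s$solution)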

# }
