
pomdp (version 1.0.2)

policy_graph: POMDP Policy Graphs

Description

The function creates and plots the POMDP policy graph for a converged POMDP solution and the policy tree for a finite-horizon solution. It uses plot in igraph with appropriate plotting options.

Usage

policy_graph(x, belief = NULL, show_belief = TRUE, col = NULL, ...)

plot_policy_graph( x, belief = NULL, show_belief = TRUE, legend = TRUE, engine = c("igraph", "visNetwork"), col = NULL, ... )

estimate_belief_for_nodes(x, epoch = 1, ...)

Arguments

x

object of class POMDP containing a solved and converged POMDP problem.

belief

the initial belief is used to mark the initial belief state in the graph of a converged solution and to identify the root node in a policy graph for a finite-horizon solution. If NULL then the belief is taken from the model definition.

show_belief

logical; estimate belief proportions? If TRUE then estimate_belief_for_nodes() is used and the belief is visualized as a pie chart in each node.

col

colors used for the states.

...

parameters are passed on to policy_graph(), estimate_belief_for_nodes() and the functions they use. Also, plotting options are passed on to the plotting engine igraph::plot.igraph() or visNetwork::visIgraph().

legend

logical; display a legend for the colors used for the belief proportions?

engine

The plotting engine to be used. For "visNetwork", flip.y = FALSE can be used to show the root node on top.

epoch

estimate the belief for nodes in this epoch. Use 1 for converged policies.

Value

  • policy_graph() returns the policy graph as an igraph object.

  • plot_policy_graph() returns invisibly what the plotting engine returns.

  • estimate_belief_for_nodes() returns a matrix with the central belief for each node.

Details

Each policy graph node represents a segment (or part of a hyperplane) of the value function. Each node represents one or more belief states. If available, a pie chart (or the color) in each node represents the central belief of the belief states belonging to the node (i.e., the center of the hyperplane segment). This can help with interpreting the policy graph.

For a converged POMDP solution, a policy graph is produced; for a finite-horizon solution, a policy tree is produced. The levels of the tree and the first number in the node label represent the epochs. Many algorithms produce unused policy graph nodes, which are filtered out to produce a clean tree structure. Non-converged policies depend on the initial belief, so if an initial belief is specified, different nodes will be filtered and the tree will look different.

First, the policy in the solved POMDP is converted into an igraph object using policy_graph(). Average beliefs for the graph nodes are estimated using estimate_belief_for_nodes(), and then the igraph object is visualized using the plotting function igraph::plot.igraph() or, for interactive graphs, visNetwork::visIgraph().

estimate_belief_for_nodes() estimates the central belief for each node/segment of the value function by generating/sampling a large set of possible belief points, assigning them to the segments, and then averaging the belief over the points assigned to each segment. Additional parameters like method and the sample size n are passed on to sample_belief_space(). If no belief point is generated for a segment, then a warning is produced. In this case, the number of sampled points can be increased.
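
The sample, assign and average procedure described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the alpha vectors, the two-state belief space, the uniform sampling and all variable names below are assumptions made up for illustration only.

```r
# Illustrative sketch only; NOT the code used inside the pomdp package.
set.seed(42)

# Assumed alpha vectors: one row per value-function segment (policy graph node),
# one column per state. The numbers are made up.
alpha <- rbind(
  c(  10, -100),
  c(   1,    1),
  c(-100,   10)
)

# 1. Sample belief points uniformly from the two-state belief simplex.
n <- 1000
p <- runif(n)
beliefs <- cbind(state_1 = p, state_2 = 1 - p)

# 2. Assign each belief point to the segment with the highest value.
values <- beliefs %*% t(alpha)                    # n x (number of segments)
segment <- max.col(values, ties.method = "first")

# 3. Average the belief points assigned to each segment.
central <- t(sapply(seq_len(nrow(alpha)), function(i) {
  idx <- segment == i
  if (!any(idx)) {
    # Mirrors the documented behavior: warn if a segment received no points.
    warning("no belief point sampled for segment ", i)
    return(rep(NA_real_, ncol(beliefs)))
  }
  colMeans(beliefs[idx, , drop = FALSE])
}))
colnames(central) <- colnames(beliefs)

central
```

Each row of central is the estimated central belief of one segment. A row of NA values would correspond to the warning case, where drawing more points (a larger n) may help.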

See Also

Other policy: optimal_action(), plot_value_function(), policy(), reward(), solve_POMDP(), solve_SARSOP()

Examples

# NOT RUN {
data("Tiger")

## policy graphs for converged solutions
sol <- solve_POMDP(model = Tiger)
sol

policy_graph(sol)

## visualization
plot_policy_graph(sol)

## use a different graph layout (circle and manual; needs igraph)
library("igraph")
## layout_in_circle replaces the older alias layout.circle
plot_policy_graph(sol, layout = layout_in_circle)
plot_policy_graph(sol, layout = rbind(c(1,1), c(1,-1), c(0,0), c(-1,-1), c(-1,1)))

## hide labels and legend
plot_policy_graph(sol, edge.label = NA, vertex.label = NA, legend = FALSE)

## add a plot title
plot_policy_graph(sol, main = sol$name)

## custom larger vertex labels (A, B, ...)
plot_policy_graph(sol,
  vertex.label = LETTERS[1:nrow(policy(sol)[[1]])],
  vertex.label.cex = 2,
  vertex.label.color = "white")

## plotting the igraph object directly
## (e.g., using the graph in the layout and to change the edge curvature)
pg <- policy_graph(sol)
plot(pg,
  layout = layout_as_tree(pg, root = 3, mode = "out"),
  edge.curved = curve_multiple(pg, .2))

## change labels
plot(pg,
  edge.label = abbreviate(E(pg)$label),
  vertex.label = V(pg)$label,
  vertex.size = 20)

## plot interactive graphs using the visNetwork library.
## Note: the pie chart representation is not available, but colors are used instead.
plot_policy_graph(sol, engine = "visNetwork")

## add smooth edges and a layout (note, engine can be abbreviated)
plot_policy_graph(sol, engine = "visNetwork", layout = "layout_in_circle", smooth = TRUE)

## estimate the central belief for the graph nodes. We use the default random sampling method with 
## a sample size of n = 100. 
estimate_belief_for_nodes(sol, n = 100)

## policy trees for finite-horizon solutions
sol <- solve_POMDP(model = Tiger, horizon = 4, method = "incprune")

policy_graph(sol)

plot_policy_graph(sol)
# Note: the first number in the node id is the epoch.

# plot the policy tree for an initial belief of 90% that the tiger is to the left
plot_policy_graph(sol, belief = c(0.9, 0.1))

# }
