pomdp (version 1.1.3)

regret: Calculate the Regret of a Policy

Description

Calculates the regret of a policy relative to a benchmark policy.

Usage

regret(policy, benchmark, belief = NULL)

Value

The regret, i.e., the difference between the expected long-term reward of the benchmark policy and that of the given policy.

Arguments

policy

a solved POMDP containing the policy to calculate the regret for.

benchmark

a solved POMDP with the (optimal) policy. Regret is calculated relative to this policy.

belief

the start belief used for the calculation. If NULL, then the start belief of the benchmark POMDP is used.

Author

Michael Hahsler

Details

Calculates the regret, defined as \(J^{\pi^*}(b_0) - J^{\pi}(b_0)\), where \(J^\pi\) is the expected long-term reward of policy \(\pi\) given the start belief \(b_0\). Note that the optimal policy \(\pi^*\) is usually used as the benchmark for regret. Since the optimal policy may not be known, the regret relative to the best known policy can be used instead.
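
A minimal sketch of the quantity being computed (not the function's internal implementation), assuming the package's reward() accessor returns \(J^\pi(b_0)\) when called on a solved model at its start belief:

library("pomdp")
data("Tiger")

pi_star <- solve_POMDP(Tiger)                # benchmark policy, treated as optimal
pi_hat  <- solve_POMDP(Tiger, horizon = 10)  # weaker finite-horizon policy

# J^{pi*}(b_0) - J^{pi}(b_0), both evaluated at the model's start belief
# (assumption: reward() with belief = NULL uses the start belief and returns
#  the expected long-term reward of the solved policy)
reward(pi_star) - reward(pi_hat)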

See Also

Other POMDP: POMDP_accessors, POMDP(), add_policy(), plot_belief_space(), projection(), sample_belief_space(), simulate_POMDP(), solve_POMDP(), solve_SARSOP(), transition_graph(), update_belief(), value_function(), write_POMDP()

Examples

library("pomdp")
data(Tiger)

# solve for the (near-)optimal infinite-horizon policy used as the benchmark
sol_optimal <- solve_POMDP(Tiger)
sol_optimal

# perform exact value iteration for 10 epochs
sol_quick <- solve_POMDP(Tiger, method = "enum", horizon = 10)
sol_quick

# regret of the quick 10-epoch policy relative to the benchmark
regret(sol_quick, sol_optimal)
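
The belief argument can be used to evaluate the regret at a different start belief. As an illustrative sketch (assuming a length-two belief vector over the Tiger problem's two states, tiger-left and tiger-right):

# regret evaluated at a uniform start belief over the two tiger states
regret(sol_quick, sol_optimal, belief = c(0.5, 0.5))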
