The RPO is defined as the difference between the weight of the state when using action iA and the maximum weight of the state when using any action other than iA.
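In symbols (this notation is not used elsewhere on this page; w(s, a) denotes the weight of state s under action a and A(s) its set of actions):

  RPO(s, iA) = w(s, iA) - max{ w(s, a) : a in A(s), a != iA }

Hence the RPO is non-negative exactly when iA is at least as good as every alternative action in state s, and negative when some other action gives a higher weight.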
Usage:

getRPO(
  mdp,
  w,
  iA,
  sId = ifelse(mdp$timeHorizon >= Inf, mdp$founderStatesLast + 1, 1):
    ifelse(mdp$timeHorizon >= Inf, mdp$states + mdp$founderStatesLast, mdp$states) - 1,
  criterion = "expected",
  dur = "",
  rate = 0,
  rateBase = 1,
  discountFactor = NULL,
  g = 0,
  discountMethod = "continuous",
  stateStr = TRUE
)
Value: The RPO (matrix/data frame).
Arguments:

mdp: The MDP loaded using loadMDP().

w: The label of the weight/reward we calculate the RPO for.

iA: The action index we calculate the RPO with respect to (same size as sId).

sId: Vector of ids of the states we want to retrieve.
criterion: The criterion used. If "expected", use expected rewards; if "discount", use discounted rewards; if "average", use average rewards.
dur: The label of the duration/time such that discount rates can be calculated.

rate: The interest rate.

rateBase: The time horizon the rate is valid over.
discountFactor: The discount factor for one time unit. If specified, rate and rateBase are not used to calculate the discount factor.

g: The optimal gain (g) calculated (used if criterion = "average").
discountMethod: Either 'continuous' or 'discrete', corresponding to the discount factor exp(-rate/rateBase) or 1/(1 + rate/rateBase), respectively. Only used if discountFactor is NULL; see the short sketch after the argument list.
stateStr: Output the state string.
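The two discount-method formulas above are easy to check directly in R. This is only an illustration; the values rate = 0.1 and rateBase = 1 are made up:

  rate <- 0.1
  rateBase <- 1
  exp(-rate / rateBase)      # 'continuous' -> 0.9048374
  1 / (1 + rate / rateBase)  # 'discrete'   -> 0.9090909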
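A minimal usage sketch (not taken from the package documentation): it assumes loadMDP() takes the file prefix as its first argument and that a model with prefix "machine_", a weight label "Net reward", and a duration label "Duration" exists; the state ids and action indices are made up as well.

  ## Load a hypothetical MDP model written with prefix "machine_".
  mdp <- loadMDP("machine_")

  ## RPO of action index 0 in states 0, 1 and 2 under expected rewards.
  getRPO(mdp, w = "Net reward", iA = c(0, 0, 0), sId = c(0, 1, 2))

  ## RPO under discounted rewards with a 10% interest rate per time unit.
  getRPO(mdp, w = "Net reward", iA = c(0, 0, 0), sId = c(0, 1, 2),
         criterion = "discount", dur = "Duration", rate = 0.1, rateBase = 1)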