The policy can afterwards be retrieved using the functions getPolicy and getPolicyW.
runPolicyIteAve(mdp, w, dur, maxIte = 100, getLog = TRUE)
The calculated optimal gain (g).
mdp: The MDP loaded using loadMDP().
w: The label of the weight we optimize.
dur: The label of the duration/time such that discount rates can be calculated.
maxIte: Maximum number of iterations. If the model does not satisfy the unichain assumption, the algorithm may loop.
getLog: Output the log messages.
getPolicy().
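A minimal usage sketch of the call above. It assumes a model has already been written to files with the hypothetical prefix "machine_" and that its weights carry the hypothetical labels "Net reward" and "Duration". Only the function and argument names come from this page, so substitute the prefix and labels of your own model.

```r
library(MDP2)

# Hypothetical file prefix; loadMDP() reads the model files that share it.
mdp <- loadMDP("machine_")

# Hypothetical weight labels; they must match labels stored in the model.
g <- runPolicyIteAve(mdp, w = "Net reward", dur = "Duration",
                     maxIte = 100, getLog = TRUE)

g               # the optimal gain returned by the call
getPolicy(mdp)  # the policy found by the algorithm
```

Keeping maxIte finite is a safeguard: as noted for that argument, the algorithm may loop if the model does not satisfy the unichain assumption.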