valueIte {MDP} | R Documentation |
Perform value iteration on the MDP.
valueIte(mdp, iW, iDur, rate=0.1, rateBase=365, times=10, eps=1e-05, termValues)
mdp |
The MDP loaded using loadMDP. |
iW |
Index of the weight we optimize. |
iDur |
Index of the duration/time, used to calculate discount rates. |
rate |
Interest rate. |
rateBase |
The time horizon over which the rate applies. |
times |
The maximum number of value iteration steps performed. |
eps |
Stopping criterion. If max(w(t)-w(t+1)) < eps then the algorithm stops, i.e. the policy is epsilon-optimal (see [1], p. 161). |
termValues |
The terminal values used (values of the last states in the MDP). |
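To illustrate how rate and rateBase might combine with a duration, here is a minimal sketch. It assumes continuous compounding, which this page does not confirm, and discount_factor is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper (not part of the MDP package): discount factor for a
# duration d, ASSUMING continuous compounding at `rate` per `rateBase` time units.
discount_factor <- function(d, rate = 0.1, rateBase = 365) {
  exp(-rate / rateBase * d)
}

discount_factor(365)  # one full rateBase period: exp(-0.1)
discount_factor(1)    # a single time unit: exp(-0.1 / 365)
```

Under this assumption, the defaults rate=0.1 and rateBase=365 correspond to a 10% continuously compounded rate per 365 time units.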
If the MDP has a finite time-horizon then the arguments times and eps are ignored.
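For the infinite-horizon case, where times and eps do apply, the following base R sketch shows how the two arguments might interact. It is not the package's implementation: the toy MDP, the per-step discount factor, and the function value_iteration_sketch are all illustrative assumptions, and the stopping test uses the absolute difference between successive value vectors.

```r
# Toy infinite-horizon MDP (all numbers are illustrative assumptions).
# P[[a]] is the transition matrix and R[, a] the reward vector under action a.
P <- list(
  matrix(c(0.9, 0.1,
           0.4, 0.6), nrow = 2, byrow = TRUE),
  matrix(c(0.2, 0.8,
           0.5, 0.5), nrow = 2, byrow = TRUE)
)
R <- cbind(c(1, 0), c(0, 2))

# Hypothetical stand-in for the iteration loop; NOT the package's code.
value_iteration_sketch <- function(P, R, discount, times = 10, eps = 1e-5) {
  w <- numeric(nrow(R))                # start from zero values
  for (n in seq_len(times)) {          # at most `times` sweeps
    # Q[s, a] = reward + discounted expected value of the next state
    Q <- sapply(seq_along(P),
                function(a) R[, a] + discount * drop(P[[a]] %*% w))
    wNew <- apply(Q, 1, max)           # best action value in each state
    delta <- max(abs(wNew - w))        # largest change in any state value
    w <- wNew
    if (delta < eps) break             # stopping criterion met
  }
  list(w = w, policy = apply(Q, 1, which.max), iterations = n, delta = delta)
}

res <- value_iteration_sketch(P, R, discount = exp(-0.1), times = 1000)
res$iterations  # number of sweeps actually performed
res$policy      # greedy action index per state
```

With a small times the loop may stop before delta falls below eps, so the returned values are only as accurate as the last sweep.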
Returns NULL (invisibly).
Lars Relund lars@relund.dk
[1] Puterman, M.; Markov Decision Processes, Wiley-Interscience, 1994.