`mdp_example_forest(S, r1, r2, p)`
Here is the model of this problem. Let 1, ..., S be the states of the forest, the S-th state being the oldest. Let Wait be action 1 and Cut be action 2. After a fire, the forest returns to the youngest state, that is, state 1.
The transition matrix P of the problem can then be defined as follows:
$$ P(:,:,1) = \left[ \begin{array}{cccccc} p & 1-p & 0 & \ldots & \ldots & 0 \\ p & 0 & 1-p & 0 & \ldots & 0 \\ \vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\ \vdots & \vdots & & \ddots & \ddots & 0 \\ p & 0 & \ldots & \ldots & 0 & 1-p \\ p & 0 & \ldots & \ldots & 0 & 1-p \\ \end{array} \right] $$
$$ P(:,:,2) = \left[ \begin{array}{ccccc} 1 & 0 & \ldots & \ldots & 0 \\ \vdots & \vdots & \ddots & & \vdots \\ \vdots & \vdots & & \ddots & \vdots \\ 1 & 0 & \ldots & \ldots & 0 \\ \end{array} \right] $$
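As a sketch of these definitions (a hypothetical NumPy helper, not part of the toolbox itself), the two transition matrices can be built for given S and p:

```python
import numpy as np

def forest_transitions(S, p):
    """Build the S x S transition matrices for Wait (action 1)
    and Cut (action 2). Hypothetical helper, not the toolbox API."""
    # Action 1 (Wait): with probability p a fire sends the forest
    # back to state 1; otherwise it ages by one state, and the
    # oldest state S simply stays in S.
    P1 = np.zeros((S, S))
    P1[:, 0] = p
    for i in range(S - 1):
        P1[i, i + 1] = 1 - p
    P1[S - 1, S - 1] = 1 - p
    # Action 2 (Cut): the forest always returns to state 1.
    P2 = np.zeros((S, S))
    P2[:, 0] = 1.0
    return P1, P2

P1, P2 = forest_transitions(4, 0.1)
# Every row of a transition matrix must sum to 1.
assert np.allclose(P1.sum(axis=1), 1)
assert np.allclose(P2.sum(axis=1), 1)
```

Building the matrices explicitly like this makes it easy to check that each row is a valid probability distribution before handing them to a solver.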
The reward matrix R is defined as follows:
$$ R(:,1) = \left[ \begin{array}{l} 0 \\ \vdots \\ \vdots \\ \vdots \\ 0 \\ r1 \\ \end{array} \right] $$
$$ R(:,2) = \left[ \begin{array}{l} 0 \\ 1 \\ \vdots \\ \vdots \\ 1 \\ r2 \\ \end{array} \right] $$
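The reward matrix can be sketched the same way (again a hypothetical NumPy helper mirroring the column definitions above): waiting pays nothing except r1 in the oldest state, while cutting pays 1 in every aged state except state 1 (nothing to cut) and r2 in the oldest state.

```python
import numpy as np

def forest_rewards(S, r1, r2):
    """Build the S x 2 reward matrix: column 0 for Wait, column 1
    for Cut. Hypothetical helper, not the toolbox API."""
    R = np.zeros((S, 2))
    R[S - 1, 0] = r1   # waiting in the oldest state pays r1
    R[1:, 1] = 1.0     # cutting an aged forest pays 1 ...
    R[S - 1, 1] = r2   # ... except the oldest state, which pays r2
    return R

R = forest_rewards(4, 4.0, 2.0)
```

Note that cutting in state 1 yields 0, since the youngest forest has no wood worth selling.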