| State \(\varvec{s}\) | Action \(\varvec{a}\) | Initial policy \(\pi (\varvec{s},\varvec{a})\) | Updated policy \(\pi '(\varvec{s},\varvec{a})\) |
|---|---|---|---|
| \(\left( \,\begin{matrix} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 \end{matrix}\,\right)\) | \(\left( \,\begin{matrix} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 0 &{} 0 &{} 0 \end{matrix}\,\right)\) | 0.0625 | 0.0463788 |
| | \(\left( \,\begin{matrix} 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 \end{matrix}\,\right)\) | 0.0625 | 0.0515202 |
| | \(\left( \,\begin{matrix} 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0 \end{matrix}\,\right)\) | 0.0625 | 0.0554653 |
| | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| | \(\left( \,\begin{matrix} 0 &{} 0 &{} 0 &{} 1 \\ 0 &{} 0 &{} 0 &{} 1 \end{matrix}\,\right)\) | 0.0625 | 0.0737093 |
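Every initial-policy entry equals 0.0625 = 1/16. This is consistent with a uniform policy over 16 actions if, as the listed rows suggest, each action is a 2 x 4 binary matrix with exactly one 1 per row (4 choices per row, 4 x 4 = 16 combinations). The following is a minimal Python sketch under that assumption. The action shape, the one-1-per-row rule, and the helper names `enumerate_actions` and `uniform_policy` are illustrative assumptions, not taken from the source. It enumerates the actions in the same order as the table and assigns the uniform initial probability. It does not reproduce the updated policy \(\pi'\).

```python
from itertools import product

# Assumption: an action is a 2 x 4 0/1 matrix with exactly one 1 in each row.
N_ROWS, N_COLS = 2, 4


def enumerate_actions(n_rows=N_ROWS, n_cols=N_COLS):
    """Return every 0/1 matrix with exactly one 1 per row, as tuples of tuples.

    product() varies the last row fastest, which matches the table's ordering:
    (1000;1000), (1000;0100), (1000;0010), ..., (0001;0001).
    """
    actions = []
    for cols in product(range(n_cols), repeat=n_rows):
        matrix = tuple(
            tuple(1 if c == col else 0 for c in range(n_cols)) for col in cols
        )
        actions.append(matrix)
    return actions


def uniform_policy(actions):
    """Hypothetical helper: give each action the same probability 1/|A|."""
    p = 1.0 / len(actions)
    return {a: p for a in actions}


actions = enumerate_actions()
pi = uniform_policy(actions)

print(len(actions))       # 16
print(pi[actions[0]])     # 0.0625
print(actions[0])         # ((1, 0, 0, 0), (1, 0, 0, 0))
print(actions[-1])        # ((0, 0, 0, 1), (0, 0, 0, 1))
```

Under this assumption, any policy-improvement step that produces \(\pi'\) must still assign the 16 updated probabilities for the zero state so that they sum to 1.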