Table 6 Example of policy improvement

From: Influence maximization in social media networks concerning dynamic user behaviors via reinforcement learning

| State \(\boldsymbol{s}\) | Action \(\boldsymbol{a}\) | Initial policy \(\pi(\boldsymbol{s},\boldsymbol{a})\) | Updated policy \(\pi'(\boldsymbol{s},\boldsymbol{a})\) |
|---|---|---|---|
| \(\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\) | \(\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}\) | 0.0625 | 0.0463788 |
| | \(\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}\) | 0.0625 | 0.0515202 |
| | \(\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}\) | 0.0625 | 0.0554653 |
| | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| | \(\begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}\) | 0.0625 | 0.0737093 |
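The initial-policy column is uniform. If each action marks exactly one of the four columns in each of the two rows (as every action listed does), there are \(4^2 = 16\) actions, and \(1/16 = 0.0625\). The Python sketch below works under that assumption: it enumerates the action space in the same order as the table and builds the uniform initial policy. The `softmax_improvement` function and the action values passed to it are hypothetical. A Boltzmann (softmax) update is one common way to turn action values into an improved stochastic policy, but it is not necessarily the update rule used in the paper, and the resulting numbers are not the table's values.

```python
import math
from itertools import product

N_ROWS, N_COLS = 2, 4  # shape of the state/action matrices in Table 6


def enumerate_actions(n_rows=N_ROWS, n_cols=N_COLS):
    """All binary n_rows x n_cols matrices with exactly one 1 in each row.

    Assumption: this is the action space implied by the table's rows.
    product() varies the last row fastest, matching the table's ordering.
    """
    actions = []
    for cols in product(range(n_cols), repeat=n_rows):
        matrix = tuple(
            tuple(1 if c == col else 0 for c in range(n_cols)) for col in cols
        )
        actions.append(matrix)
    return actions


def uniform_policy(actions):
    """Initial policy: the same probability for every action."""
    p = 1.0 / len(actions)
    return {a: p for a in actions}


def softmax_improvement(q_values, temperature=1.0):
    """Hypothetical Boltzmann improvement step over action values.

    Illustrative only; not claimed to be the paper's update rule.
    """
    m = max(q_values.values())  # subtract the max for numerical stability
    weights = {a: math.exp((q - m) / temperature) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


actions = enumerate_actions()
pi = uniform_policy(actions)
print(len(actions), pi[actions[0]])  # 16 0.0625

# Hypothetical action values (made up for illustration, not from the paper)
q = {a: 0.1 * i for i, a in enumerate(actions)}
pi_new = softmax_improvement(q)
print(round(sum(pi_new.values()), 6))  # 1.0
```

In this sketch, actions with higher (hypothetical) values gain probability and the rest lose it, while the distribution still sums to 1. That is the qualitative pattern the updated-policy column shows, where the listed values move away from the uniform 0.0625.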