Influence maximization in social media networks concerning dynamic user behaviors via reinforcement learning

Computational Social Networks

Table 6 Example of policy improvement

State	Action	Initial policy	Updated policy
\(\boldsymbol{s}\)	\(\boldsymbol{a}\)	\(\pi(\boldsymbol{s},\boldsymbol{a})\)	\(\pi'(\boldsymbol{s},\boldsymbol{a})\)
\(\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}\)	\(\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}\)	0.0625	0.0463788
	\(\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}\)	0.0625	0.0515202
	\(\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}\)	0.0625	0.0554653
	\(\vdots\)	\(\vdots\)	\(\vdots\)
	\(\begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}\)	0.0625	0.0737093
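
The initial-policy column can be checked with a short sketch. It assumes, based only on the rows shown in Table 6, that an action is a 2 x 4 binary matrix with exactly one 1 in each row; that gives 4^2 = 16 actions, and a uniform policy over them assigns 1/16 = 0.0625 to each. The names `one_hot`, `actions`, and `initial_policy` are illustrative and do not come from the article. The updated-policy values are not reproduced, because the update rule is not stated in this table.

```python
from itertools import product

# Shape of the action matrices shown in Table 6 (assumed from the displayed rows).
N_ROWS, N_COLS = 2, 4


def one_hot(col, n=N_COLS):
    """Return a length-n tuple with a single 1 at position col."""
    return tuple(1 if j == col else 0 for j in range(n))


# Assumption: each action selects exactly one column per row.
actions = [
    tuple(one_hot(c) for c in cols)
    for cols in product(range(N_COLS), repeat=N_ROWS)
]

# Uniform initial policy over the enumerated actions.
initial_policy = {a: 1.0 / len(actions) for a in actions}

print(len(actions))                 # number of enumerated actions
print(initial_policy[actions[0]])   # probability of the first action
```

Under this assumption the enumeration order matches the table: the first action has a 1 in the first column of both rows, and the last has a 1 in the fourth column of both rows.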