To investigate the ability to control a system using the proposed distribution-based control approach, we formulate a problem using the real-valued voter model within the NCP framework described by [3]. The Network Control Problem definition requires the following components to be defined: a network, a diffusion model, a control system, and an objective function. The following subsections define each of the required components of the network control problem, including the objective function which uses a target distribution and the Hellinger distance measure to determine whether the system is still in an acceptable state.

### Network

The network is represented by a graph *G* = (*V*, *E*), where *V* is the set of nodes within the system and *E* is the set of edges connecting them. The control problem is evaluated across three theoretical network types. For each network type, 10 randomly generated networks of 100 agents each were considered. In all networks, each agent also includes a link to itself, and each network is guaranteed to consist of a single connected component. Each network type, along with the parameters used in its generative model, is described below. For the random and small world networks, parameter values were selected to produce an average degree similar to that found in the scale free networks.

#### Random network

Each possible link between a pair of nodes, *i* and *j*, is included within the network with a probability *p* = 0.031 to produce an Erdős–Rényi random graph.

#### Scale free network

Links are formed between nodes based on the preferential attachment model described by [11].

#### Small world network

The small world networks were generated using the model of [12], with an average degree of 4 and a *β* value of 0.25.
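As an illustration, the sampling procedure for the random networks can be sketched as follows. This is a stdlib-only sketch; the adjacency-set representation, the helper names, and the strategy of redrawing until a connected graph appears are our assumptions rather than details stated in the text.

```python
import random
from collections import deque

def erdos_renyi(n, p, rng):
    """Adjacency sets for an Erdős–Rényi graph; every agent links to itself."""
    adj = {v: {v} for v in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:        # include edge (i, j) with probability p
                adj[i].add(j)
                adj[j].add(i)
    return adj

def is_connected(adj):
    """BFS from node 0; the network must form a single connected component."""
    seen, queue = {0}, deque([0])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return len(seen) == len(adj)

def connected_random_network(n=100, p=0.031, seed=0):
    """Redraw until the sampled graph is a single connected component."""
    rng = random.Random(seed)
    while True:
        adj = erdos_renyi(n, p, rng)
        if is_connected(adj):
            return adj
```

The scale free and small world networks can be generated analogously, e.g. with `networkx.barabasi_albert_graph` and `networkx.watts_strogatz_graph`, followed by the same self-loop and connectivity steps.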

### Diffusion model

The specification of the network control problem from [3] defines the diffusion model in two parts: a sharing strategy and a learning strategy. In this work, we use a real-valued voter model as the diffusion model. The voter model is commonly used to model the change of opinion within a group of networked individuals and has been investigated in other network control research (e.g. [9]). Here, each node’s state is represented by a single value, bounded between −1.0 and 1.0. Left uncontrolled, the voter model converges toward a single value over time. The two strategies defining the real-valued voter model are given in Eqs. 4, 5 and 6. Under the sharing strategy (Sh(*v*)), each node *v* sends its current state value (*s*(*v*, *t*)) to all of its neighbours, including itself, at each time step. Each piece of shared data a node *v* receives at a time step is stored in that node’s information set *I*. Under the learning strategy (*L*(*v*, *I*)), at each time step every agent *v* moves its state by an amount *step* (a constant of 0.01 here) toward one of its neighbours’ shared state values from the previous time step, with the direction determined by Eq. 6.

$$\begin{aligned} \text{Sh}(v) = s(v,t) \end{aligned}$$

(4)

$$\begin{aligned} L(v,I) = s(v,t) + (\text{step} \times \text{sign}) \end{aligned}$$

(5)

$$\begin{aligned} \text{sign} = \left\{ \begin{array}{ll} -1 &{} \text{ with probability } \frac{|\{s(u,t)<s(v,t) \,|\, s(u,t)\in I(v,t)\}|}{|I(v,t)|} \\ \phantom{-}1 &{} \text{ with probability } \frac{|\{s(u,t)>s(v,t) \,|\, s(u,t)\in I(v,t)\}|}{|I(v,t)|} \\ \phantom{-}0 &{} \text{ otherwise} \end{array}\right. \end{aligned}$$

(6)
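A minimal synchronous update implementing Eqs. 4–6 might look like the following. This is a Python sketch; the function names, the dictionary representation of node states, and the clipping of states to [−1.0, 1.0] are our assumptions.

```python
import random

STEP = 0.01  # constant step size used in the learning strategy

def learn(state_v, info, rng):
    """One learning update for a node (Eqs. 5 and 6): move by STEP down,
    up, or not at all, with probabilities given by the fraction of shared
    values below or above the node's own state."""
    below = sum(1 for s in info if s < state_v)
    above = sum(1 for s in info if s > state_v)
    r = rng.random()
    if r < below / len(info):
        sign = -1
    elif r < (below + above) / len(info):
        sign = 1
    else:
        sign = 0
    # Keep the state within the bounded interval (our assumption).
    return max(-1.0, min(1.0, state_v + STEP * sign))

def voter_step(adj, states, rng):
    """One full time step: every node shares its state (Eq. 4), then all
    nodes update synchronously from the previous step's shared values."""
    shared = dict(states)  # Sh(v) = s(v, t), snapshot before updating
    return {v: learn(states[v], [shared[u] for u in adj[v]], rng)
            for v in adj}
```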

### Control system

The configuration of the control system specifies the set of nodes whose states the controller can set in order to affect the overall network state. The results presented here consider many different configurations across the modelled networks. One of the main parameters varied is the number of controllers: we use either 3, 5 or 10 control nodes within the network. The set of controllers is determined using the FAR heuristic, as described by [9] and outlined in Algorithm 1. Starting from a seed node that is either supplied as input or selected randomly, the heuristic iteratively selects, as the next controller, the node whose shortest-path distance to the current controller set is greatest. This has the effect of distributing the control nodes within the network in a way that maximizes the ‘farness’ between them.
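The selection rule described above can be sketched as follows (Algorithm 1 itself is not reproduced in this excerpt). This is a stdlib-only sketch assuming unweighted networks; the `far_controllers` name and the BFS-based distance computation are our assumptions.

```python
from collections import deque

def bfs_dist(adj, src):
    """Unweighted shortest-path distances from src to every reachable node."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def far_controllers(adj, k, seed_node):
    """FAR heuristic: starting from a seed node, repeatedly add the node
    whose shortest-path distance to the current controller set is largest."""
    controllers = [seed_node]
    while len(controllers) < k:
        # Distance from each node to its nearest controller.
        dist_to_set = {v: min(bfs_dist(adj, c).get(v, float("inf"))
                              for c in controllers) for v in adj}
        nxt = max((v for v in adj if v not in controllers),
                  key=lambda v: dist_to_set[v])
        controllers.append(nxt)
    return controllers
```

On a path network, for example, the heuristic first jumps to the opposite end from the seed, then to the midpoint.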

The controller behaviour is learned using a reinforcement learning approach, as described in "Learning a control policy". As explained further there, to allow for more efficient execution of the learning and simulation process, a single signal (state value) is injected into all control nodes at each time step. Forcing the same signal to be used as input to each controller keeps the action space constant instead of growing exponentially with the number of controllers. The value of the injected signal is selected from a list of values between −0.5 and 0.5 in 0.05 increments, allowing the controller to select from states within 10 standard deviations of the mean of the target distribution. This range was selected to ensure that the controller can move the system in any direction that would be logically desirable.
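Concretely, the discrete action set and the injection of a chosen signal could be sketched as below; the `ACTIONS` list construction and the `apply_control` helper are illustrative assumptions, not the authors' implementation.

```python
# 21 candidate signals from -0.5 to 0.5 in 0.05 increments, i.e. up to
# 10 standard deviations (sigma = 0.05) from the target mean of 0.0.
ACTIONS = [round(-0.5 + 0.05 * i, 2) for i in range(21)]

def apply_control(states, controllers, signal):
    """Inject the same chosen signal into every control node's state."""
    new = dict(states)
    for c in controllers:
        new[c] = signal
    return new
```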

### Objective function

In this work, we apply a failure avoidance approach to the distribution-based control problem. This requires both a target distribution and a Hellinger distance threshold to be specified. The target distribution used here is a normal distribution with a mean of 0.0 and a standard deviation of 0.05. As explained in "Failure avoidance control problem", this type of distribution could be applicable to a number of different types of social network control problems. Given this target distribution and a specified Hellinger distance threshold, *H*_{max}, the utility function of the overall network state is defined by Eq. 7, where *T* and *S* represent the target and state distributions, respectively.

$$\begin{aligned} U(t) = \left\{ \begin{array}{l} 1,\quad \text { if } H(T, S) < H_{\text{max}} \\ 0,\quad \text { otherwise} \end{array}\right. \end{aligned}$$

(7)
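Eq. 7 can be evaluated directly once the target and state distributions are discretized onto common histogram bins; the binning scheme is not specified in this excerpt, but the discrete Hellinger formula used below is standard.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as
    probability vectors aligned on the same bins."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

def utility(target_probs, state_probs, h_max):
    """Eq. 7: utility is 1 while the state distribution stays within
    h_max of the target, and 0 otherwise."""
    return 1 if hellinger(target_probs, state_probs) < h_max else 0
```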

The goal of the controller, then, is to maximize the utility over time. In other words, the controller must keep the distribution of the network’s state within *H*_{max} of the specified target distribution. In the results presented here, the maximum length of a simulation is set at 50,000 steps, at which point the controller is said to have successfully controlled the system.