Extended methods for influence maximization in dynamic networks

Background: The spread of a rumor among people can be represented as information diffusion in a social network. The scale of the spread varies greatly depending on the starting nodes. Nodes whose selection leads to large-scale diffusion are expected to be important for viral marketing. Given a network and the number of starting nodes, the problem of selecting the nodes that maximize information diffusion is called the influence maximization problem.
Methods: We propose three new approximation methods (Dynamic Degree Discount, Dynamic CI, and Dynamic RIS) for the influence maximization problem in dynamic networks. These methods extend previous methods for static networks to dynamic networks.
Results: Compared with the previous methods, MC Greedy and Osawa, the proposed methods were found to be preferable in the following sense. Although MC Greedy achieved larger diffusion than the three proposed methods, it was computationally expensive and intractable for large-scale networks. The proposed methods were more than 10 times faster than MC Greedy, so they can be computed in realistic time even for large-scale dynamic networks. Compared with Osawa, the three methods achieved almost the same performance but were approximately 7.8 times faster.
Conclusions: Based on these facts, the proposed methods are suitable for influence maximization in dynamic networks. Finding a strategy for choosing a suitable method for a given dynamic network is practically important. It is a challenging open question and is left for our future work. The problem of adjusting the parameters for Dynamic CI and Dynamic RIS is also left for our future work.

Selecting, from a given network, starting nodes that lead to large-scale information propagation was formalized as the "influence maximization problem" by Kempe et al. [1]. The original formalization is for static networks. However, nodes and edges can be newly added or deleted in many real social networks. Therefore, the influence maximization problem in dynamic networks should be considered. Habiba et al. defined the problem for dynamic networks [2]. Since the problem was proved to be NP-hard, computing the optimal solution in realistic time is intractable. Therefore, many approximation methods based on Monte-Carlo simulation and heuristic methods have been proposed. Methods based on Monte-Carlo simulation are accurate but computationally expensive. On the other hand, heuristic methods are fast but less accurate.
In order to find better solutions for the influence maximization problem, we propose three new methods for dynamic networks as extensions of methods for static networks. Dynamic Degree Discount is a heuristic method based on node degree. Dynamic CI is a method based on a node's degree and the degrees of the nodes reachable from that node within a specific time. Dynamic RIS uses many similar networks generated by random edge removal. We compare the proposed methods with previous methods. The number of nodes reached by our methods is about 1.5 times that of previous methods, and our methods run about 7.8 times faster than previous methods.
The authors discuss the extended methods for influence maximization in dynamic networks [3]. In addition to the contents in [3], this paper includes detailed explanation of background knowledge, discussions of the effect of different values of parameters in the proposed methods, and detailed analysis of the advantages and disadvantages of the proposed methods.
The structure of this paper is as follows. "Related work" section shows related work. "Proposed methods" section presents proposed methods (Dynamic Degree Discount, Dynamic CI and Dynamic RIS), "Experiments" section explains our experiments, and "Experimental results" section shows the experimental results. "Discussion" section shows discussions about the experimental results, and "Conclusion" section concludes the paper.

Model of information propagation
We use the SI model as the model of information propagation on networks. In the SI model, each node in a network is either in state S (susceptible) or in state I (infected). Nodes in state S do not know the information, and those in state I know the information. At the beginning of information propagation (at time t = 1), a set of nodes in state I is fixed as the seed nodes. For all edges (t, u, v) at time t = 1, 2, . . . , T, the following operation is performed. If node u is in state I and node v is in state S, information is propagated from u to v with probability λ, which means that the state of v is changed from S to I at time t + 1. The probability λ is the susceptibility parameter, and it controls how readily information propagates. At time t = T + 1, information propagation is terminated.
Based on the above notation, we can formulate the influence maximization problem as follows. We define σ (S) as the expected number of nodes in state I at time T + 1 when information propagation starts at time 1 from seed nodes S in state I under the SI model. (Please keep in mind that S in σ (S) is a set of seed nodes, whereas S in the SI model is the susceptible state.) The influence maximization problem in a dynamic network is to search for a set of seed nodes S of size k that maximizes σ (S) when a dynamic network G, the duration of the network T, the susceptibility λ of the SI model, and the size of seed nodes k are given.
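The SI process and the quantity σ (S) described above can be illustrated with a minimal Python sketch. The function names `simulate_si` and `estimate_sigma`, the parameter name `lam` for the susceptibility, and the default number of runs are assumptions made for this illustration; edges are treated as directed triples (t, u, v), so an undirected contact would have to be listed in both directions.

```python
import random


def simulate_si(edges, seeds, lam, T, rng):
    """Run the SI model once on a dynamic network.

    edges: iterable of (t, u, v) triples with 1 <= t <= T (u may infect v at time t).
    seeds: nodes that are in state I at time 1.
    lam:   susceptibility, the per-contact propagation probability.
    Returns the set of nodes in state I at time T + 1.
    """
    by_time = {}
    for t, u, v in edges:
        by_time.setdefault(t, []).append((u, v))

    infected = set(seeds)
    for t in range(1, T + 1):
        newly = set()
        for u, v in by_time.get(t, []):
            # Only nodes already infected before time t can transmit at time t.
            if u in infected and v not in infected and rng.random() < lam:
                newly.add(v)
        # The state change takes effect at time t + 1.
        infected |= newly
    return infected


def estimate_sigma(edges, seeds, lam, T, runs=1000, seed=0):
    """Monte-Carlo estimate of sigma(S): average number of infected nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_si(edges, seeds, lam, T, rng)) for _ in range(runs))
    return total / runs
```

Because a node infected at time t only becomes infectious at time t + 1, two edges with the same timestamp cannot form a chain of transmission in this sketch.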

Problems related to influence maximization in dynamic networks
There are some problems related to influence maximization in dynamic networks. Revenue maximization [4] is the problem of finding seed customers (nodes) and offering discounts to them, instead of giving an item (or information) to seed nodes for free, in order to increase total revenue. Although the problem is important in the field of marketing, it is more complicated than the influence maximization problem since seed nodes are not treated as equal, and the amount of discount for each node may not be equal. The number of possible parameters increases greatly, especially in the case of dynamic networks. Although revenue maximization is one of the important research directions, it is different from the influence maximization problem.
Opinion formation [5][6][7] is another problem related to influence maximization problem. Each agent (node) has an opinion which might be a continuous or a discrete quantity. The underlying network represents the society where the agents have interactions. Each agent has an opinion in the society that is influenced by the society. Analyzing the increase and decrease of each opinion is important for modeling the dynamics of opinion formation and for opinion polarization [8].
It is often pointed out that the properties of dynamic networks are quite different from those of static networks. Braha and Bar-Yam [9,10] pointed out that the overlap between the centrality in dynamic networks and that in the aggregated (static) network is very small. Hill and Braha [11] proposed a dynamic preferential attachment mechanism that reproduces dynamic centrality phenomena. Holme presents good surveys on dynamic networks [12,13].

Influence maximization methods for static networks
Jalili presents a survey on spreading dynamics of rumor and disease based on centrality [14]. There are roughly three approaches for influence maximization problem in static networks. The first is Monte-Carlo simulation methods, the second is heuristic-based methods, and the third is the methods to generate a large number of networks with random edge removal and select seed nodes based on the generated networks.
The Monte-Carlo simulation method was proposed by Kempe et al. [1]. In Kempe's method, σ (S) is estimated by repeating Monte-Carlo simulation. When S is given as a set of seed nodes, simulations of information propagation are repeated R times and the average number of infected nodes is taken as σ (S). Next, the node v that maximizes the difference σ (S ∪ {v}) − σ (S) is greedily added to the seed nodes based on the estimated σ (S). This operation is repeated until |S| = k.
Since σ (·) is a monotonic and submodular function, when we denote the optimal set of seed nodes as S*, the seed nodes S_greedy obtained by the above greedy algorithm are proved to satisfy σ (S_greedy) ≥ (1 − 1/e) σ (S*) [1]. Because of this property, the quality of the solutions of Kempe's method is good. However, many repetitions of Monte-Carlo simulation are needed in order to estimate σ (S) accurately. Since the computational cost of finding seed nodes with this method is high, it is not possible to find seed nodes in realistic time for large-scale networks.
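The greedy selection described above can be sketched as follows. This is a minimal illustration, not Kempe's implementation: `greedy_seeds` is a hypothetical name, and `sigma` is assumed to be any callable that returns an estimate of σ (S) for a set of nodes, for example an average over Monte-Carlo runs.

```python
def greedy_seeds(nodes, sigma, k):
    """Greedily pick up to k seed nodes, each maximizing the marginal gain.

    nodes: list of candidate nodes.
    sigma: callable mapping a set of nodes to its (estimated) spread.
    """
    seeds = set()
    order = []
    for _ in range(k):
        base = sigma(seeds)
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            gain = sigma(seeds | {v}) - base  # sigma(S + v) - sigma(S)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k candidates available
            break
        seeds.add(best)
        order.append(best)
    return order
```

Each round evaluates `sigma` once per remaining candidate, which is why the approach becomes expensive when every evaluation is itself a batch of simulations.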
Heuristic methods have been proposed in order to search for seed nodes at high speed. Chen et al. [15] proposed PMIA, which finds seed nodes by focusing on the paths with a high information propagation ratio. Jiang et al. [16] proposed SAEDV, which searches for seed nodes by an annealing method, obtaining σ (·) from the nodes adjacent to the seed nodes. Chen et al. [17] proposed Degree Discount, which is based on node degree and gives a penalty to the nodes adjacent to already selected nodes. This is because, when node v is selected as a seed node and u is its neighbor, it is highly likely that v propagates information to u, so selecting nodes other than u as seed nodes is better for information diffusion.
The algorithm of Degree Discount uses the following quantities. t i is the penalty of node i, and dd i is the degree of node i after the penalty is applied. The larger the value of t i, the smaller dd i becomes.
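A minimal Python sketch of Degree Discount for a static, undirected network is given here as an illustration. It uses the discount rule from Chen et al.'s DegreeDiscountIC heuristic, dd = d − 2t − (d − t)·t·p, where d is the original degree and p is the propagation probability. The function name, the adjacency-dictionary input, and the linear scan for the maximum (instead of a priority queue) are assumptions of this sketch.

```python
def degree_discount(adj, k, p):
    """Select up to k seeds by discounted degree.

    adj: dict mapping each node to the set of its neighbors (undirected).
    p:   propagation probability used in the discount term.
    """
    d = {v: len(nb) for v, nb in adj.items()}   # original degrees
    dd = {v: float(d[v]) for v in adj}          # discounted degrees
    t = {v: 0 for v in adj}                     # number of selected neighbors
    seeds, chosen = [], set()
    for _ in range(min(k, len(adj))):
        u = max((v for v in adj if v not in chosen), key=lambda v: dd[v])
        seeds.append(u)
        chosen.add(u)
        for v in adj[u]:
            if v in chosen:
                continue
            t[v] += 1  # one more neighbor of v is already a seed
            dd[v] = d[v] - 2 * t[v] - (d[v] - t[v]) * t[v] * p
    return seeds
```

In a small graph where a hub and one of its well-connected neighbors have the two highest degrees, this rule tends to skip the neighbor in favor of a node in another part of the graph, which is the behavior the penalty is meant to produce.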
Morone et al. [18] proposed a method for finding seed nodes that considers the degrees of distant nodes. The method calculates the following CI l (v) for each node and selects seed nodes based on the values:
$$CI_l(v) = (k_v - 1) \sum_{u \in \partial Ball(v, l)} (k_u - 1)$$
Here k v denotes the degree of node v, and ∂Ball(v, l) represents the nodes whose distance from node v is l. An example of CI l (v) is explained in Fig. 1. When l = 2, ∂Ball(v, 2) consists of the two red nodes at distance 2 from node v, and the degrees of both nodes are 8. The degree of node v itself is low in the network in Fig. 1, but node v is effective for information propagation because it is connected with some high-degree nodes at distance two. This method thus selects seed nodes with wider propagation than when seed nodes are selected based on the degree of node v only.
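As an illustration of the CI index, the following Python sketch computes the boundary of the ball with breadth-first search and then evaluates CI for one node. The helper names (`adjacency_from_edges`, `ball_boundary`, `collective_influence`) are hypothetical, and the sketch only computes the index; it does not reproduce the full seed-selection procedure of Morone et al.

```python
from collections import deque


def adjacency_from_edges(edges):
    """Build an undirected adjacency dict from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def ball_boundary(adj, v, l):
    """Return the nodes whose shortest-path distance from v is exactly l."""
    dist = {v: 0}
    queue = deque([v])
    boundary = []
    while queue:
        u = queue.popleft()
        if dist[u] == l:
            boundary.append(u)
            continue  # do not expand beyond distance l
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return boundary


def collective_influence(adj, v, l):
    """CI_l(v) = (k_v - 1) * sum over u in the boundary of (k_u - 1)."""
    k_v = len(adj[v])
    return (k_v - 1) * sum(len(adj[u]) - 1 for u in ball_boundary(adj, v, l))
```

For a hypothetical graph in which a degree-2 node v has exactly two nodes of degree 8 at distance 2, the sketch gives CI 2 (v) = (2 − 1) × (7 + 7) = 14.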
These heuristic methods compute seed nodes faster than the methods based on Monte-Carlo simulation. However, it is experimentally confirmed that the scale of propagation of the methods depends on network structures and parameters.
Ohsaka et al. [19] proposed a method that generates many networks by random edge removal in order to solve this problem. Ohsaka's method is based on the "coin flip" technique mentioned in Kempe's paper [1]. Let D G (S) be the distribution of nodes to which information is propagated from seed nodes S in a static network G, and let D ′ G (S) be the distribution of nodes to which information is propagated from S on a network obtained by removing edges from G at a constant ratio. "Coin flip" means that D G (S) equals D ′ G (S) in this situation, so σ (·) can be estimated by generating many networks with edges removed at a constant ratio rather than by repeating Monte-Carlo simulation. Ohsaka's method estimates σ (·) by computing the strongly connected components (SCCs) of each of the RR networks generated in this way. An SCC is a subgraph in which every node is reachable from every other node.
Borgs et al. [20] and Tang et al. [21] also proposed methods similar to Ohsaka's method. The difference from Ohsaka's method is that σ (·) is not estimated directly from the generated networks. Instead, the nodes that can reach a randomly selected node v are computed, and then seed nodes are selected based on these nodes. More specifically, the algorithm is as follows.
There are other approaches to the influence maximization problem in different problem settings. Chen et al. [22] proposed a method to solve the problem with a time limit. Feng et al. [23] solved the influence maximization problem in a situation where the freshness of the information degrades as it spreads. Mihara et al. [24] proposed a method for the influence maximization problem where the whole network structure is unknown.
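The reverse-reachability idea of Borgs et al. [20] and Tang et al. [21] for static networks can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: each incoming edge is kept independently with probability `p`, the number of sampled sets `theta` is passed in directly rather than derived from an accuracy bound, and the names `random_rr_set` and `ris_seeds` are hypothetical.

```python
import random


def random_rr_set(in_adj, nodes, p, rng):
    """Sample one reverse-reachable set.

    in_adj: dict mapping a node to the list of nodes with an edge into it.
    A random root is chosen, and edges are followed backwards, each one
    kept with probability p (equivalent to removing it with probability 1 - p).
    """
    root = rng.choice(nodes)
    rr = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        for u in in_adj.get(v, ()):
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def ris_seeds(in_adj, nodes, k, p, theta, seed=0):
    """Pick up to k seeds that greedily cover the most sampled RR sets."""
    rng = random.Random(seed)
    rr_sets = [random_rr_set(in_adj, nodes, p, rng) for _ in range(theta)]
    covered = [False] * theta
    seeds = []
    for _ in range(k):
        counts = {}
        for i, rr in enumerate(rr_sets):
            if not covered[i]:
                for u in rr:
                    counts[u] = counts.get(u, 0) + 1
        if not counts:  # every RR set is already covered
            break
        best = max(counts, key=counts.get)
        seeds.append(best)
        for i, rr in enumerate(rr_sets):
            if best in rr:
                covered[i] = True
    return seeds
```

A node that appears in many reverse-reachable sets is one from which many randomly chosen nodes can be reached, so covering the sets greedily approximates maximizing the spread.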

Degrees in dynamic networks
Notations of edges and paths in dynamic networks are the same as the ones in ref. [25]. (t, u, v) represents an edge from node u to v at time t.
The duration from the start to the end of a path, t k−1 − t 1, is the length of time of the path, and the smallest such value is the minimum length of time.
Habiba et al. [26] define degrees in dynamic networks using the symmetric difference of past connections and future connections. However, diffusion in dynamic networks proceeds from past to future only, and it is not bidirectional. We therefore define degree D T (v) of node v in dynamic network as follows: On the other hand, in Fig. 3, the nodes adjacent to node A change over time. So the degree of node A is bigger than that in Fig. 2.
As Figs. 2 and 3 illustrate, D T (v) captures the number of newly adjacent nodes, and this is important for the influence maximization problem. We therefore employ D T (v) as the definition of node degree in dynamic networks.

Influence maximization methods for dynamic networks
There are two approaches to the influence maximization problem in dynamic networks: methods based on Monte-Carlo simulation and heuristic-based methods. The former approach was proposed by Habiba and Berger-Wolf [2]. The method estimates the scale of propagation σ (·) by repeating Monte-Carlo simulation, just as in static networks. Since σ (·) is monotonic and submodular in dynamic networks as well, this method achieves large-scale propagation. However, the computational cost of this method is high, as in static networks. Osawa and Murata [25] proposed a heuristic method for calculating σ (·) at high speed. Their algorithm for computing σ (S) for seed nodes S is shown as follows.
After σ (S) is computed, seed nodes are obtained by greedy algorithm as in the method by Monte-Carlo simulation. Osawa's method finds seed nodes in realistic computational time. However, the quality of its solution depends on given networks because σ (·) is calculated approximately, and it is worse compared with the solutions by Monte-Carlo simulation.

Proposed methods
In this section, we propose three new methods for the influence maximization problem in dynamic networks (Dynamic Degree Discount, Dynamic CI, and Dynamic RIS), which extend methods for static networks to dynamic networks. We use the following notations: G: dynamic network, T: duration of the dynamic network, k: the size of seed nodes, λ: susceptibility, θ: the number of generated networks, and S: seed nodes.

Dynamic Degree Discount
Dynamic Degree Discount is an extension of Degree Discount by Chen et al. [17] to dynamic networks.

Dynamic CI
Dynamic CI is an extension of Morone's method [18] for dynamic networks. Morone's method focuses on the degree of node v and the degrees of nodes with distance l from v. Dynamic CI defines an index D_CI l (v) in which degree and distance are extended to dynamic networks.
The differences between CI l (v) and D_CI l (v) are as follows: (1) the definition of degree is changed to that for dynamic networks and (2) ∂Ball(v, l) in CI l (v) is changed to DBall(v, l). DBall(v, l) represents nodes where their shortest duration of time (mentioned in "Model of information propagation" section) from node v is l. l is a parameter which takes the value within the range 1 ≤ l ≤ T . In the algorithm of Dynamic CI, D_CI l (v) is computed for each node and top k nodes are selected as seed nodes.

Dynamic RIS
Dynamic RIS is an extension of Borgs's method [20] and Tang's method [21] for dynamic networks.
The difference between Borgs's and Tang's algorithm and Dynamic RIS is that RR in their algorithm is replaced with RR(v, d) in our algorithm. RR(v, d) is the set of all nodes that can reach v within the shortest duration of time d, taken over all times of the dynamic network. It is defined from the sets of nodes that can reach "node v at time t" within the shortest period of d.
The computational complexities of these methods are as follows.

Dynamic Degree Discount
According to the paper of Chen et al. [17], the computational complexity of Degree Discount is $O(k \log n + m)$, where k is the number of seed nodes, n is the number of nodes, and m is the number of edges. Dynamic Degree Discount is an extension of Degree Discount. The static degree is replaced with the dynamic degree D T (i), and the static neighbors are replaced with the dynamic neighbors N T (v). The computational complexity of computing the dynamic degree and the dynamic neighbors is $T \cdot m / n$, where T is the total duration of the given dynamic network. Therefore, the total computational complexity of Dynamic Degree Discount is $O(k \log n + m + T \cdot m / n)$.

Dynamic CI
According to the paper of Morone and Makse [18], the computational complexity of CI is $O(n \log n)$, where n is the number of nodes. Dynamic CI is an extension of CI. The static degree is replaced with the dynamic degree D T (i), whose computational complexity is $T \cdot m / n$, where T is the total duration of the given dynamic network. Therefore, the total computational complexity of Dynamic CI is $O(n \log n + T \cdot m / n)$.

Dynamic RIS
According to the paper of Tang et al. [21], the computational complexity of RIS is $O(k \cdot l^2 (m + n) \log^2 n / \epsilon^3)$, and it returns a $(1 - 1/e - \epsilon)$-approximate solution with probability at least $1 - n^{-l}$, where l and ε are constants. The computational complexity of Dynamic RIS heavily depends on the parameters θ and d, which are the number of generated networks and the duration of time for computing RR(v, d), respectively. Therefore, the total computational complexity of Dynamic RIS is $O(\theta \cdot d \cdot k \cdot l^2 (m + n) \log^2 n / \epsilon^3)$.

Experiments
We perform experiments for comparing the proposed methods with previous ones in order to confirm their effectiveness. Dynamic networks used for the experiments are shown in Table 1. These networks are the same as the ones used in previous research.
Average degree in Table 1 shows the average over all nodes in the network, which is $\frac{1}{|V|} \sum_{v \in V} D_T(v)$. Hospital [27] is a network of contacts between patients and medical staff at a hospital over time. Primary School [28,29] is a network of contacts between students and teachers at a school. High School 2013 [30] is a network of contacts between students. The unit of the duration in these three datasets is 20 s. Each dataset is available at SocioPatterns (http://www.sociopatterns.org).
The methods used in the experiments are the two previous methods for dynamic networks (Monte-Carlo simulation (MC Greedy) and Osawa) explained in "Influence maximization methods for dynamic networks" section and our proposed methods (Dynamic Degree Discount, Dynamic CI, and Dynamic RIS) in "Proposed methods" section. Given a network as input, each method computes seed nodes S. The simulation of information propagation based on the SI model is repeated R times with the obtained seed nodes, and the average number of nodes in state I is taken as σ (S). The values of σ (S) are compared in order to evaluate the methods.
Experiments are performed for the following purposes: (1) comparison of σ (S) when the size of seed nodes k changes, (2) comparison of σ (S) when susceptibility changes, and (3) comparison of computational time when the size of seed nodes k changes. CELF [31] is used to speed up the experiments when greedy algorithms are used in MC Greedy and Osawa. CELF is an algorithm that can be used when the greedy algorithm is applied to a problem with submodularity, and its solution is the same as that of the normal greedy algorithm. According to the experiments by Leskovec [31], computation is 700 times faster than with the normal greedy algorithm when CELF is used.
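The lazy evaluation behind CELF can be sketched in Python as follows. This is an illustrative sketch rather than the implementation used in the experiments: the name `celf` is hypothetical, and `sigma` is assumed to be a deterministic, submodular set function (with a noisy Monte-Carlo estimate, the stale upper bounds are only approximately valid).

```python
import heapq


def celf(nodes, sigma, k):
    """Lazy greedy selection of up to k seeds.

    Submodularity guarantees that a node's marginal gain never increases as
    the seed set grows, so a gain computed earlier is an upper bound and only
    the top candidate needs to be re-evaluated.
    """
    base = sigma(set())
    # Heap entries: (-gain, tie_breaker, node, seed_count_when_gain_was_computed)
    heap = [(-(sigma({v}) - base), i, v, 0) for i, v in enumerate(nodes)]
    heapq.heapify(heap)
    seeds, current = [], base
    while heap and len(seeds) < k:
        neg_gain, i, v, stamp = heapq.heappop(heap)
        if stamp == len(seeds):
            # The gain is up to date for the current seed set: accept v.
            seeds.append(v)
            current += -neg_gain
        else:
            # Stale gain: recompute against the current seeds and reinsert.
            gain = sigma(set(seeds) | {v}) - current
            heapq.heappush(heap, (-gain, i, v, len(seeds)))
    return seeds
```

After the first round, most candidates are never re-evaluated, which is where the reduction in the number of calls to `sigma` comes from.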

Comparison of σ (S) when the size of seed nodes k changes
The results of information propagation for each size of seed nodes k, with the susceptibility of the SI model fixed at λ = 0.01, are shown in Fig. 4. There is not much difference in the scale of diffusion among the three proposed methods. Dynamic RIS achieves the highest diffusion in High School 2013, for example, but the difference among the proposed methods is small compared with the difference between the proposed methods and the previous methods (MC Greedy and Osawa).

Comparison of σ (S) when susceptibility changes
Figure 5 shows diffusion when the size of seed nodes is fixed at 20% of all nodes in each network and susceptibility is varied as λ = 0.001, 0.01, 0.05. The x axis shows the value of λ, and the y axis shows the percentage of diffusion. Parameters l and d are the same as the ones used in the previous experiments. As shown in Fig. 5, MC Greedy achieves the highest diffusion regardless of the value of λ. The difference among the three proposed methods is small.
Comparing the proposed methods with Osawa, the proposed methods achieve a higher scale of diffusion than Osawa in Hospital and High School 2013 when λ = 0.05. Osawa achieves higher diffusion than Dynamic RIS only in Primary School. When λ = 0.001, the difference between the proposed methods and Osawa is very small compared with the cases of other values.

Comparison of computational time when the size of seed nodes k changes
Figure 6 shows the computational time when λ = 0.01 and the size of seed nodes is varied. A PC with an Intel Core i7 (3.4 GHz) CPU and 8 GB of memory is used for the experiments. The x axis shows the percentage of seed nodes, and the y axis shows the computational time (log scale). Figure 6 shows that, for all datasets, methods other than MC Greedy can compute seed nodes in realistic time. MC Greedy needs several hours to compute seed nodes. This shows that MC Greedy is intractable in realistic time for large-scale networks.
Regarding the comparison among the three proposed algorithms, the computational time of Dynamic Degree Discount and Dynamic CI is almost the same in all datasets. Dynamic RIS takes about the same computational time as the other two proposed methods in Hospital, and is faster in Primary School and High School 2013.

Parameters of Dynamic CI and Dynamic RIS
The diffusion of the proposed methods with different parameters is shown in this section. We change the parameter l of Dynamic CI, and θ and d of Dynamic RIS.

Diffusion and computational time of different l in Dynamic CI
Diffusion and computational time when l in Dynamic CI is set to 1, 5, 10, and 20 are shown in Fig. 7. The line graphs on the left show the size of diffusion when l is changed in each network. The bar graphs on the right show computational time. The line graphs show that diffusion depends on the value of l. Therefore, it is important to find an appropriate l in Dynamic CI.

Analysis focused on expansion of each node
In the experiments in "Comparison of σ (S) when susceptibility changes" section, the difference between the proposed methods and Osawa was small when λ = 0.001 compared with the experiments with other values of λ. When λ = 0.05, Osawa outperforms the proposed methods only in Primary School. This section discusses these two points. Figure 10 shows the distribution of diffusion σ ({v}) of each node v when Monte-Carlo simulation is used. The x axis shows the percentage of diffusion from node v to the whole network ( σ ({v}) ), and the y axis shows the frequency of the nodes with each percentage on the x axis. When λ = 0.001, almost all nodes have less than 5% diffusion in all networks. This means that there is no big difference in diffusion between different seed nodes. This is the reason why the difference between the proposed methods and Osawa is small in the experiment in "Comparison of σ (S) when susceptibility changes" section. On the contrary, when λ = 0.05, Primary School has many nodes with more than 60% diffusion compared with the other two networks. In this case, large-scale diffusion is easy to achieve even if the most appropriate seed nodes are not selected. This is the reason why Osawa outperforms the proposed methods in Primary School in "Comparison of σ (S) when susceptibility changes" section.

Advantages and disadvantages of each of the proposed methods
Advantages and disadvantages of each of the proposed methods are discussed in this section. An advantage of Dynamic Degree Discount is that it has no parameters, so no parameter tuning is needed. Its disadvantage is that it is only for the SI model, so the method cannot be used for other models. This is because Dynamic Degree Discount is an extension of Chen's Degree Discount, which is for the SI model. There are other information propagation models, such as the LT model and the Triggering model proposed by Kempe et al. Dynamic Degree Discount cannot be applied to such models.
An advantage of Dynamic CI is that, in contrast to Dynamic Degree Discount, it can be applied to many information propagation models, because Dynamic CI uses only degree information when it calculates seed nodes. Its disadvantage is that the scale of diffusion depends on the value of the parameter l, as mentioned in "Diffusion and computational time of different l in Dynamic CI" section. It is necessary to search for appropriate values of l for Dynamic CI. The parameter l takes a value within the range 1 ≤ l ≤ T, so the search takes time in general.
An advantage of Dynamic RIS is that its computational time is short. As shown in the experimental results, its computational time is shorter than that of the other methods in all networks except Hospital. This is a big advantage because the method can be applied to large networks. A disadvantage of Dynamic RIS is that its parameters θ and d need to be adjusted. As mentioned in the previous section, computational time becomes longer as θ becomes larger, and the scale of diffusion becomes smaller if θ is too small. Therefore, it is necessary to set an appropriate value for θ. However, Dynamic RIS is less sensitive to θ and d than Dynamic CI is to l.

Conclusion
We proposed three new methods for the influence maximization problem in dynamic networks, which are extensions of methods for static networks. In the experiments comparing them with the previous methods, MC Greedy and Osawa, our three proposed methods are better than the previous methods in the following sense. Although the performance of MC Greedy is better than that of these three methods, it is computationally expensive and intractable for large-scale networks. Our proposed methods are more than 10 times faster than MC Greedy, so they can be computed in realistic time even for large-scale dynamic networks. Compared with Osawa, the three methods achieve almost the same performance, but they are approximately 7.8 times faster. Based on these facts, the proposed methods are suitable for influence maximization in dynamic networks.
The comparison of Dynamic Degree Discount, Dynamic CI, and Dynamic RIS is as follows. The choice of the methods should be done based on the following pros and cons.
Dynamic Degree Discount
• It requires no parameter.
• It is applicable to SI model only.

Dynamic CI
• It is applicable to other information propagation models.
• The performance heavily depends on parameter l.

Dynamic RIS
• It is relatively fast among these three methods.
• It requires two parameters to be adjusted ( θ and d).
Finding a strategy for choosing a suitable method for a given dynamic network is practically important. It is a challenging open question and is left for our future work. The problem of adjusting the parameters for Dynamic CI and Dynamic RIS is also left for our future work.