Influence-based community partition for social networks
Computational Social Networks volume 1, Article number: 1 (2014)
Abstract
Background/Purpose
Community partition is of great importance in sociology, biology and computer science. Due to the exponentially increasing amount of social network applications, a fast and accurate method is necessary for community partition in social networks. In view of this, we investigate the social community partition problem from the perspective of influence propagation, which is one of the most important features of social communication.
Methods
We formulate social community partition as a combinatorial optimization problem that aims at partitioning a social network into K disjoint communities such that the sum of influence propagation within each community is maximized. When K=2, we develop an optimal algorithm that has a provable performance guarantee for a class of influence propagation models. For general K, we prove that it is $\mathcal{NP}$-hard to find a maximum partition for social networks in the well-known linear threshold and independent cascade models. To get near-optimal solutions, we develop a greedy algorithm based on the optimal algorithm. We also develop a heuristic algorithm with a low computational complexity for large social networks.
Results
To evaluate the practical efficiency of our algorithms, we conduct a simulation study based on real-world scenarios. The experiments are conducted on three real-world social networks, and the experimental results show that our algorithms obtain more accurate partitions with respect to influence propagation than some classic community partition algorithms.
Conclusions
In this study, we investigate the community partition problem in social networks. It is formulated as an optimization problem and investigated both theoretically and practically. The results can be applied to find communities in social networks and are also useful for the influence propagation problem in social networks.
1 Background
1.1 Motivation
Social networks form an interdisciplinary research area that has attracted a lot of attention in recent years. One important problem in social networks is community partition, which provides insight into the relationships and attributes of the users that a social network comprises. Generally, a social network can be modeled as a graph in which the nodes represent the users and the edges represent the relationships among the users. The objective of community partition is to cluster the users into groups according to the graph topology [1]-[8]. Another important problem in social networks is influence propagation. It is one of the most important features of social communication and plays a significant role in a variety of affairs such as the diffusion of medical innovations and the popularization of new technologies. For example, the influence maximization problem, with the objective of finding a small set of users in a social network as seeds to trigger a large influence propagation, has wide applications in viral marketing [9]-[13].
Due to the nondeterminacy of human behaviors, influence propagation is mostly studied in probabilistic models such as the Linear Threshold (LT) model and the Independent Cascade (IC) model [14]-[16]; that is, the behaviors and decisions of users are uncertain and depend on the behaviors of others. For example, a user's adoption of a new product may have an impact on their friends, whose adoptions may further influence others. Therefore, probabilistic models are more suitable than deterministic models for simulating influence propagation in social networks. Unfortunately, the expected influence propagation through the entire social network is hard to estimate for most probabilistic models such as LT and IC [15],[16]. Therefore, many works (e.g., [15]-[17]) construct a local area for each user and use the local influence propagation instead of the global one. However, some large social networks have millions of users, which makes it infeasible to construct local areas for all the users.
Many works also study community-based algorithms for influence maximization, assuming that influence rarely propagates across different communities. However, based on our observation, few works address community partition aimed specifically at influence propagation in social networks. The performance of community-based algorithms cannot be guaranteed unless there exists an accurate influence-based community partition. In this paper, we investigate how to partition a social network into disjoint communities in terms of influence propagation. We believe this study is useful for the influence maximization problem and may stimulate further research and potential applications of communities in social networks.
1.2 Related work
Community partition is of great importance not only for social networks but also for areas such as computer networks and biological networks. Many works address community partition in general networks (e.g., [6],[8],[18],[19]), and much effort has been devoted to formalizing the intuition that a community is a set of nodes having more connections with each other and fewer connections with the remainder of the network. The first investigation of community partition was done by Weiss et al. [20]. Subsequent approaches fall mainly into four categories: hierarchy-based methods [1],[2], spectrum-based methods [3],[4], density-based methods [5] and modularity-based methods [6]-[8],[21]-[29]. In particular, Newman's notion of modularity [6],[8], which considers the internal connectivity with reference to a randomized model, has been a very popular measure for community partition in general networks. In spite of the excellent performance on many real-world networks, this family of approaches usually has 'resolution limit' problems, i.e., modularity-based methods favor larger communities and fail to discover communities of small sizes [25],[30]. Therefore, some works investigate new methods for detecting communities, such as the self-reference methods and the comparative methods [18]. In addition, in [19], Hu et al. proposed an algorithm from the node's point of view to incorporate nodes into a community with the largest attractive force. In [31], Zhang et al. proposed an algorithm from the aspect of combinatorial optimization to partition nodes into disjoint parts. There are also many works which view communities from different perspectives. To learn more about the large body of work on community partition, please refer to [29],[32]-[37].
Besides community partition, influence propagation is also an important issue in social networks. Domingos and Richardson in [13] and [12] first proposed general descriptive models for influence propagation in social networks. In [14], Kempe et al. formulated influence propagation as an optimization problem, namely, influence maximization. They proved that the greedy algorithm has a provable performance guarantee for the LT and IC models. However, how to evaluate the expected influence propagation for selecting the nodes with the maximum marginal gain was left as an open problem, and the greedy algorithm in [14] was implemented by Monte Carlo (MC) simulation. After that, many researchers started to investigate how to compute the influence propagation efficiently, and a large number of methods (e.g., [15],[16],[38]) have been proposed for the LT and IC models. Meanwhile, there are also many works investigating new influence propagation models (e.g., [39],[40]) to approximate real-world scenarios.
Due to the nature of communities, applying community partition research to influence propagation is promising. In [17], Wang et al. proposed a community-based greedy algorithm for mining the most influential nodes. In [41], Li et al. further proposed an algorithm for influence maximization in online social networks. They assume that each node's influence propagation is limited to the community in which it resides, and thus they evaluate the influence propagation within each community to improve the computational efficiency. There are also many works on influence propagation or other social network applications that take advantage of community structures (see, e.g., [42]-[45] for recent works).
1.3 Our contribution
Although there are a lot of works done on general community partition, based on our observation, there are few works done on community partition for influence propagation. In view of this, we investigate how to partition a social network into communities according to influence propagation. Our main contributions are as follows:

1.
We formally define the influence-based community partition problem as a combinatorial optimization problem with the objective of partitioning a social network into K disjoint communities such that the sum of influence propagation within each community is maximized. We call the problem Maximum K-Community Partition (MKCP). The motivation is to keep as much influence propagation as possible after the partition and to reduce the estimation errors caused by using local influence propagation instead of the global one.

2.
When K=2, i.e., partitioning a social network into two disjoint parts, we develop an optimal algorithm for a class of influence propagation models. For general K, we prove that there exists no polynomial-time algorithm for MKCP in the well-known LT and IC models unless $\mathcal{P}=\mathcal{NP}$, and we present a greedy algorithm based on the two-partition algorithm. We also develop a fast heuristic algorithm with a low computational complexity for the case in which the social network is very large.

3.
We conduct simulations on real-world social networks to demonstrate the practical efficiency of the proposed algorithms. The influence propagation is based on the well-known LT and IC models, and the experimental results show that significantly better partitions can be obtained using our algorithms rather than using some community partition methods that are not specialized for influence propagation.
1.4 Paper organization
The rest of this paper is organized as follows. In the 'Problem description' section, we give the background information, including the notation and problem definition. In the 'Methods' section, we present our algorithms as well as the theoretical analysis of both the proposed algorithms and the MKCP problem. In the 'Results and discussion' section, we show the simulation results on some real-world social networks. In the 'Conclusions' section, we conclude the paper.
2 Problem description
In this study, we formulate a social network as a simple directed graph without self-loops, where nodes represent users and edges represent relationships among the users. We first introduce some notation and then present the MKCP problem based on this notation.

1.
For a social network G, we denote by $V=\{1,2,\dots,n\}$ the set of nodes and $E=\{(i,j)\}$ the set of directed edges. A directed edge (i,j) denotes that there exists a chance of influence propagation between nodes i and j where i is the sender and j is the receiver. For each node $i\in V$, we denote by p(i) ($0\le p(i)\le 1$) the probability that node i would produce an influence propagation or would share an idea with others through the social network. For example, in the Twitter social network, p(i) should be related to the number of tweets i posts periodically. For each edge $(i,j)\in E$, we denote by w(i,j) the influential degree from node i to node j, which depends on their closeness and the probability p(i) for node i.

2.
Let K denote the number of communities. We denote by $c_i\in\{1,2,\dots,K\}$ the community identifier of node i. We denote by $C_k=\{i \mid c_i=k\}$ the set of nodes with community identifier k ($1\le k\le K$). For each pair of nodes i and j in the same set $C_k$, we denote by $p_{C_k}(i,j)$ ($0\le p_{C_k}(i,j)\le 1$) the probability that node j receives the influence from node i through propagation within community $C_k$.

3.
For a community $C_k$ and a node $i\in C_k$, we denote by $\sigma_{C_k}(i)$ the influence propagation of node i within community $C_k$, i.e., $\sigma_{C_k}(i)=\sum_{j\in C_k\setminus\{i\}}p_{C_k}(i,j)$. For any nonempty subset $D\subseteq C_k$, we denote by $\sigma_{C_k}(D)$ the sum of influence propagation within community $C_k$ for every node in D, i.e., $\sigma_{C_k}(D)=\sum_{i\in D}\sigma_{C_k}(i)$. For simplicity, we let $\sigma(X)$ denote $\sigma_X(X)$ for community X, and in the rest of this paper we call $\sigma(\cdot)$ the influence propagation function for community '$\cdot$'.
The probability that node j receives the influence from node i depends not only on the influential degree w(i,j) but also on the network topology and the influence propagation model. For example, in the LT model, the sum of influence node j receives can be formulated as $\sum_{i\in N_{\text{active}}(j)}w(i,j)$ where $N_{\text{active}}(j)$ denotes the set of active nodes around j and $\sum_{i\in N_{\text{active}}(j)}w(i,j)\le 1$. The influence propagation runs in discrete steps. At any time t, a node $j\in V$ becomes active when $\sum_{i\in N_{\text{active}}(j)}w(i,j)\ge\lambda(j)$ where $\lambda(j)$ is a threshold selected uniformly at random between 0 and 1. Therefore, in the LT model, for any community $C_k$, $p_{C_k}(i,j)$ is the probability that j is eventually active when i is initially active. In the example shown in Figure 1, the numbers on the edges and nodes denote the influential degrees and random thresholds. Assume that all the nodes are in the same community and node u is a seed; then all the white nodes (including node y) can be activated by node u, because they can be activated either by u directly or through paths from u. All the black nodes (p, q and w) cannot be activated by node u, even though q is a direct outgoing neighbor of u. Therefore, in the LT model, $p_{C_k}(i,j)$ depends not only on the influential degree w(i,j) but also on the network topology. We next present the definitions of K-valid disjoint partition (KVDP) and the MKCP problem.
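The LT activation rule described above can be sketched as a small Monte Carlo estimator of $p_{C_k}(i,j)$. This is only an illustration under stated assumptions: the function name `lt_reach_probability`, the dictionary layout `weights[(i, j)] = w(i, j)`, and the default number of runs are hypothetical choices, not part of the paper.

```python
import random


def lt_reach_probability(community, weights, seed_node, runs=1000, rng=None):
    """Estimate p_C(seed_node, j) for every other j in the community under LT.

    weights maps a directed edge (i, j) to its influential degree w(i, j);
    edges with an endpoint outside the community are ignored.
    """
    rng = rng or random.Random(0)
    nodes = set(community)
    in_edges = {v: [] for v in nodes}
    for (u, v), w in weights.items():
        if u in nodes and v in nodes:
            in_edges[v].append((u, w))

    counts = {v: 0 for v in nodes if v != seed_node}
    for _ in range(runs):
        # Each node draws a threshold uniformly at random between 0 and 1.
        threshold = {v: rng.random() for v in nodes}
        active = {seed_node}
        changed = True
        while changed:
            changed = False
            # Sweep order does not change the final active set, because
            # activation in the LT model is monotone.
            for v in nodes - active:
                incoming = sum(w for u, w in in_edges[v] if u in active)
                if incoming > 0 and incoming >= threshold[v]:
                    active.add(v)
                    changed = True
        for v in active - {seed_node}:
            counts[v] += 1
    return {v: c / runs for v, c in counts.items()}
```

Restricting `community` to a subset of the nodes shows how the same pair (i,j) can have a smaller probability in a smaller community, since paths through excluded nodes no longer count.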
Definition 1
(KVDP). Given a graph G(V,E) as a social network, a K-valid disjoint partition is a collection of K sets $\{C_1, C_2, \dots, C_K\}$ satisfying: (1) $\bigcup_{k=1}^{K}C_k=V$ and (2) $\forall i\ne j$, $C_i\cap C_j=\emptyset$.
Let K be an integer no less than 2. According to Definition 1, a KVDP is a partition of V into K nonempty subsets such that each node is in exactly one subset. We denote the influence propagation function for a KVDP $\{C_1, C_2, \dots, C_K\}$ by $f(C_1, C_2, \dots, C_K)=\sum_{k=1}^{K}\sigma(C_k)$ and we want to maximize $f(C_1, C_2, \dots, C_K)$. The formal definition of MKCP is given in Definition 2.
Definition 2.
(MKCP). Given a graph G as a social network, an influence propagation model (such as IC or LT) and an integer $K\ge 2$, Maximum K-Community Partition (MKCP) is the problem of finding a partition $\mathcal{P}=\{C_1, C_2, \dots, C_K\}$ of K subsets of nodes such that $\mathcal{P}$ is a KVDP and $f(\mathcal{P})=\sum_{k=1}^{K}\sigma(C_k)$ is maximized.
Considering the node set V as a single community, we have
$$\sigma(V)=\sum_{i\in V}\sigma_V(i)=\sum_{i\in V}\sum_{j\in V\setminus\{i\}}p_V(i,j).$$
It is clear that when partitioning the social network into two or more communities, some pairs (i,j) will be separated and thus both $p_V(i,j)$ and $p_V(j,i)$ have to be removed from the sum of influence propagation. In addition, even if nodes i and j are partitioned into the same community X, $p_X(i,j)$ may be less than $p_V(i,j)$, and $p_X(j,i)$ may be less than $p_V(j,i)$, because X is a subset of V. Therefore, the influence propagation between any pair of nodes i and j differs across community partitions, whether or not they are in the same community.
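Definitions 1 and 2 can be illustrated with a minimal sketch that checks the KVDP conditions and evaluates the objective for a given partition. The names `is_kvdp`, `objective` and `inside_weight` are hypothetical, and `inside_weight` is only a one-hop stand-in for the influence propagation function, which in the paper depends on the propagation model.

```python
def is_kvdp(partition, nodes, k):
    """Check Definition 1: K nonempty, pairwise disjoint sets that cover V."""
    if len(partition) != k or any(len(c) == 0 for c in partition):
        return False
    covered = set()
    for community in partition:
        members = set(community)
        if covered & members:  # some node appears in two communities
            return False
        covered |= members
    return covered == set(nodes)


def objective(partition, sigma):
    """Evaluate f(C_1, ..., C_K) as the sum of sigma(C_k) over all communities."""
    return sum(sigma(frozenset(c)) for c in partition)


def inside_weight(weights):
    """Return a toy sigma: total weight of directed edges inside a community.

    This is a one-hop stand-in, not the LT or IC influence function.
    """
    def sigma(community):
        return sum(w for (u, v), w in weights.items()
                   if u in community and v in community)
    return sigma
```

Any function that maps a set of nodes to a number can be passed as `sigma`, so a simulator-based estimate can replace the toy function without changing `objective`.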
3 Methods
3.1 Optimal algorithm for M2CP
In this subsection, we present an optimal algorithm for M2CP for a class of influence propagation models. The algorithm is based on the Min Cut algorithm proposed in [46]. Before giving the formal algorithm and its theoretical analysis, we briefly discuss the difference between the Min Cut problem and the M2CP problem. A min cut of a graph G is a set of edges with the least number of elements (unweighted case) or the least sum of weights (weighted case) that partitions G into two parts. On this basis, for M2CP, one may want to find a cut that minimizes the influence propagation leaking out between the two parts. However, maximizing the sum of influence propagation within each community is not equivalent to minimizing the influence propagation crossing different communities. Figure 2 shows an example. There are eight nodes which are partitioned into two communities $C_1=\{1,2,3,4\}$ and $C_2=\{5,6,7,8\}$. Assume the gray directed arcs are the possible influence propagation. Consider nodes 7, 5, and 1, respectively. It is clear that the influence received by nodes 7 and 5 will decrease after the partition because node 3 cannot influence node 7 and it cannot influence node 5 via node 7 indirectly. The influence received by node 1 also decreases because of the following: (1) node 5 cannot influence node 1, (2) node 7 cannot influence node 1 indirectly, and (3) node 3 cannot influence node 1 through the path ($3\to 7\to 5\to 1$). The first two kinds of influence propagation are between nodes in different communities, but the last one is between nodes in the same community. Therefore, maximizing the sum of influence propagation within each community is not just minimizing the influence propagation crossing different communities.
Given a social network as well as an influence propagation model, our algorithm iteratively finds $n-1$ partitions and selects the one with the maximum value as the final output. In the beginning, we consider each node i as a single set and let $\mathcal{V}=\{S_1,S_2,\dots,S_n\}$ be the collection of all the sets, where $S_i=\{i\}$. Select an arbitrary set $S_i\in\mathcal{V}$ and let $\mathcal{A}=\{S_i\}$. We then add the remaining sets one by one iteratively into $\mathcal{A}$. Each time, a set $S_j$ with the maximum value of $\varsigma(\mathcal{A},S_j)$ is added, where $\varsigma(\mathcal{A},S_j)=\sigma(\mathcal{A}\cup S_j)-\sigma(S_j)$. When there is only one set $S_l$ left, $\{v(\mathcal{A}),v(\mathcal{V}\setminus\mathcal{A})\}$ is considered as the first partition, where $v(\mathcal{X})$ is defined as the set of nodes in $\mathcal{X}$. In addition, the last two sets not in $\mathcal{A}$, say $S_r$ and $S_l$, are merged as a single set ($S_r\cup S_l$) for computing the next partition. The algorithm terminates when there is only one set in $\mathcal{V}$. The pseudocode is given in Algorithm 1.
The computational complexity of AM2CP (Algorithm 1) depends on the time complexity of computing $\sigma(\cdot)$, which further depends on the time complexity of computing the influence propagation $p_{C_k}(i,j)$ for community $C_k$ and all the pairs (i,j) of nodes in it. In [15], Chen et al. prove that it is $\#\mathcal{P}$-hard to compute the exact influence propagation in the LT and IC models. Therefore, in this work, $p_{C_k}(i,j)$ is estimated by MC simulation. Assume we have a simulator to estimate $\sigma(\cdot)$ in $\tau$ time. Following Algorithm 1, we run steps (3 to 11) $n-1$ times for the $n-1$ partitions. For each partition, we add all the sets greedily into $\mathcal{A}$, which calls the function $\sigma(\cdot)$ $\mathcal{O}(n^{2})$ times. Therefore, the overall running time of AM2CP is $\mathcal{O}(n^{3}\tau)$.
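The procedure described in this paragraph can be sketched as follows. This is a reading of the prose, not a transcription of Algorithm 1: the names `am2cp` and `inside_weight` are hypothetical, `sigma` is any callable that maps a set of nodes to a number, and the demo uses a one-hop stand-in for the influence propagation function rather than an LT or IC simulator.

```python
def am2cp(nodes, sigma):
    """Return (best_partition, best_value) over the n-1 candidate 2-partitions."""
    groups = [frozenset([v]) for v in nodes]  # the collection V of sets S_i
    best_value, best_partition = float("-inf"), None
    while len(groups) > 1:
        remaining = list(groups)
        last_added = remaining.pop(0)  # arbitrary starting set
        merged = set(last_added)       # v(A), the nodes covered by A
        while len(remaining) > 1:
            # Add the set S_j that maximizes sigma(A u S_j) - sigma(S_j).
            pick = max(remaining, key=lambda s: sigma(merged | s) - sigma(s))
            remaining.remove(pick)
            merged |= pick
            last_added = pick
        last_left = remaining[0]       # the single set S_l left outside A
        value = sigma(frozenset(merged)) + sigma(last_left)
        if value > best_value:
            best_value = value
            best_partition = (frozenset(merged), last_left)
        # Merge the last two sets (S_r and S_l) for the next round.
        groups = [g for g in groups if g not in (last_added, last_left)]
        groups.append(last_added | last_left)
    return best_partition, best_value


def inside_weight(weights):
    """Toy sigma: total weight of directed edges inside a set (one-hop only)."""
    def sigma(community):
        return sum(w for (u, v), w in weights.items()
                   if u in community and v in community)
    return sigma
```

For example, on two triangles joined by a single light edge, the sketch separates the triangles. With the toy function, the selection rule reduces to the "most tightly connected" rule of classical minimum-cut algorithms; with a simulator-based `sigma`, each call costs the simulation time instead.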
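We next show that AM2CP is an optimal solution for M2CP when the community influence propagation function $\sigma(\cdot)$ is supermodular. Let S be a finite set. A function $f:2^{S}\to\mathbb{R}$ is supermodular if for any $B\subset A\subset S$ and $u\notin A$,
$$f(A\cup\{u\})-f(A)\ge f(B\cup\{u\})-f(B),$$
or equivalently for any $B,A\subset S$,
$$f(A\cup B)+f(A\cap B)\ge f(A)+f(B).$$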
Theorem 1
If the influence propagation function $\sigma(\cdot)$ is supermodular, AM2CP is an optimal solution for M2CP.
Proof
Based on AM2CP, each time we find a partition $\mathcal{P}=(v(\mathcal{A}),v(\mathcal{V}\setminus\mathcal{A}))$ that separates the last two sets $S_r$ and $S_l$, and we merge the two sets for the next round. To show Theorem 1, it is sufficient to show that $\mathcal{P}$ has the maximum objective function value $\sigma(v(\mathcal{A}))+\sigma(v(\mathcal{V}\setminus\mathcal{A}))$ among all the partitions separating $S_r$ and $S_l$, where $v(\mathcal{X})$ is the set of nodes in $\mathcal{X}$. We prove it by induction.
Without loss of generality, we assume the sets added into $\mathcal{A}$ are in the order $S_{i_1},S_{i_2},\dots,S_{i_{|\mathcal{V}|}}$ for round i and let $\mathcal{A}_{i_j}$ denote the collection of the first j sets added into $\mathcal{A}$ in round i. Then for any $\mathcal{S}\subseteq\mathcal{A}_{i_1}$ and $S_{i_j}$ with $j>2$, we have $\sigma(v(\mathcal{A}_{i_2}))+\sigma(S_{i_j})\ge\sigma(v(\mathcal{A}_{i_2}\setminus\mathcal{S}))+\sigma(S_{i_j}\cup v(\mathcal{S}))$ because $v(\mathcal{S})$ is either $S_{i_1}$ or $\emptyset$. Assume $\sigma(v(\mathcal{A}_{i_{k'}}))+\sigma(S_{i_j})\ge\sigma(v(\mathcal{A}_{i_{k'}}\setminus\mathcal{S}))+\sigma(S_{i_j}\cup v(\mathcal{S}))$ for any $2\le k'<k$, $\mathcal{S}\subseteq\mathcal{A}_{i_{k'-1}}$ and $S_{i_j}$ with $j>k'$. We next show that $\sigma(v(\mathcal{A}_{i_k}))+\sigma(S_{i_j})\ge\sigma(v(\mathcal{A}_{i_k}\setminus\mathcal{S}))+\sigma(S_{i_j}\cup v(\mathcal{S}))$ for any $\mathcal{S}\subseteq\mathcal{A}_{i_{k-1}}$ and $S_{i_j}$ with $j>k$.
Consider the following two cases: (1) $S_{i_{k-1}}\in\mathcal{S}$ and (2) $S_{i_{k-1}}\notin\mathcal{S}$. When $S_{i_{k-1}}\notin\mathcal{S}$, we have $\sigma(v(\mathcal{A}_{i_{k-2}}))+\sigma(S_{i_j})\ge\sigma(v(\mathcal{A}_{i_{k-2}}\setminus\mathcal{S}))+\sigma(S_{i_j}\cup v(\mathcal{S}))$ due to the assumption. Therefore, $\sigma(v(\mathcal{A}_{i_k}))+\sigma(S_{i_j})\ge\sigma(v(\mathcal{A}_{i_k}\setminus\mathcal{S}))+\sigma(S_{i_j}\cup v(\mathcal{S}))$ because (1) $v(\mathcal{A}_{i_k})=v(\mathcal{A}_{i_k}\setminus\mathcal{S})\cup v(\mathcal{A}_{i_{k-2}})$, (2) $v(\mathcal{A}_{i_{k-2}}\setminus\mathcal{S})=v(\mathcal{A}_{i_k}\setminus\mathcal{S})\cap v(\mathcal{A}_{i_{k-2}})$ and (3) $\sigma(\cdot)$ is supermodular.
When $S_{i_{k-1}}\in\mathcal{S}$, we have $\sigma(v(\mathcal{A}_{i_{k-1}}))+\sigma(S_{i_k})\ge\sigma(v(\mathcal{S}))+\sigma(S_{i_k}\cup v(\mathcal{A}_{i_{k-1}}\setminus\mathcal{S}))$ due to the assumption, in which $\sigma(v(\mathcal{S}))=\sigma(v(\mathcal{A}_{i_{k-1}})\setminus v(\mathcal{A}_{i_{k-1}}\setminus\mathcal{S}))$. Since $\sigma(\cdot)$ is supermodular, we have $\sigma(v(\mathcal{A}_{i_{k-1}})\cup S_{i_j})-\sigma(v(\mathcal{A}_{i_{k-1}}))\ge\sigma(v(\mathcal{S})\cup S_{i_j})-\sigma(v(\mathcal{S}))$. In sum, we have $\sigma(v(\mathcal{A}_{i_k}\setminus\mathcal{S}))+\sigma(S_{i_j}\cup v(\mathcal{S}))\le\sigma(v(\mathcal{A}_{i_{k-1}})\cup S_{i_j})+\sigma(S_{i_k})$.
In addition, we have $\sigma(v(\mathcal{A}_{i_{k-1}})\cup S_{i_j})+\sigma(S_{i_k})\le\sigma(v(\mathcal{A}_{i_k}))+\sigma(S_{i_j})$ because in AM2CP, $S_{i_k}=\operatorname{argmax}_{S_z\in\mathcal{V}\setminus\mathcal{A}_{i_{k-1}}}\left(\sigma(\mathcal{A}_{i_{k-1}}\cup S_z)-\sigma(S_z)\right)$. Therefore, in both cases, we have $\sigma(v(\mathcal{A}_{i_k}))+\sigma(S_{i_j})\ge\sigma(v(\mathcal{A}_{i_k}\setminus\mathcal{S}))+\sigma(S_{i_j}\cup v(\mathcal{S}))$. By induction, we have $\sigma(v(\mathcal{A}_{i_{|\mathcal{V}|-1}}))+\sigma(S_{i_{|\mathcal{V}|}})\ge\sigma(v(\mathcal{A}_{i_{|\mathcal{V}|-1}}\setminus\mathcal{S}))+\sigma(S_{i_{|\mathcal{V}|}}\cup v(\mathcal{S}))$ for any $\mathcal{S}\subseteq\mathcal{A}_{i_{|\mathcal{V}|-2}}$. Therefore, the partition $\mathcal{P}$ of each round i in AM2CP has the maximum objective function value among all the partitions separating the last two sets. Each time we compare $\mathcal{P}$ with $\mathcal{P}_{max}$ and merge the last two sets. Therefore, $\mathcal{P}_{max}$ is an optimal partition for the M2CP problem when the influence propagation function $\sigma(\cdot)$ is supermodular.
Since AM2CP is an optimal solution if $\sigma(\cdot)$ is supermodular, we are interested in the influence propagation models in which the influence propagation function $\sigma(\cdot)$ is supermodular. Note that $\sigma(\cdot)$ in this paper is different from the influence function defined in [14]. In this paper, $\sigma(X)$ is the sum of influence propagation within X for every node in X, i.e., $\sigma(X)=\sum_{i\in X}\sigma_X(i)$. In [14], $\sigma(X)$ is the influence propagation of seed set X in the entire social network. We show the following lemma.
Lemma 1
When the influence propagation model is LT, for any two communities $B\subset A$ and a node $u\notin A$, we have $\sigma(A\cup\{u\})-\sigma(A)\ge\sigma(B\cup\{u\})-\sigma(B)$.
Proof.
The influence propagation in the LT model, as shown in [14], can be simulated as a random process by flipping coins. Assume we have flipped all the coins in advance; then an edge is declared 'live' if the coin flip indicates that an influence will be propagated successfully, and it is declared blocked otherwise. A node j is influenced by a seed i if and only if there is a path of live edges from i to j. According to this principle, any simple path from i to j has a certain probability of being a live path. In [15], Chen et al. prove that for any node i, the influence propagation of i is equal to $\sum_{\text{sp}\in\text{SP}(i)}w(\text{sp})$ where SP(i) is the set of all the simple paths starting from i and w(sp) is the probability that sp is a live path. Therefore, for a community X and a node $i\in X$, $\sigma_X(i)=\sum_{\text{sp}\in\text{SP}_X(i)}w(\text{sp})$ where $\text{SP}_X(i)$ is the set of simple paths starting from i in community X, and $\sigma(X)=\sum_{i\in X}\sigma_X(i)$ is the sum of probabilities for all the simple paths in X. Since for any two communities $B\subset A$, the set of simple paths in B is a subset of the set of simple paths in A, we have $\sigma(A)\ge\sigma(B)$. Similarly, we have $\sigma(A\cup\{u\})-\sigma(A)\ge\sigma(B\cup\{u\})-\sigma(B)$ because $\sigma(A\cup\{u\})-\sigma(A)$ is the sum of probabilities of the simple paths that visit u exactly once in community $A\cup\{u\}$, and $\sigma(B\cup\{u\})-\sigma(B)$ is the sum of probabilities of the simple paths that visit u exactly once in community $B\cup\{u\}$, which form a subset of the former. Therefore, the influence propagation function $\sigma(\cdot)$ in the LT model is supermodular.
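The simple-path argument in this proof can be checked numerically on a small graph. The sketch below assumes that w(sp) is the product of the edge weights along the path; the function names `lt_sigma` and `is_supermodular` and the example weights in the usage note are hypothetical. Path enumeration is exponential in general, so this is only suitable for tiny communities.

```python
from itertools import combinations


def lt_sigma(community, weights):
    """Sum of w(sp) over all simple paths (with at least one edge) inside X."""
    nodes = set(community)
    out_edges = {u: [] for u in nodes}
    for (u, v), w in weights.items():
        if u in nodes and v in nodes:
            out_edges[u].append((v, w))

    def extend(u, visited, prob):
        # Add every simple path that continues from u without revisiting nodes.
        total = 0.0
        for v, w in out_edges[u]:
            if v not in visited:
                total += prob * w + extend(v, visited | {v}, prob * w)
        return total

    return sum(extend(i, {i}, 1.0) for i in nodes)


def is_supermodular(nodes, sigma, tol=1e-12):
    """Brute-force check of the inequality in Lemma 1 for all B, A and u."""
    nodes = list(nodes)
    subsets = [frozenset(c) for r in range(len(nodes) + 1)
               for c in combinations(nodes, r)]
    for a in subsets:
        for b in subsets:
            if not b < a:  # b must be a proper subset of a
                continue
            for u in nodes:
                if u in a:
                    continue
                gain_a = sigma(a | {u}) - sigma(a)
                gain_b = sigma(b | {u}) - sigma(b)
                if gain_a < gain_b - tol:
                    return False
    return True
```

For instance, with weights `{(1, 2): 0.5, (2, 3): 0.5, (1, 3): 0.25, (3, 4): 0.5, (4, 1): 0.5}`, the community {1,2,3} contains the paths 1→2, 2→3, 1→3 and 1→2→3, and the brute-force check confirms the inequality for every choice of B, A and u on this graph.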
Theorem 2.
AM2CP is an optimal solution for M2CP in the LT model.
Proof.
The theorem follows directly from Theorem 1 and Lemma 1.
By Lemma 1, $\sigma(\cdot)$ is supermodular in the LT model. We next show that $\sigma(\cdot)$ in the IC model, however, is not supermodular. The description of the IC model can be found in detail in [14]. Here we just give a counterexample. In the example shown in Figure 3, the weights are as follows: w(1,2)=w(1,3)=w(1,4)=1 and w(2,5)=w(3,5)=w(4,5)=0.5. According to the edges in Figure 3, nodes 2, 3, and 4 cannot influence each other and nodes 2, 3, 4, and 5 cannot influence node 1. Let community A={1,2,3,5} and community B={1,2,5}, so B is a subset of A. By direct computation, we have $\sigma(A\cup\{4\})-\sigma(A)=5.375-3.75=1.625$ and $\sigma(B\cup\{4\})-\sigma(B)=3.75-2=1.75$. Therefore, $\sigma(A\cup\{4\})-\sigma(A)<\sigma(B\cup\{4\})-\sigma(B)$, which implies that $\sigma(\cdot)$ is not supermodular in the IC model.
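The numbers in this counterexample can be reproduced by enumerating every live-edge outcome of the IC model, where each edge (i,j) is live independently with probability w(i,j). The sketch below is an illustration with hypothetical names (`ic_sigma`, `FIGURE3_WEIGHTS`); exhaustive enumeration is only feasible for very small graphs such as this one.

```python
from itertools import product

# Edge weights of the counterexample, as listed in the text.
FIGURE3_WEIGHTS = {
    (1, 2): 1.0, (1, 3): 1.0, (1, 4): 1.0,
    (2, 5): 0.5, (3, 5): 0.5, (4, 5): 0.5,
}


def ic_sigma(community, weights):
    """Exact sigma(X) in the IC model by enumerating live-edge outcomes."""
    nodes = set(community)
    edges = [(u, v, p) for (u, v), p in weights.items()
             if u in nodes and v in nodes]
    total = 0.0
    for outcome in product([True, False], repeat=len(edges)):
        prob = 1.0
        live = {u: [] for u in nodes}
        for (u, v, p), is_live in zip(edges, outcome):
            prob *= p if is_live else 1.0 - p
            if is_live:
                live[u].append(v)
        if prob == 0.0:
            continue
        # For each seed, count the other nodes reachable over live edges.
        for seed in nodes:
            reached, stack = {seed}, [seed]
            while stack:
                u = stack.pop()
                for v in live[u]:
                    if v not in reached:
                        reached.add(v)
                        stack.append(v)
            total += prob * (len(reached) - 1)
    return total


gain_A = ic_sigma({1, 2, 3, 4, 5}, FIGURE3_WEIGHTS) - ic_sigma({1, 2, 3, 5}, FIGURE3_WEIGHTS)
gain_B = ic_sigma({1, 2, 4, 5}, FIGURE3_WEIGHTS) - ic_sigma({1, 2, 5}, FIGURE3_WEIGHTS)
```

Here `gain_A` evaluates to 1.625 and `gain_B` to 1.75, matching the values stated above: adding node 4 helps the smaller community more, because in the larger community node 5 is already likely to be reached from node 1 through nodes 2 and 3.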
3.2 Hardness
In this subsection, we study the hardness of MKCP. We show that the MKCP problem, with arbitrary K, is $\mathcal{NP}$-hard in the LT or IC model.
Theorem 3.
The MKCP problem is $\mathcal{NP}$-hard in the LT model for general K.
Proof.
To prove Theorem 3, we give a polynomial-time reduction from the Minimum K-Cut problem. The input of Minimum K-Cut is a simple undirected graph G(V,E) and an integer M. The objective is to find a set of at most M edges which, when deleted, separate the graph into exactly K nonempty components. It is well known that the Minimum K-Cut problem is $\mathcal{NP}$-hard for general K.
Given a graph G(V,E) for the Minimum K-Cut problem, we construct a social network $G'(V',E')$ as follows: (1) For each node $i\in V$, create a node $i'$ in $V'$. (2) For each edge $(i,j)\in E$, create two edges $(i',j')$ and $(j',i')$ in $E'$. (3) Let $\Delta$ denote the maximum degree in G and n denote the number of nodes in G. Assign weight $w(i',j')=\frac{1}{(n\Delta)^{2}}$ to all the edges $(i',j')\in E'$.
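The three construction steps can be sketched in a few lines. The function names `build_social_network` and `one_hop_influence` are hypothetical, nodes of $G'$ reuse the labels of G for brevity, and `Fraction` is used only to keep the small weights exact.

```python
from fractions import Fraction


def build_social_network(nodes, undirected_edges):
    """Build the edge weights of G' from an undirected graph G(V, E)."""
    n = len(nodes)
    degree = {v: 0 for v in nodes}
    for u, v in undirected_edges:
        degree[u] += 1
        degree[v] += 1
    max_degree = max(degree.values())  # Delta, the maximum degree in G
    weight = Fraction(1, (n * max_degree) ** 2)
    weights = {}
    for u, v in undirected_edges:
        # Each undirected edge becomes two directed edges of equal weight.
        weights[(u, v)] = weight
        weights[(v, u)] = weight
    return weights


def one_hop_influence(partition, weights):
    """Total weight of the directed edges that stay inside a community."""
    total = Fraction(0)
    for community in partition:
        members = set(community)
        total += sum(w for (u, v), w in weights.items()
                     if u in members and v in members)
    return total
```

For the path 1-2-3-4 (n=4, maximum degree 2, three edges), every directed edge gets weight 1/64, and cutting the middle edge (M=1) leaves a one-hop influence of 2(3−1)/64 inside the two communities.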
It is clear that the reduction can be done in polynomial time. We next show that there is a K-Cut with M edges if and only if there is a KVDP with $f(\mathcal{P})\ge\frac{2(|E|-M)}{(n\Delta)^{2}}$. Assume there is a K-Cut with M edges. Then graph G can be partitioned into K communities with |E|−M edges within the K communities. Consider the same partition in G^{′}. The one-hop influence propagation is $\frac{2(|E|-M)}{(n\Delta)^{2}}$, since each of the |E|−M intra-community edges of G corresponds to two directed edges in G^{′}. Therefore, we have a KVDP with $f(\mathcal{P})\ge\frac{2(|E|-M)}{(n\Delta)^{2}}$ for G^{′}.
Conversely, assume there is a KVDP for G^{′} with $f(\mathcal{P})\ge\frac{2(|E|-M)}{(n\Delta)^{2}}$. It has been shown in [16] that for any nodes i,j,l∈V, the probability of influence propagation from i to j via node l is equal to w(i,l)w(l,j) in the LT model. Therefore, a single two-hop influence propagation is $\frac{1}{(n\Delta)^{4}}$. The number of two-hop simple paths starting from any node i^{′}∈V^{′} is no more than Δ^{2}. Therefore, the sum of two-hop influence propagation over all nodes in V^{′} is no more than $\frac{n\Delta^{2}}{(n\Delta)^{4}}=\frac{1}{n^{3}\Delta^{2}}$. By direct computation, the sum of (r+1)-hop influence propagation is less than the sum of r-hop influence propagation for any node i. Since the length of a simple path is no more than n, the sum of multi-hop influence propagation over all nodes in V^{′} is less than $\frac{1}{(n\Delta)^{2}}$, i.e., less than the weight of a single edge. This implies that $f(\mathcal{P})\ge\frac{2(|E|-M)}{(n\Delta)^{2}}$ if and only if the one-hop influence propagation is no less than $\frac{2(|E|-M)}{(n\Delta)^{2}}$. Therefore, the same partition in G is a K-Cut with at most M edges. In sum, we prove Theorem 3.
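The multi-hop bound used in the proof above can be written out explicitly. This is only a sketch of the counting argument: S_r is a symbol introduced here for the sum of r-hop influence propagation over all nodes of V′, and it assumes, as the proof does, that the influence along an r-hop simple path is the product of its edge weights and that each node starts at most Δ^{r} such paths.

```latex
\begin{align*}
S_r &\le n \cdot \Delta^{r} \cdot \frac{1}{(n\Delta)^{2r}}
     = \frac{1}{n^{2r-1}\Delta^{r}}
     \le \frac{1}{n^{3}\Delta^{2}} \qquad (r \ge 2), \\
\sum_{r=2}^{n-1} S_r &\le \frac{n-2}{n^{3}\Delta^{2}}
     < \frac{1}{n^{2}\Delta^{2}} = \frac{1}{(n\Delta)^{2}}.
\end{align*}
```

Because the one-hop term is always an integer multiple of 2/(nΔ)^{2} and the multi-hop total stays below 1/(nΔ)^{2}, the multi-hop part can never lift a partition over the threshold 2(|E|−M)/(nΔ)^{2} unless its one-hop term already reaches it.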
Theorem 4.
The MK CP problem is NP-hard in the IC model for general K.
Proof.
To prove Theorem 4, we use the same reduction as in the proof of Theorem 3, i.e., we assign the uniform weight $\frac{1}{(n\Delta)^{2}}$ to all the edges. It can be shown by induction that, in the IC model, the sum of (r+1)-hop influence propagation received by any node i∈V^{′} is less than the sum of r-hop influence propagation it receives. Therefore, by a similar argument, the sum of multi-hop influence propagation received over all nodes i∈V^{′} is less than the edge weight. Hence, there exists a K-Cut with M edges if and only if there is a KVDP with $f(\mathcal{P})\ge\frac{2(|E|-M)}{(n\Delta)^{2}}$.
The proofs of Theorems 3 and 4 both work by assigning specific weights that make the multi-hop influence propagation negligible. It is intuitive that the general MK CP problem is even harder when multi-hop influence propagation is not negligible.
3.3 Heuristic algorithm for MK CP
In this subsection, we present two heuristic algorithms for MK CP. As mentioned in the ‘Related work’ section, the literature contains four main categories of methods for community partition: hierarchy-based methods, spectrum-based methods, density-based methods, and modularity-based methods. In our view, spectrum-based, density-based, and modularity-based methods are not suitable for MK CP. Spectrum-based methods partition communities by studying the adjacency matrix, which cannot reflect the information of influence propagation. Density-based methods define communities as areas of higher density than the remainder of the data set; this category therefore requires location knowledge of the nodes, which cannot be formulated in our MK CP problem. In modularity-based methods, the only objective of community partition is to maximize the global modularity score. Therefore, none of these three categories can be applied to MK CP, and we focus on hierarchy-based methods.
Generally speaking, hierarchical community partition is a method to build a hierarchy of communities. There are two strategies for hierarchical partition: split and merge. Split is a top-down approach, i.e., all the nodes start within one community, and splits are performed on one of the communities recursively. Conversely, merge is a bottom-up approach, i.e., each node starts in a distinct community, and pairs of communities are merged recursively into a new community. For typical hierarchical community partition problems, n−1 splits (or, respectively, merges) have to be done to build a hierarchy, where n is the number of nodes. For the MK CP problem, however, we need only K−1 splits or n−K merges to obtain a KVDP. We determine the splits and merges in a greedy manner. The Split algorithm runs by calling AM2CP recursively, and each time it partitions a community X into two communities X_{1} and X_{2} with the minimum value of σ(X)−(σ(X_{1})+σ(X_{2})). The pseudocode is given in Algorithm 2. The Merge algorithm runs by randomly selecting a community X each time and finding another community Y that maximizes the value of σ(X∪Y)−(σ(X)+σ(Y)). The pseudocode is given in Algorithm 3.
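The merge strategy described above can be sketched as follows. This is a minimal illustration, not Algorithm 3 itself: `merge_partition` and `toy_sigma` are hypothetical names, the influence function is passed in as a callable, and the toy σ simply sums the edge weights inside a community in place of a real LT or IC estimate. The sketch assumes 1 ≤ k ≤ n.

```python
import random

def merge_partition(nodes, sigma, k, seed=0):
    """Greedy bottom-up merge: start from singletons and perform n - k merges."""
    rng = random.Random(seed)
    partition = [frozenset([v]) for v in nodes]
    while len(partition) > k:
        x = rng.choice(partition)  # randomly selected community X
        others = [c for c in partition if c != x]
        # Partner Y maximizing sigma(X u Y) - (sigma(X) + sigma(Y)).
        y = max(others, key=lambda c: sigma(x | c) - sigma(x) - sigma(c))
        partition = [c for c in partition if c not in (x, y)]
        partition.append(x | y)
    return partition

# Toy data (assumption): two triangles joined by one weak edge.
WEIGHTS = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
           (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0, (2, 3): 0.1}

def toy_sigma(community):
    """Stand-in for sigma: total weight of edges inside the community."""
    return sum(w for (u, v), w in WEIGHTS.items()
               if u in community and v in community)

result = merge_partition(range(6), toy_sigma, k=2)
print(result)
```

Because X is chosen at random, different seeds can yield different partitions; the only guarantees of this sketch are that it returns exactly k disjoint communities covering all nodes.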
In the general case, determining a split by exhaustive search requires exponential time. However, when σ(·) is supermodular, we can apply AM2CP to determine $C_{z_1}$ and $C_{z_2}$ for each C_{z}, which requires only $\mathcal{O}(|C_{z}|^{3}\tau)$ time. Now let us consider the computational complexity of SAMK CP (Algorithm 2). To avoid duplicate computations, we can keep the optimal partition for each community in $\mathcal{P}$ and apply AM2CP to both $C_{z_1}$ and $C_{z_2}$ at step 4 to obtain their optimal partitions. Then the overall running time of SAMK CP is $\mathcal{O}(Kn^{3}\tau)$ when σ(·) is supermodular.
In step 4 of MAMK CP (Algorithm 3), in order to maximize the marginal gain, we have to compute σ(C_{i}∪C_{j}) for all the communities $C_{j}\in\mathcal{P}$; thus, MAMK CP requires $\mathcal{O}(n^{2}\tau)$ time to obtain a KVDP when n is large and K is small. The computational complexity of SAMK CP is even higher. Therefore, they may not be suitable for large social networks. To improve the running time, we provide an alternative merge strategy for implementing MAMK CP. Instead of merging the communities with the maximum marginal gain, in step 4 we estimate the influence propagation of C_{i} through the entire graph, i.e., σ_{V}(C_{i}), and then compute the average influence received by C_{j} from C_{i}, which is defined as $\frac{\sum_{l\in C_{i}}\sum_{r\in C_{j}}p_{V}(l,r)}{|C_{j}|}$, for all the communities C_{j}≠C_{i}. This can be done by simply accumulating p_{V}(l,r) for each community C_{j} while computing σ_{V}(C_{i}). Finally, we merge C_{i} with the community that has the highest average received influence. In this way, a merge can be done in O(τ) time, and the overall running time of MAMK CP is only $\mathcal{O}(n\tau)$.
According to the complexity analysis, MAMK CP is better than SAMK CP in terms of running time. For some large social networks, we can apply the simplified version of MAMK CP, which requires only linear time. In terms of partition quality, SAMK CP is intuitively better than MAMK CP because it performs a global optimization at each step (top-down approach), whereas MAMK CP performs a local optimization (bottom-up approach). We will demonstrate their performance through simulation in the next section.
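To make the cost of a split concrete, the sketch below performs the greedy top-down procedure with an exhaustive bipartition search swapped in for AM2CP, which is not reproduced here. It is therefore exponential in the community size and only meant for tiny examples. All names and the toy σ (total edge weight inside a community) are assumptions, and the sketch requires k ≤ n.

```python
from itertools import combinations

def best_bipartition(community, sigma):
    """Exhaustively find (loss, X1, X2) minimizing sigma(X) - (sigma(X1) + sigma(X2))."""
    items = sorted(community)
    anchor, rest = items[0], items[1:]
    whole = sigma(community)
    best = None
    # Keep the first node in X1 so each unordered bipartition is tried once;
    # X1 never takes all of `rest`, so X2 is always nonempty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            x1 = frozenset((anchor,) + extra)
            x2 = frozenset(community) - x1
            loss = whole - (sigma(x1) + sigma(x2))
            if best is None or loss < best[0]:
                best = (loss, x1, x2)
    return best

def split_partition(nodes, sigma, k):
    """Greedy top-down split: start from one community and perform k - 1 splits."""
    partition = [frozenset(nodes)]
    while len(partition) < k:
        candidates = [(best_bipartition(c, sigma), c)
                      for c in partition if len(c) > 1]
        (loss, x1, x2), chosen = min(candidates, key=lambda item: item[0][0])
        partition.remove(chosen)
        partition.extend([x1, x2])
    return partition

# Toy data (assumption): two triangles joined by one weak edge.
WEIGHTS = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
           (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0, (2, 3): 0.1}

def toy_sigma(community):
    return sum(w for (u, v), w in WEIGHTS.items()
               if u in community and v in community)

print(split_partition(range(6), toy_sigma, k=2))
```

On the toy graph the cheapest split removes only the weak bridge and separates the two triangles. The sketch recomputes every candidate bipartition in each round; caching them per community, as the text suggests, would avoid that duplicate work.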
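The partner selection of the simplified merge strategy can be sketched as below. The sketch assumes the pairwise estimates p_V(l,r) are already available in a dictionary `p` (in the text they are accumulated while σ_V(C_i) is being simulated); the function name `best_partner` and the sample values are hypothetical.

```python
def best_partner(i, partition, p):
    """Index of the community with the highest average influence received from partition[i]."""
    source = partition[i]
    best_j, best_avg = None, float("-inf")
    for j, target in enumerate(partition):
        if j == i:
            continue
        # Sum of p_V(l, r) over l in C_i and r in C_j, divided by |C_j|.
        received = sum(p.get((l, r), 0.0) for l in source for r in target)
        avg = received / len(target)
        if avg > best_avg:
            best_j, best_avg = j, avg
    return best_j

# Made-up estimates: C_0 = {0, 1} sends 0.4 to {2} and 0.6 in total to {3, 4}.
partition = [{0, 1}, {2}, {3, 4}]
p = {(0, 2): 0.4, (1, 3): 0.3, (1, 4): 0.3}
print(best_partner(0, partition, p))
```

Here {3, 4} receives more influence in total, but {2} receives more per node (0.4 against 0.3), so the averaged criterion selects community 1.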
4 Results and discussion
In this section, we carry out experiments on real-world social networks. The influence propagation is based on the well-known LT and IC models, and we run MC simulation to estimate the influence propagation function σ(·). We begin by describing the algorithms, data sets, and experimental settings in the ‘Algorithm,’ ‘Data set,’ and ‘Experiment setting’ sections, respectively, and then discuss the experimental results in the ‘Experiment result’ section.
4.1 Algorithm
In addition to the proposed algorithms, SAMK CP (Algorithm 2) and MAMK CP (Algorithm 3), we also implement two classic community partition algorithms for comparison. One is a Modularity-based Algorithm (MODUA) proposed in [47] and the other is a Spectrum-based Algorithm (SPECA) proposed in [48]. Given a graph G, MODUA finds communities by optimizing the modularity score locally, and it terminates when a maximal modularity score is obtained. Therefore, MODUA cannot partition G into a given number K of communities. SPECA, in contrast, is flexible in the number K of communities: it iteratively partitions a graph into K communities by minimizing the general cut each time according to the adjacency matrix. To the best of our knowledge, no existing algorithm is designed for disjoint community partition with the objective of maximizing the influence propagation within each community. In addition, we did not find any density-based algorithm that can be applied to our MK CP problem.
4.2 Data set
We conduct simulations on three real-world social networks as follows: (1) NetHEPT: taken from the co-authorship network in the ‘High Energy Physics (Theory)’ section (from 1991 to 2003) of arXiv (http://arXiv.org). The nodes in NetHEPT denote the authors, and the edges represent co-authorship. NetHEPT has 15,229 nodes and 31,376 edges. (2) NetEmail: taken from the email interchange network of the University of Rovira i Virgili (Tarragona). The nodes in NetEmail denote the members of the university, and the edges represent email interchanges among the members (the data set is available at http://deim.urv.cat/~alephsys/data.php). NetEmail has 1,133 nodes and 10,902 edges. (3) NetCLUB: taken from the relationship network of Zachary’s Karate club, which is described by Wayne Zachary in [49]. NetCLUB has 34 nodes and 78 edges.
4.3 Experiment setting
In this study, we assume that the degree of influence from node i to node j depends on the closeness of their relationship and on the probability p(i), where p(i), as defined in the ‘Problem description’ section, is the probability that node i would produce an influence propagation or would share knowledge with others. We apply the method proposed in [14] to estimate the closeness c(i,j) between i and j. Let deg_{in}(j) denote the in-degree of node j; then c(i,j)=e(i,j)/deg_{in}(j), where e(i,j) denotes the number of edges from i to j. Due to the lack of ground truth, we independently assign each node i a sharing probability p(i) chosen uniformly at random from 0.1%, 1%, and 10%. Then we assume that for every (i,j)∈E, i has a chance of $w(i,j)=\frac{p(i)\,e(i,j)}{\deg_{\text{in}}(j)}$ to influence j.
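The weight assignment can be sketched in a few lines. The sketch assumes the network is given as a directed edge list in which parallel edges are repeated, and that the in-degree counts parallel edges; it also reads the random assignment as an independent uniform choice among the three probabilities for each node. Both readings, and all function names, are assumptions made here.

```python
import random
from collections import Counter

def assign_share_probs(nodes, seed=0):
    """Give each node p(i) chosen uniformly from 0.1%, 1%, and 10% (assumed reading)."""
    rng = random.Random(seed)
    return {v: rng.choice([0.001, 0.01, 0.1]) for v in nodes}

def edge_weights(edge_list, share_prob):
    """w(i, j) = p(i) * e(i, j) / deg_in(j) for a directed multigraph edge list."""
    multiplicity = Counter(edge_list)             # e(i, j)
    in_degree = Counter(j for _, j in edge_list)  # deg_in(j), parallel edges counted
    return {(i, j): share_prob[i] * e / in_degree[j]
            for (i, j), e in multiplicity.items()}

# Node 3 has in-degree 3: two parallel edges from node 1 and one from node 2.
example_edges = [(1, 3), (1, 3), (2, 3)]
example_weights = edge_weights(example_edges, {1: 0.1, 2: 0.01, 3: 0.001})
print(example_weights)
```

In the example, w(1,3) is 0.1·2/3 and w(2,3) is 0.01·1/3, so the closeness terms c(i,j) entering node 3 sum to one before they are scaled by the senders' sharing probabilities.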
4.4 Experiment result
We first evaluate the performance of our algorithms on NetCLUB. In both SAMK CP and MAMK CP, σ(·) is computed by running the MC simulation 1,000 times and taking the average. Although AM2CP is not optimal in the IC model, we still apply it for the splits in the IC simulations to improve computational efficiency. Since MODUA cannot be given the number of communities, we first apply MODUA to get a partition of NetCLUB and then apply our algorithms and SPECA to partition NetCLUB into the same number of communities. Figures 4 and 5 show the experimental results for the LT and IC models, respectively. NetCLUB is partitioned into four communities. In terms of influence propagation, both SAMK CP and MAMK CP are better than MODUA and SPECA. SAMK CP outperforms MODUA and SPECA by about 40% and 70%, respectively. In addition, Figures 4 and 5 show that the influence propagation of each partition increases gradually and linearly as the number of simulations increases, which reflects the reliability of the experimental results.
In the second experiment, we compare MAMK CP with MODUA and SPECA on NetEmail. SAMK CP is omitted due to its high computational complexity. Figures 6 and 7 show the experimental results. The network is partitioned into 88 communities. MAMK CP has the maximum sum of influence propagation. The performance of SPECA is poor compared with MAMK CP and MODUA. The influence propagation within the partition of SPECA is about two times less than that of MAMK CP and about one time less than that of MODUA.
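A Monte Carlo estimate of σ(·) of the kind used for NetCLUB can be sketched as follows for the IC model. The sketch assumes that σ(C) sums, over every source node in C, the expected number of other nodes of C that the source activates through edges inside C; the function names and the default of 1,000 runs are illustrative, and the edge scan is deliberately naive.

```python
import random

def simulate_ic(community, weights, source, rng):
    """Run one IC cascade from `source`, restricted to nodes inside the community."""
    active, frontier = {source}, [source]
    while frontier:
        new_frontier = []
        for u in frontier:
            # Each newly active node gets one chance per outgoing edge.
            for (a, b), w in weights.items():
                if (a == u and b in community and b not in active
                        and rng.random() < w):
                    active.add(b)
                    new_frontier.append(b)
        frontier = new_frontier
    return len(active) - 1  # activated nodes other than the source

def estimate_sigma(community, weights, runs=1000, seed=0):
    """Average, over `runs` simulations, of the cascade sizes summed over all sources."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        total += sum(simulate_ic(community, weights, s, rng) for s in community)
    return total / runs

# Deterministic chain: edges with weight 1 always fire, weight 0 never fires.
chain = {(1, 2): 1.0, (2, 3): 1.0, (3, 4): 0.0}
print(estimate_sigma({1, 2, 3, 4}, chain, runs=100))
```

On the deterministic chain every run gives the same value (node 1 reaches two nodes, node 2 reaches one), so the estimate is exact; with fractional weights the estimate fluctuates and stabilizes as the number of runs grows.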
In the last experiment, we compare MAMK CP with MODUA and SPECA on NetHEPT. Since this network has 15,229 nodes and 31,376 edges, we use the simplified version of MAMK CP. Figures 8 and 9 show the experimental results. The network is partitioned into 1,820 communities. MAMK CP is still better than MODUA and SPECA, but the gap between MAMK CP and MODUA in this experiment is less than that in the second experiment. This agrees with our intuition in that simplified MAMK CP has a lower computational complexity but also has some loss in performance. According to the three experimental results, we can conclude that the proposed algorithms are better than modularitybased and spectrumbased methods for finding communities in terms of influence propagation.
5 Conclusions
Community partition and influence propagation are important problems in social networks. In this paper, we investigate the Maximum K-Community Partition (MK CP) problem, which aims to maximize the sum of influence propagation within each community. We analyze the problem both theoretically and practically. In particular, we show that the M2CP problem can be solved efficiently for a class of influence propagation models. In addition, we prove that the MK CP problem is NP-hard in the well-known LT and IC models for general K. We also develop two heuristic algorithms and demonstrate their efficiency through simulation on real-world social networks.
We believe this study is useful for influence propagation problems. In future research, we plan to extend our work to the influence maximization problem, selecting the most influential nodes based on influence-based communities. Furthermore, we will study potential applications of influence-based communities in social networks.
References
Bollobas B: Modern Graph Theory. Springer Verlag, New York; 1998.
Girvan M, Newman MEJ: Community structure in social and biological networks. Proc. Natl. Acad. Sci 2002,99(12):7821–7826. 10.1073/pnas.122653799
Luxburg, U: A tutorial on spectral clustering. Stat. Comput. 17, 395–416 (2007).
Kannan R, Vempala S, Vetta A: On clusterings: good, bad and spectral. J. ACM 2004,51(3):497–515. 10.1145/990308.990313
Mancoridis, S, Mitchell, BS, Rorres, C: Using automatic clustering to produce high-level system organizations of source code. In: Proceedings of the 6th International Workshop on Program Comprehension, Ischia, Italy, 24–26 June 1998, pp. 45–53 (1998).
Newman, M, Girvan, M: Finding and evaluating community structure in networks. Phys. Rev. E. 69, 026113 (2004).
White, S, Smyth, P: A spectral clustering approach to finding communities in graphs. In: SDM’05: Proceedings of the 5th SIAM International Conference on Data Mining, pp. 76–84 (2005).
Newman M: Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006,103(23):8577–8582. 10.1073/pnas.0601602103
Brown, J, Reingen, P: Social ties and word-of-mouth referral behavior. J. Consum. Res. 14, 350–362 (1987).
Goldenberg J, Libai B, Muller E: Using complex systems analysis to advance marketing theory development: modeling heterogeneity effects on new product growth through stochastic cellular automata. Acad. Market. Sci. Rev 2001,9(3):1–18.
Goldenberg, J, Libai, B, Muller, E: Talk of the network: a complex systems look at the underlying process of word-of-mouth. Market. Lett. 12, 211–223 (2001).
Richardson, M, Domingos, P: Mining knowledge-sharing sites for viral marketing, Edmonton, Alberta, Canada, 23–26 July 2002, pp. 61–70. KDD (2002).
Domingos, P, Richardson, M: Mining the network value of customers, San Francisco, CA, USA, 26–29 August 2001, pp. 57–66. KDD (2001).
Kempe D, Kleinberg JM, Tardos E: Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York; 2003.
Chen, W, Yuan, Y, Zhang, L: Scalable influence maximization in social networks under the linear threshold model. In: Proceedings of the 10th IEEE International Conference on Data Mining, Sydney, Australia, 14–17 December 2010, pp. 88–97 (2010).
Chen W, Wang C, Wang Y: Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York; 2010.
Wang Y, Cong G, Song G, Xie K: Community-based greedy algorithm for mining top-k influential nodes in mobile social networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’10). ACM, New York; 2010.
Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D: Defining and identifying communities in networks. Proc. Natl. Acad. Sci. USA 2004,101(9):2658–2663. 10.1073/pnas.0400054101
Hu, Y, Chen, H, Zhang, P, Zhang, P, Li, M, Di, Z, Fan, Y: Comparative definition of community and corresponding identifying algorithm. Phys. Rev. E. 78, 026121 (2008).
Weiss RS, Jacobson E: A method for the analysis of the structure of complex organizations. Am. Sociol. Rev 1955,20(6):661–668. 10.2307/2088670
Boettcher, S, Percus, AG: Extremal optimization for graph partitioning. Phys. Rev. E. 64, 026114 (2001).
Clauset A, Newman MEJ, Moore C: Finding community structure in very large networks. Phys. Rev. E 2004,70(6):066111. 10.1103/PhysRevE.70.066111
Newman, MEJ: Fast algorithm for detecting community structure in networks. Phys. Rev. E. 69, 066133 (2004).
Wakita K, Tsurumi T: Finding community structure in mega-scale social networks. In Proceedings of the 16th International Conference on World Wide Web, WWW’07. ACM, New York; 2007.
Guimerà R, Sales-Pardo M, Amaral LAN: Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E 2004,70(2):025101. 10.1103/PhysRevE.70.025101
Massen, CP, Doye, JPK: Identifying communities within energy landscapes. Phys. Rev. E. 71, 046101 (2005).
Duch J, Arenas A: Community detection in complex networks using extremal optimization. Phys. Rev. E 2005,72(2):027104. 10.1103/PhysRevE.72.027104
Holland JH: Adaptation in Natural and Artificial Systems. MIT, Cambridge; 1992.
Pizzuti C: Community detection in social networks with genetic algorithms. In Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, GECCO’08. ACM, New York; 2008.
Fortunato S, Barthelemy M: Resolution limit in community detection. Proc. Natl. Acad. Sci. USA 2007,104(1):36–41. 10.1073/pnas.0605965104
Zhang X, Li Z, Wang R, Wang Y: A combinatorial model and algorithm for globally searching community structure in complex networks. J. Combin. Optim 2010,23(4):425–442. 10.1007/s10878-010-9356-0
Fortunato, S: Community detection in graphs. Phys. Rep. 486, 75–174 (2010).
Gaertler, M: Clustering. In: Brandes, U, Erlebach, T (eds.) Network Analysis: Methodological Foundations, pp. 178–215. Springer (2005).
Lancichinetti, A, Fortunato, S: Community detection algorithms: a comparative analysis. Phys. Rev. E. 80, 056117 (2009).
Schaeffer S: Graph clustering. Comput. Sci. Rev 2007,1(1):27–64. 10.1016/j.cosrev.2007.05.001
Andersen, R, Chung, F, Lang, K: Local graph partitioning using PageRank vectors. In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, Berkeley, CA, USA, 21–24 October 2006, pp. 475–486 (2006).
Leicht EA, Newman MEJ: Community structure in directed networks. Phys. Rev. Lett 2008,100(11):118703. 10.1103/PhysRevLett.100.118703
Leskovec, J, Krause, A, Guestrin, C, Faloutsos, C, VanBriesen, J, Glance, N: Cost-effective outbreak detection in networks. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007, pp. 420–429 (2007).
Kimura, M, Saito, K: Tractable models for information diffusion in social networks, pp. 259–271, PKDD (2006).
Kimura, M, Saito, K, Motoda, H: Efficient estimation of influence functions for SIS model on social networks. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, CA, USA, 11–17 July 2009, pp. 2046–2051 (2009).
Li, H, Bhowmick, S, Sun, A: CINEMA: conformity-aware greedy algorithm for influence maximization in online social networks, pp. 323–334. EDBT (2013).
Galstyan A, Musoyan V, Cohen P: Maximizing influence propagation in networks with community structure. Phys. Rev. E 2009,79(5):056102. 10.1103/PhysRevE.79.056102
Nguyen, NP, Yan, G, Thai, MT, Eidenbenz, S: Containment of misinformation spread in online social networks. WebSci, pp. 213–222 (2012).
Dinh, TN, Xuan, Y, Thai, MT: Towards social-aware routing in dynamic communication networks. IPCCC, pp. 161–168 (2009).
Belak V, Lam S, Hayes C: Targeting online communities to maximise information diffusion. In Proceedings of the WWW Workshop on Mining Social Networks Dynamics. Lyon, France; 2012.
Stoer M, Wagner F: A simple min-cut algorithm. J. ACM 1997,44(4):585–591. 10.1145/263867.263872
Blondel, V, Guillaume, J, Lambiotte, R, Lefebvre, E: Fast unfolding of communities in large networks. J. Stat. Mech. Theor. Exp (2008).
Dhillon, I, Guan, Y, Kulis, B: A fast kernel-based multilevel algorithm for graph clustering. In: Proceedings of The 11th ACM SIGKDD, Chicago, Illinois, USA, 21–24 August 2005, pp. 629–634 (2005).
Zachary, W: An information flow model for conflict and fission in small groups. J. Anthrop. Res. 33, 452–73 (1977).
Acknowledgements
This research work is supported in part by National Science Foundation of USA under grants NSF 1137732 and NSF 1241626.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
ZL and YZ formulated the problem and did the algorithm design and implementation. WL, WW, and XC contributed to the theoretical part of algorithm design and organized this research. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Lu, Z., Zhu, Y., Li, W. et al. Influence-based community partition for social networks. Comput Social Networks 1, 1 (2014). https://doi.org/10.1186/s40649-014-0001-4
DOI: https://doi.org/10.1186/s40649-014-0001-4