Long-range degree correlations in complex networks
Computational Social Networks, volume 2, Article number: 4 (2015)
Abstract
Social networks are often degree correlated between nearest neighbors, an effect termed homophily, wherein individuals connect to nearest neighbors of similar connectivity. Whether friendships or other associations are so correlated beyond the first neighbors, and whether such correlations are an inherent property of the network or are dependent on other specifics of social interactions, remains unclear. Here we address these problems by examining long-range degree correlations in three undirected online social and three undirected nonsocial (airport, transcriptional-regulatory) networks. Degree correlations were measured using Pearson correlation scores and by calculating the average neighbor degrees for nodes separated by up to 5 sequential links. We found that the online social networks exhibited primarily weak anticorrelation at the first-neighbor level, and tended more strongly towards disassortativity as separation distances increased. In contrast, the nonsocial networks were disassortative among first neighbors, but exhibited assortativity at longer separation distances. In addition, the average degrees of the separated neighbors approached the average network connectivity after approximately 3 to 4 steps. Finally, we observed that two algorithms designed to grow networks on a node-by-node basis failed to reproduce all the correlative features representative of the social networks studied here.
Introduction
A complex network is said to be degree correlated if the degrees of nodes at the end of links occur together in a nonrandom manner. The tendency of nodes to connect with others of similar degree is termed assortativity [1], or homophily when referenced specifically to social networks [2]. Conversely, nodes connected to others of dissimilar degree are said to be disassortatively mixed [1]. In a social network the nodes represent individuals, and the links between them conceptualize friendships or other social associations. In this setting, an assortative network emphasizes the surprising result that “...your friends have more friends than you do” [3]. Although results from the literature mostly involve degree mixing among nearest neighbors, little has been reported regarding degree correlations extending beyond the first neighbors. Can a degree “correlation length” be defined for complex networks? If so, how far do degree correlations extend into a network, starting from any node?
Some insight into these questions has come from the social sciences. For example, the probability that an individual (termed the “ego”) and his/her acquaintance (termed the “alter”) are jointly obese decreases with geodesic distance (i.e., the number of sequential steps that link two nodes). However, this assortative effect is nearly independent of geographic distance [4], and is therefore a network property. Similar results hold for other health-related outcomes, such as smoking [5]. For a wide variety of social outcomes, such as happiness, divorce, depression, sleep length, and marijuana use, Christakis and Fowler [6] reported assortative correlations extending up to 4 to 6 steps from the ego.
Several explanations have been proposed for these nonrandom effects [6]. For example, individuals could choose to associate with others of similar traits (homophily); individuals could associate with others exposed to similar environments; or traits could spread their influence through “conduction,” like a contagion. However, such hypotheses cannot explain all assortative correlations beyond the first step (nearest neighbors), because similar effects have been observed in networks of more “autonomous” agents, such as food webs [7] (see also the commentary in [8]). Nevertheless, Christakis and Fowler conclude that, for social networks, traits extend to, on average, 3 steps beyond the ego [6]. Because node degrees are an elementary feature of networks and tend to correlate assortatively, we could ask: Does this 3-step observation hold for degree-degree correlations in general, across many different types of networks? If so, is there a defining mechanism for the effect?
Here we address these questions by comparing degree correlations for several large social networks to exemplary nonsocial ones, including an airline transportation network and two well-annotated transcriptional regulatory networks. We developed and executed an algorithm to evaluate degree correlations between nodes separated by more than one step, which is general enough to be applicable to nearly any undirected network. These methods could also be used to evaluate correlations between properties or features of the nodes beyond those associated directly with the network topology.
Methods
2.1 Measuring degree correlations
The degree of a node measures the number of links to its nearest neighbors, and is a critical property of the network topology because it accounts for coupling of each node to the greater network. It is therefore of great interest to examine the distribution of and correlation in network degrees. For simplicity, we will assume that all edges of the networks we examine are undirected (Section 2.3, below).
2.1.1 Average neighbor degree
The quantity $\langle k_{1} k_{0} \rangle$ may be calculated, which is the correlation function between $k_{0}$, the degree of the “ego” or focal node, and $k_{1}$, the degree of the “alter” node connected to the ego by 1 link (Figure 1). This two-point correlation function can be expressed as: $\langle k_{1} k_{0} \rangle = \sum_{k_{0}, k_{1}} k_{1} k_{0} \, p(k_{1}, k_{0})$, wherein $p(k_{1}, k_{0})$ is the joint probability that nodes of degree $k_{1}$ and $k_{0}$ appear together at the ends of a link [1]. Here, the sum spans $k_{1}, k_{0} = 1, \ldots, L$, with $L$ the number of links in the network. The joint probability can be expressed as $p(k_{1}, k_{0}) = p(k_{1} | k_{0}) \, p(k_{0})$, so that the correlation between degrees is contained within the conditional probability, $p(k_{1} | k_{0})$.
One way to determine whether any degree correlation exists is to measure the average nearest neighbor degree as a function of node degree, $\langle k_{1}^{\text{nn}} \rangle (k_{0})$. This quantity is directly linked to the conditional probability [9]:
$$\langle k_{1}^{\text{nn}} \rangle (k_{0}) = \sum_{k_{1}} k_{1} \, p(k_{1} | k_{0}). \qquad (1)$$
If the conditional probability is uncorrelated, then $p(k_{1} | k_{0}) = p(k_{1})$, and Eq. 1 can be evaluated to give $\langle k_{1}^{\text{nn}} \rangle (k_{0}) = \langle k_{0}^{2} \rangle / \langle k_{0} \rangle$, which is a constant of the network. Here, $\langle k_{1}^{\text{nn}} \rangle > \langle k_{0} \rangle$ for nonzero variance, which quantifies the notion that “...your friends have more friends than you do” [3]. Thus, any observed dependence of $\langle k_{1}^{\text{nn}} \rangle$ on $k_{0}$ indicates the presence of degree correlations in the network.
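As a small numerical illustration of the uncorrelated baseline quoted above, the following sketch computes the mean degree and the ratio of the second to the first moment of a degree sequence. The function name and the star-graph example are assumptions made here for illustration; they are not part of the original analysis.

```python
def uncorrelated_neighbor_degree(degrees):
    """Return (<k>, <k^2>/<k>) for a list of node degrees.

    <k^2>/<k> is the average nearest-neighbor degree expected when
    degrees at the two ends of a link are uncorrelated.
    """
    if not degrees:
        raise ValueError("degree sequence is empty")
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return mean_k, mean_k2 / mean_k


# Hypothetical example: a star with one hub (degree 4) and four leaves (degree 1).
mean_k, k_nn = uncorrelated_neighbor_degree([4, 1, 1, 1, 1])
print(mean_k, k_nn)  # the second value exceeds the first whenever the variance is nonzero
```

For this toy sequence the mean degree is 8/5 while the uncorrelated neighbor degree is 4/(8/5) = 5/2, so a typical neighbor is better connected than a typical node.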
Do degree correlations extend beyond direct neighbors? To address this question, we extend Eq. 1 to nodes separated by long chains of $m$-many links. By “long chains”, we mean sequences of links separating one node (the ego) from another (the alter), which can be traversed by following successive links, generation-by-generation, out from the ego without any backtracking. We will refer to these paths as “$m$-chains”; they constitute the shortest paths between two nodes of the network, and their length is identical to the geodesic distance between ego and alter nodes [6]. The basic idea is conceptualized in Figure 1, wherein $m$ denotes the number of sequential links that compose the path between the ego and alter.
We label the joint probability that a node of degree $k_{0}$ is connected by an $m$-chain to another of degree $k_{m}$ by $p_{m}(k_{m}, k_{0})$. For $m=1$, this quantity represents the one described in the previous section for nearest neighbors, and we drop the subscript: $p_{1}(k_{1}, k_{0}) = p(k_{1}, k_{0})$. In a similar way that Eq. 1 links the average nearest-neighbor degree to the node degree, we have:
$$\langle k_{m}^{\text{nn}} \rangle (k_{0}) = \sum_{k_{m}} k_{m} \, p_{m}(k_{m} | k_{0}). \qquad (2)$$
Thus, any dependence of $\langle k_{m}^{\text{nn}} \rangle$ on $k_{0}$ signifies degree correlations between nodes at the ends of an $m$-chain.
2.1.2 Pearson correlation
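The Pearson correlation, $r$, is the ratio of the covariance of fluctuations, $\langle (k_{0} - \langle k_{0} \rangle)(k_{m} - \langle k_{m} \rangle) \rangle$, to the variance in degree, $\langle k_{0}^{2} \rangle - \langle k_{0} \rangle^{2}$ [1]:
$$r = \frac{\langle k_{0} k_{m} \rangle - \langle k_{0} \rangle \langle k_{m} \rangle}{\langle k_{0}^{2} \rangle - \langle k_{0} \rangle^{2}}, \qquad (3)$$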
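wherein the average, $\langle \cdots \rangle$, is taken over the nodes connected by $m$-chains. For practical purposes, Eq. 3 can be implemented as a sum over $m$-chains:
$$r = \frac{L_{m}^{-1} \sum_{i} k_{0}^{i} k_{m}^{i} - \left[ L_{m}^{-1} \sum_{i} \tfrac{1}{2} \left( k_{0}^{i} + k_{m}^{i} \right) \right]^{2}}{L_{m}^{-1} \sum_{i} \tfrac{1}{2} \left[ \left( k_{0}^{i} \right)^{2} + \left( k_{m}^{i} \right)^{2} \right] - \left[ L_{m}^{-1} \sum_{i} \tfrac{1}{2} \left( k_{0}^{i} + k_{m}^{i} \right) \right]^{2}}, \qquad (4)$$
wherein $k_{0}^{i}$ and $k_{m}^{i}$ are the degrees of the nodes at the two ends of the $i$-th $m$-chain, and $L_{m}$ is the number of $m$-chains. Note that $L_{1} = L$ is the number of links in the network, and that $L_{m} \geq L$.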
The Pearson correlation is often used to measure the assortativity of nodes connected by links (i.e., $m=1$), because the variance in $k_{0}$ is equal to the value of $\langle k_{0} k_{m} \rangle - \langle k_{0} \rangle \langle k_{m} \rangle$ for a maximally assortative network [1]. Therefore, $r$ is bounded on $[-1, 1]$; $r=-1$ corresponds to a purely disassortative network, while $r=1$ marks a network as purely assortative. However, this measure obscures the $k_{0}$-dependence of $p_{m}(k_{m} | k_{0})$ [10].
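The ratio of covariance to variance described in this section can be sketched in a few lines of Python. The sketch assumes that the input is a list of degree pairs, one pair per $m$-chain, and that each chain is counted in both directions so that the two ends are treated symmetrically; the function name and the input format are illustrative assumptions rather than the original implementation.

```python
def pearson_over_chains(pairs):
    """Pearson degree correlation over chains.

    pairs: list of (k_a, k_b), the degrees at the two ends of each chain.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no chains supplied")
    # Count every chain once in each direction (assumption: undirected chains).
    sym = pairs + [(b, a) for a, b in pairs]
    n = len(sym)
    mean = sum(a for a, _ in sym) / n                  # <k_0> = <k_m> after symmetrizing
    cov = sum(a * b for a, b in sym) / n - mean * mean
    var = sum(a * a for a, _ in sym) / n - mean * mean
    if var == 0:
        raise ValueError("degree variance is zero; r is undefined")
    return cov / var


# Hypothetical example: the four links of a star (hub degree 4, leaf degree 1).
print(pearson_over_chains([(4, 1)] * 4))
```

In the star example every link joins the highest-degree node to a lowest-degree node, so the covariance equals minus the variance and the score sits at the disassortative bound.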
2.2 Algorithm to identify $m$-chain neighbors
We evaluated Eqs. 1 and 2 using a computational algorithm to determine nodes connected by $m$-chains, for $m = 1, \ldots, 5$, which is conceptualized in Figure 2. We chose a maximum geodesic distance of $m=5$ to balance computational resources with the reports that such correlations nearly vanish for $m>3$ [6]. Referring to Figure 2 with the understanding that the networks are undirected, the steps of the algorithm can be outlined as follows.

I. Choose an ego node (node 0, Figure 2(a));
II. Follow all links from the ego to its neighboring nodes, and append the IDs of these neighbor nodes into one of five dynamic lists, one list for each geodesic distance, $m$;
III. Now, follow all links from these neighbor nodes to their neighbors (e.g., from node generations 1 to 2 in Figure 2(b)), excluding nodes already identified in a list from a previous generation/geodesic distance;
IV. Continue this process until $m=5$, then return to step I.
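The steps above amount to a breadth-first search truncated at a fixed depth. A minimal sketch follows, assuming the undirected network is stored as a dictionary `adj` that maps each node ID to an iterable of its neighbors; this data structure and the function name are choices made here for illustration, since the original implementation is not described in further detail.

```python
def m_chain_neighbors(adj, ego, max_m=5):
    """Return {m: set of nodes at geodesic distance m from ego} for m = 1..max_m."""
    seen = {ego}        # nodes already assigned to some generation
    frontier = {ego}    # nodes of the current generation
    shells = {}
    for m in range(1, max_m + 1):
        next_frontier = set()
        for u in frontier:
            for v in adj[u]:
                # Exclude nodes found at a previous (shorter) geodesic distance.
                if v not in seen:
                    next_frontier.add(v)
        seen |= next_frontier
        shells[m] = next_frontier
        frontier = next_frontier
    return shells


# Hypothetical example: a path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(m_chain_neighbors(path, ego=0))
```

Repeating the call for every node as the ego corresponds to the return to step I. Generations beyond the reach of the ego are returned as empty sets.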
We followed a method outlined in Ref. [11] to evaluate Eqs. 1 and 2, but using nodes connected by $m$-chains obtained from the above algorithm. For each node $i$ with degree $k_{0}^{i}$ in the network (the ego), we identified a set of neighbor nodes found at the end of each $m$-chain, $\{ n_{m}^{j} \}$ (the alters), each of which has degree $k_{m}^{j}$. To each ego node, we associated an averaged $m$-chain neighbor degree: $\langle k_{m} \rangle (k_{0}^{i}) = \left| \{ n_{m}^{j} \} \right|^{-1} \sum_{\{ n_{m}^{j} \}} k_{m}^{j}$. Finally, we took the arithmetic average of all instances of a given degree value, $k_{0}^{i} = k$, to give [11]: $\langle k_{m} \rangle (k) = \left| \{ \langle k_{m} \rangle (k_{0}^{i}) \} \right|^{-1} \sum_{k_{0}^{i} = k} \langle k_{m} \rangle (k_{0}^{i})$.
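The two-stage averaging described above (first over the alters of each ego, then over all egos of equal degree) might be sketched as follows. The adjacency-dictionary input, the helper `_shells`, and the choice to skip egos that have no alters at a given distance are assumptions made for this illustration.

```python
from collections import defaultdict


def _shells(adj, ego, max_m):
    """Truncated breadth-first search: {m: nodes at geodesic distance m}."""
    seen, frontier, shells = {ego}, {ego}, {}
    for m in range(1, max_m + 1):
        frontier = {v for u in frontier for v in adj[u]} - seen
        seen |= frontier
        shells[m] = frontier
    return shells


def average_chain_neighbor_degree(adj, max_m=5):
    """Return {m: {k: <k_m>(k)}}, the mean alter degree versus ego degree k."""
    degree = {u: len(set(nbrs)) for u, nbrs in adj.items()}
    per_ego = {m: defaultdict(list) for m in range(1, max_m + 1)}
    for ego in adj:
        for m, alters in _shells(adj, ego, max_m).items():
            if alters:  # assumption: egos with no alters at distance m are skipped
                mean_alter = sum(degree[v] for v in alters) / len(alters)
                per_ego[m][degree[ego]].append(mean_alter)
    # Second average: over all egos sharing the same degree k.
    return {m: {k: sum(vals) / len(vals) for k, vals in by_k.items()}
            for m, by_k in per_ego.items()}


# Hypothetical example: a star with hub 0 and leaves 1..4.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(average_chain_neighbor_degree(star, max_m=3))
```

In the star example the leaves see the hub at distance 1 and only other leaves at distance 2, so the dependence on ego degree changes character between the first and second generations.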
2.3 Network datasets
We studied several online social network datasets, and compared their results to those obtained from a transportation network and two transcriptional regulatory networks. By social network, we mean a network wherein the nodes represent individuals and the links between them signify social associations. All of these networks were manifestly directed, but for simplicity we studied them as undirected networks by examining their total degree, which is the sum of in- and out-degrees for each node, and ignoring link directions. Although many nodes could therefore support multiple links, we found that all of the considered networks, both social and nonsocial, closely followed a “scale-free” degree distribution, $p(k) \propto k^{-\gamma}$ ($k_{0} = k$ for notational convenience), as shown in Figure 3.
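The reduction to total degree can be illustrated with a short sketch. It assumes the dataset is available as a list of `(source, target)` pairs; the function name and input format are hypothetical.

```python
from collections import defaultdict


def total_degree(directed_edges):
    """Return {node: in-degree + out-degree}, ignoring link direction."""
    degree = defaultdict(int)
    for source, target in directed_edges:
        degree[source] += 1   # contributes to the out-degree of the source
        degree[target] += 1   # contributes to the in-degree of the target
    return dict(degree)


# Hypothetical example: a reciprocated pair a <-> b plus a single link b -> c.
print(total_degree([("a", "b"), ("b", "a"), ("b", "c")]))
```

Under this convention a reciprocated pair of links contributes twice to the total degree of both endpoints, which is one way a pair of nodes can end up supporting multiple links.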
2.3.1 Online social networks
We evaluated a dataset from the Advogato online social network, wherein users can express the level of “trust” between themselves and another [12]. As mentioned above, we are only interested in the structure of the links between all individuals, and therefore ignored any weights assigned to them. The Advogato network is composed of 3,302 nodes/users linked together by 32,954 links.
A snapshot of the decentralized Gnutella peer-to-peer file-sharing network was captured on 6 August, 2002 [13]. In this dataset, the 8,717 nodes represent the hosts, and each of the 31,525 links signifies a connection established between them.
The Wiki-Vote network was derived from a complete dump of the Wikipedia page-edit history (3 January, 2008) [14,15]. Wikipedia users may be promoted to administrators, who enjoy additional technical and maintenance capabilities of the website; promotion requires a public vote among its users. In this network, all 8,297 nodes represent individual users, and each of the 103,689 links indicates that one person voted for the other.
2.3.2 Nonsocial networks
As a first example of a nonsocial network, we chose a physical transportation network, labeled “Airports”. This network maps flights scheduled between the 500 busiest airports in the United States (US) in 2002 [16]. In this dataset, a node represents one of 500 US airports, while each of its 2980 links denotes that a flight was scheduled from one airport to another. While this network is manifestly undirected, it is weighted. We therefore ignored the weights in favor of the network topology alone.
We compared this transportation network with two transcriptional regulatory networks, which relate the expression of genes (nodes) that interact by producing proteins, termed transcription factors, that may alter the expression level of other genes. We employed two experimentally validated datasets from the literature, obtained using the GeneNetWeaver software package [17]: one for the model bacterium Escherichia coli (E. coli), and one for the common baker’s yeast Saccharomyces cerevisiae (S. cerevisiae). The E. coli network consisted of 1565 nodes and 3758 (directed) links, whereas the S. cerevisiae network supported 4441 nodes and 12873 links. While the degree distribution of these networks generally follows a power law (Figure 3), their structure differs substantially from that of a social network in that it is primarily hierarchical [18], with a few apical “master regulator” proteins that control the expression of a great many genes.
Results
3.1 Above-average $m$-chain neighbor degrees in social networks
Figure 4 shows the average degrees of nodes found at the end of all $m$-chains, $\langle k_{m} \rangle$, independent of the starting point. The long-distance behavior of this metric should be intuitive: as we move step-by-step away from a node, the average degree of nodes found at the end of the chain should approach the average connectivity of the graph, $\langle k_{0} \rangle$ (dotted lines, Figure 4). To estimate $\langle k_{m} \rangle$, we observed that the $m$-chain neighbor degree distributions appeared lognormal, from which we estimated the mean (circles) and standard deviation (error bars); however, the degrees of the nodes themselves, $k_{0}$, were power-law distributed (Figure 3).
For the nonsocial networks (bottom row of Figure 4), the condition $\langle k_{m} \rangle = \langle k_{0} \rangle$ occurs at approximately $m=3$ or $m=4$, while for the social networks we find $m \geq 4$. Additionally, the quantity $\langle k_{m} \rangle$ ($m>0$) remains more elevated in the social networks than in the nonsocial networks at the same geodesic distance. In other words, not only do your friends have more friends than you do, but so do your friends’ friends’ friends’ friends.
One potential explanation for this effect may come from the tendency for social networks to form larger clumps of highly connected nodes that, together, are only sparsely connected [19]. If nodes that are connected through $m$-chains can often be found within a highly connected community, or if a node within a community can be easily reached through an $m$-chain, then $\langle k_{m} \rangle$ will be biased toward the connectivity of the community.
3.2 Assortative mixing beyond the nearest neighbors in social networks
Figure 5 illustrates how the average $m$-chain neighbor degree, $\langle k_{m} \rangle$, varies with ego degree, $k$, for the three social networks; Figure 6 illustrates this relationship for the three additional nonsocial networks. It has been noted previously that some networks exhibit non-monotone degree correlation [10], with a crossover point near $k=10$, which has been observed before in models of random networks [20]. We therefore used a power-law function, $\langle k_{m} \rangle (k) \sim k^{\gamma}$, wherein $k_{0} = k$ labels ego degrees, to empirically model the tail of the $m$-chain neighbor degrees. This feature is not clearly present in the nonsocial networks, so we fit a power-law function across the whole domain of their degrees.
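The fitting procedure is not specified in the text. One simple possibility, shown here only as a sketch, is an ordinary least-squares line on log-log axes, restricted to ego degrees at or above a cutoff. The function name and the parameter `k_min` (standing in for the crossover point) are hypothetical.

```python
import math


def fit_power_law_exponent(ks, values, k_min=1):
    """Estimate gamma in values ~ k**gamma by least squares in log-log space.

    Only points with k >= k_min and a positive value are used.
    """
    pts = [(math.log(k), math.log(v))
           for k, v in zip(ks, values) if k >= k_min and v > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points at or above k_min")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("all retained points share the same degree")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx  # slope of the log-log line


# Hypothetical tail data generated from 3 * k**(-0.5).
ks = [10, 20, 40, 80, 160]
values = [3.0 * k ** -0.5 for k in ks]
print(fit_power_law_exponent(ks, values, k_min=10))
```

A negative fitted exponent corresponds to a disassortative tail and a positive one to an assortative tail. Setting `k_min=1` fits the whole degree domain, as was done for the nonsocial networks.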
We can make several observations by comparing results from the social networks (Figure 5) to results from nonsocial networks (Figure 6). First, as geodesic distance increases, $\langle k_{m} \rangle$ for all social networks exhibits a disassortative tendency. Park and Newman have argued [21] that social networks are different from other networks in that they are substantially assortative in nearest neighbor degree correlations. While $\langle k_{1} \rangle$ for the social networks of Figure 5 exhibits nearly flat correlation, the nonsocial networks of Figure 6 appear disassortative in $\langle k_{1} \rangle$. In light of the argument made by Park and Newman [21], the nearly flat behavior of $\langle k_{1} \rangle$ seen in Figure 5 could result from positive correlative trends.
Another observation we can make by comparing Figures 5 and 6 is that the nonsocial networks, specifically the transcriptional networks, exhibit opposite correlations between 1- and 2-chain neighbors, $\langle k_{1} \rangle$ and $\langle k_{2} \rangle$, respectively. Additionally, the extended correlations ($m>2$) in the nonsocial networks are consistently positive (Figure 6 and Tables 1 and 2), which should be contrasted against the consistently disassortative correlations ($\gamma<0$ and $r<0$) exhibited by the social networks (Figure 5).
3.3 Network growth models cannot fully explain long-range social network correlations
Do the long-range disassortative correlations observed for social networks in Figure 5 occur in networks created using random mechanisms? To address this problem, we used two node-by-node network-growing algorithms. The first is a modified version of the well-known Barabási-Albert model [22], which reproduces scale-free degree distributions. Networks grown using this algorithm are known to generate degree correlations at the nearest-neighbor level due to the preference of older nodes to acquire more links [23]. We have implemented this model with the addition of a selection method that allows for a variable number of links to be drawn at each attachment step. More specifically, we choose to attach $l$-many links at each attachment step by rounding $N P(x \leq k)$ up to the nearest whole number $l$, wherein $P(x \leq k)$ is the cumulative degree distribution and $N$ is the sum total of nodes, both evaluated at the current attachment step. Because all random networks are grown on a node-by-node basis, wherein the number of links is determined by the stepwise attachment algorithm, we “grew” each network to the size of a chosen representative social network, the Advogato network, which hosts a total of 3302 nodes.
The other node-by-node attachment mechanism was reported by Vázquez [20], and termed the “random walk” model. Here, nodes are attached following the linear attachment kernel of the Barabási-Albert model as stated above, but an additional step is added: a neighbor node is chosen at random with uniform probability, and with probability $q_{e}$, a link is drawn from the candidate node (the one just attached) to the neighbor node. If this secondary link attachment is successful, then this “random walk” procedure continues until the check of a new $q_{e}$ fails. A primary feature of networks grown using this model is that their degree correlations are “mixed”; that is, lower-degree nodes exhibit positive correlations, while the higher-degree nodes exhibit disassortative tendencies. We have previously observed such behavior in a wide variety of directed, real-world networks [10], but it can also be seen in the behavior of $\langle k_{m} \rangle (k)$ for the social networks illustrated in Figure 5.
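The attachment rule described in this paragraph can be sketched as below. Several details are not stated in the text and are assumptions of this sketch: the seed graph is a single link, the uniformly chosen neighbor is a neighbor of the node most recently visited by the walk, a walk that reaches a node already linked to the new node continues without adding a duplicate link, and $q_{e}$ is restricted to values below 1 so that every walk terminates. The function name is hypothetical, and the sketch is not claimed to reproduce the implementation of Ref. [20].

```python
import random


def grow_random_walk_network(n_nodes, q_e, seed=None):
    """Grow an undirected network node by node; returns {node: set of neighbors}."""
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    if not 0.0 <= q_e < 1.0:
        raise ValueError("q_e must lie in [0, 1) so that every walk terminates")
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}   # assumed seed graph: two nodes joined by one link
    link_ends = [0, 1]       # each node appears once per link end, so a uniform
                             # draw from this list is proportional to degree
    for new in range(2, n_nodes):
        adj[new] = set()
        current = rng.choice(link_ends)   # linear (preferential) attachment kernel
        while True:
            if current not in adj[new]:
                adj[new].add(current)
                adj[current].add(new)
                link_ends.extend((new, current))
            # Random-walk step: pick a uniform neighbor of the node just visited.
            candidates = sorted(adj[current] - {new})
            if not candidates:
                break
            neighbor = rng.choice(candidates)
            if rng.random() >= q_e:       # the check against q_e fails: stop walking
                break
            current = neighbor
    return adj


# Hypothetical usage: compare link counts for two values of q_e.
for q in (0.0, 0.9):
    g = grow_random_walk_network(500, q, seed=1)
    print(q, sum(len(nbrs) for nbrs in g.values()) // 2)
```

With `q_e = 0` each new node adds exactly one link, so the result is a tree; larger values of `q_e` lengthen the walks and therefore add more links per node.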
Figure 7 shows how the altered version of the Barabási-Albert model performs in terms of Pearson correlation scores (box plots), compared to a representative social network, the Advogato online social network (asterisks). While we can see that the slope of the power-law tail for the Advogato network indicates higher levels of disassortativity at higher degree nodes (Figure 5), the Pearson scores show a weaker overall correlation at long geodesic distances; however, the random network models show nearly no correlation except among first neighbors ($m=1$, Figure 7).
This can be contrasted against results from the random walk model of Vázquez [20], illustrated for various values of $q_{e}$ in Figure 8. These random networks generally show assortative degree correlations in first neighbors for all values of $q_{e}$, but mostly disassortative degree mixing among nodes at longer geodesic distances. This result is generally consistent with the trends observed for the social networks (see Figure 5, and asterisks in Figure 8). Nevertheless, close matching of the Pearson scores only occurs for $q_{e} = 0.9$. Such a high value of $q_{e}$ guarantees many successful sequential attachment rounds in the random walk procedure, and thus increases the overall number of links. Whether the closer matching of Pearson scores at high $q_{e}$ is the result of the increased number of links, or their approximate placement, remains an open question.
Summary and Conclusions
In this paper we have studied three online social networks, and compared their long-range degree correlation behavior to that of three nonsocial networks by both measuring the average number of neighbors and calculating the Pearson correlation score. We found that the number of friendships/associations in the social networks remained above the background level for at least $m=4$ “degrees of separation”. In contrast, the nonsocial networks reached the background level after approximately $m=3$ steps from each node.
We also examined the conditional probability that a node of degree $k_{0}$ is connected to one of degree $k_{m}$ separated by at least one link, $p(k_{m} | k_{0})$, by measuring the average neighbor degree, $\langle k_{m} \rangle (k_{0})$. We found that the social networks generally exhibited a power-law tail with exponent $\gamma<1$ for the longer-range interactions ($m \geq 3$). We did not observe this phenomenon in the nonsocial networks, which appeared nearly uncorrelated at this geodesic distance.
Finally, we considered the Advogato network as a prototypical social network, and examined whether two network-growing algorithms known to generate degree correlations could reproduce the long-range correlations observed in the social network as measured by the Pearson correlation. While we observed that the “random walk” algorithm [20] and a variant of the celebrated Barabási-Albert (preferential attachment) model [22] showed similar uncorrelated results at the farthest separation ($m=5$), correlations in the Advogato network deviated substantially from the random models for $m<5$. We conclude that these random node-attachment mechanisms cannot fully explain how social networks gain new users, although we could not entirely reject this possibility. Further investigations are therefore required to definitively answer this question.
References
1. Newman, MEJ: Assortative mixing in networks. Phys. Rev. Lett. 89, 208701 (2002).
2. McPherson, M, Smith-Lovin, L, Cook, JM: Birds of a feather: Homophily in social networks. Annu. Rev. Sociol. 27, 415–444 (2001).
3. Feld, SL: Why your friends have more friends than you do. Am. J. Sociol. 96, 1464–1477 (1991).
4. Christakis, NA, Fowler, JH: The spread of obesity in a large social network over 32 years. New Engl. J. Med. 357, 370 (2007).
5. Christakis, NA, Fowler, JH: The collective dynamics of smoking in a large social network. New Engl. J. Med. 358, 2249 (2008).
6. Christakis, NA, Fowler, JH: Social contagion theory: examining dynamic social networks and human behavior. Stat. Med. 32, 556 (2011).
7. Dawah, HA, Hawkins, BA, Claridge, MF: Structure of the parasitoid communities of grass-feeding chalcid wasps. J. Anim. Ecol. 64, 708 (1995).
8. Redner, S: Teasing out the missing links. Nature. 453, 47 (2008).
9. Pastor-Satorras, R, Vázquez, A, Vespignani, A: Dynamical and correlation properties of the internet. Phys. Rev. Lett. 87, 258701 (2001).
10. Mayo, M, Abdelzaher, A, Ghosh, P: Mixed degree-degree correlations in directed social networks. In: Zhang, Z, Wu, L, Xu, W, Du, DZ (eds.) Combinatorial Optimization and Applications. Lecture Notes in Computer Science, pp. 571–580. Springer (2014).
11. Serrano, M, Maguitman, A, Boguñá, M, Fortunato, S, Vespignani, A: Decoding the structure of the www: A comparative analysis of web crawls. ACM Trans. Web. 1, 10 (2007).
12. Massa, P, Salvetti, M, Tomasoni, D: Bowling alone and trust decline in social network sites. In: Proceedings of the Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing, DASC’09, pp. 658–663. IEEE (2009).
13. Ripeanu, M, Foster, I, Iamnitchi, A: Mapping the Gnutella network: Properties of large-scale peer-to-peer systems and implications for system design. IEEE Internet. Comput. 6, 50–57 (2002).
14. Leskovec, J, Huttenlocher, D, Kleinberg, J: Signed networks in social media. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1361–1370. ACM (2010).
15. Leskovec, J, Huttenlocher, D, Kleinberg, J: Predicting positive and negative links in online social networks. In: Proceedings of the 19th International Conference on World Wide Web, pp. 641–650. ACM (2010).
16. Colizza, V, Pastor-Satorras, R, Vespignani, A: Reaction-diffusion processes and metapopulation models in heterogeneous networks. Nature Physics. 3, 276–282 (2007).
17. Schaffter, T, Marbach, D, Floreano, D: GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics. 27(16), 2263–2270 (2011). doi:10.1093/bioinformatics/btr373.
18. Ma, HW, Buer, J, Zeng, AP: Hierarchical structure and modules in the Escherichia coli transcriptional regulatory network revealed by a new top-down approach. BMC Bioinf. 5, 199 (2004).
19. Girvan, M, Newman, MEJ: Community structure in social and biological networks. P. Natl. Acad. Sci. USA. 99, 7821–7826 (2002).
20. Vázquez, A: Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations. Phys. Rev. E. 67, 056104 (2003).
21. Park, J, Newman, MEJ: Origin of degree correlations in the internet and other networks. Phys. Rev. E. 68, 026112 (2003).
22. Barabási, AL, Albert, R: Emergence of scaling in random networks. Science. 286, 509–512 (1999).
23. Krapivsky, PL, Redner, S: Organization of growing random networks. Phys. Rev. E. 63, 066123 (2001).
Acknowledgements
Funding was provided by the US Army’s Environmental Quality and Installations 6.1 Basic Research program. Opinions, interpretations, conclusions, and recommendations are those of the author(s) and are not necessarily endorsed by the U.S. Army.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
MM and PG conceptualized and designed the study, and interpreted the results. MM and AA executed the research. MM, AA, and PG wrote the paper. All authors read and approved the final manuscript.
Keywords
 Social networks
 Pearson correlation
 Assortativity
 Degree correlation