The structure of copublications multilayer network
Computational Social Networks volume 8, Article number: 8 (2021)
Abstract
Using the headers of scientific papers, we have built multilayer networks of entities involved in research, namely authors, laboratories, and institutions. We have analyzed some properties of such networks built from data extracted from the HAL archives and found that the network at each layer is a small-world network with a power-law degree distribution. In order to simulate such copublication networks, we propose a multilayer network generation model based on the formation of cliques at each layer and the affiliation of each new node to the higher layers. Each clique is built from new nodes and existing nodes, the latter selected using preferential attachment. We also show that the degree distribution of the generated layers follows a power law. From the simulations of our model, we show that the generated multilayer networks reproduce the studied properties of copublication networks.
Introduction
Recent years of research in network science have been characterized by a growing number of attempts to generalize traditional network theory by developing and validating a novel framework for the study of multilayer networks, i.e., graphs whose constituents are nodes and in which several layers of connections have to be taken into account to accurately describe the interactions between nodes. Multilayer networks explicitly incorporate multiple channels of connectivity and constitute the natural environment to describe systems interconnected through different categories of connections: each channel (relationship, activity, category, etc.) is represented by a layer, and the same node or entity may have different kinds of interactions (a different set of neighbors in each layer). For instance, in social networks, one can consider several different types of relationships: friendship, vicinity, kinship, membership of the same cultural society, partnership or coworkership, etc. [1].
This change of paradigm, which has been termed in disparate ways (multiplex networks, networks of networks, interdependent networks, hypergraphs, and many others), has already led to a series of very relevant and unexpected results [1].
Using the headers of scientific papers, we have represented and analyzed three networks: the networks of researchers, laboratories and institutions. The header of a paper consists of the names of the researchers (authors) and their affiliations. The term institution is used here to refer to a university, a research center or a research institution. The networks are interdependent because of the affiliation relationship between the entities of the three networks. In fact, a researcher is affiliated to at least one laboratory and a laboratory is affiliated to at least one institution. We therefore say that these networks are hierarchical and are induced from collaboration between researchers (see Fig. 1). In order to understand and simulate these systems, we have to master both how the actors interact and the rules of affiliation to organizations.
We generalize this hierarchical copublication network to a complex system in which an actor is affiliated to an organization that can itself be affiliated to a higher-level organization, and so on. The relationships between entities at the same level are deduced from the interactions of those at the lower level. The interaction between the actors (at the lowest level) is the process that induces the relationships between the organizations at the other levels; i.e., two researchers are connected not because they are members of the same laboratory but because they copublished a paper. This is the main difference between the affiliation networks used in this paper and those of Lattanzi et al. [2], who consider that in social networks there are two types of entities, actors and societies, linked by the affiliation of the former to the latter.
After some definitions and the state of the art presented in the "Multilayer networks generation models" section, the method applied in this paper was first to use the headers of 70224 scientific papers from the HAL archives to build copublication networks. We measured various structural properties of these networks, such as degree distributions, average distance and clustering coefficient. The "Co-publications multilayer networks" section is dedicated to this first part of our contribution. From this study, we derived a hierarchical network generation model that can reproduce the measured properties. The model is presented in "The hierarchical network generation model" section, with some mathematical results regarding the number of nodes, the number of edges and the exponent of the power-law degree distribution of the generated networks. Finally, in the "Simulations" section, we present some simulation results of the proposed generation model and discuss the comparison between the properties of the simulated networks and those of the real-world networks (built from the HAL dataset). The "Conclusion" section concludes the paper.
Multilayer networks generation models
The last 15 years have seen the birth of a movement in science: complex network theory. It involved the interdisciplinary effort of some of our best scientists, with the aim of exploiting the current availability of big data in order to extract the ultimate and optimal representation of the underlying complex systems and mechanisms. The main goals were (i) the extraction of unifying principles that could encompass and describe (under some generic and universal rules) the structural accommodation that is being detected ubiquitously, and (ii) the modeling of the resulting emergent dynamics to explain what is actually seen and experienced from the observation of such systems.
Network theory provides various tools for investigating the structural or functional topology of many complex systems found in nature, technology and society. There are many applications of multilayer graphs in various areas such as biology, transportation and social networks [3,4,5,6,7].
Definitions and notations
A network (or graph) is a pair \(G=(X, E)\), where X is a set of items, which we will call nodes, and E is a set of connections between the nodes, called edges. A set of nodes joined by edges is only the simplest type of network; there are many ways in which networks may be more complex than this. For instance, there may be more than one type of node in a network or more than one type of edge. Nodes or edges may also have a variety of associated properties, numerical or categorical.
Graphs of directed edges [8] are themselves called directed graphs or sometimes digraphs. One can also have hyperedges, i.e., edges that join more than two nodes together. Graphs containing such edges are called hypergraphs [9]. Graphs may also be naturally partitioned in various ways. An example is bipartite graphs: graphs that contain nodes of two distinct types, with edges running only between unlike types [10, 11]. So-called affiliation networks, in which people are joined together by common membership of groups, take this form, the two types of nodes representing the people and the groups [2].
There is no consensual definition of multilayer graphs. There are several definitional approaches in the literature [1, 12, 13]. In this work, we will use the definition of [1]. A multilayer network is a pair \({\mathcal {M}} = ({\mathcal {G}}, {\mathcal {C}})\), where \({\mathcal {G}} = \{G_\alpha ; \alpha \in \{1, \ldots , M \}\}\) is a family of (directed or undirected, weighted or unweighted) graphs \(G_\alpha = (X_\alpha , E_\alpha )\) (called layers of \({\mathcal {M}}\)) and
$$\begin{aligned} {\mathcal {C}} = \{E_{\alpha \beta } \subseteq X_\alpha \times X_\beta ; \alpha , \beta \in \{1, \ldots , M \}, \alpha \ne \beta \} \end{aligned}$$
is the set of interconnections between nodes of different layers \(G_\alpha\) and \(G_\beta\) with \(\alpha \ne \beta\). The elements of \({\mathcal {C}}\) are called crossed layers, and the elements of each \(E_\alpha\) are called intralayer connections of \({\mathcal {M}}\), in contrast with the elements of each \(E_{\alpha \beta }\) \((\alpha \ne \beta )\), which are called interlayer connections.
This mathematical model is suited to describe phenomena in social systems, as well as many other complex systems. By using this representation, we simultaneously take into account: the links inside the different groups, the nature of the links and the relationships between elements that (possibly) belong to different layers and the specific nodes belonging to each layer involved.
It is important to notice that the concept of multilayer network extends that of other mathematical objects, such as: Multiplex network [14], Temporal networks [15], Interacting or interconnected networks [16], Multidimensional networks [17], Interdependent (or layered) networks [18, 19], Multilevel networks [20], Hypernetworks (or hypergraphs) [9].
The degree of a node [21] \(i \in X\) of a multiplex network \({\mathcal {M}} = ({\mathcal {G}}, {\mathcal {C}})\) is the vector \(k_i=[k_i^1, \ldots , k_i^M]\), where \(k_i^\alpha\) is the degree of the node i in the layer \(\alpha\). This vector-type node degree is the natural extension of the established definition of the node degree in a monolayer network.
We say that node i, with \(i=1, 2, \ldots , N\), is active at layer \(\alpha\) if \(k^\alpha _i > 0\). We can then associate to each node i a node-activity vector \(b_i=\{b_i^{[1]}, b_i^{[2]}, \ldots , b_i^{[M]} \}\), where \(b_i^{[\alpha ]}=1\) if \(k^\alpha _i > 0\), while \(b_i^{[\alpha ]}=0\) otherwise. We call node-activity \(B_i\) of node i the number of layers on which node i is active, i.e., \(B_i=\sum _{\alpha }b_i^{[\alpha ]}\).
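As a small illustration of these definitions, the sketch below computes the degree vector, the node-activity vector and the node-activity of a node. The representation of each layer as a dictionary mapping a node to its set of neighbours, and the function names, are assumptions made for this example only.

```python
def degree_vector(layers, node):
    """Return [k^1, ..., k^M], the degree of `node` in each layer."""
    return [len(adj.get(node, ())) for adj in layers]


def activity_vector(layers, node):
    """Return b_i: 1 where the node has at least one edge in the layer, else 0."""
    return [1 if k > 0 else 0 for k in degree_vector(layers, node)]


def node_activity(layers, node):
    """Return B_i, the number of layers on which the node is active."""
    return sum(activity_vector(layers, node))


# Toy three-layer network (hypothetical data): node -> set of neighbours.
layers = [
    {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}},
    {"a": {"b"}, "b": {"a"}},
    {"b": {"c"}, "c": {"b"}},
]
print(degree_vector(layers, "a"), activity_vector(layers, "a"), node_activity(layers, "a"))
```

In this toy network, node "a" has no edge in the third layer, so it is active on two layers only.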
Properties of real-world complex networks
It has been recently shown that most real-world complex networks have some essential properties in common. Three properties have received much attention because they behave unexpectedly in real-world complex networks: the average distance between nodes, the clustering and the degree distribution.
Most real-world complex networks have the small-world property, i.e., a short average distance [22, 23]. The small-world concept originated from the famous experiment conducted by Milgram [24]. Another property of many real-world networks is a high average clustering coefficient.
The degree distribution, which gives, for each k, the probability \(p_k\) that a randomly chosen node has degree k, is completely different from what was expected. Indeed, for almost all real-world complex networks, the degree distribution follows a power law: \(p_k \sim k ^{-\alpha }\). The exponent \(\alpha\) of the power law is generally between 2 and 3. Such a distribution means that although most nodes have a small degree, the number of nodes with degree k decays only polynomially with k, and therefore there is a significant number of nodes with high degree. It has been shown in the literature that many coauthorship networks follow a power-law degree distribution [25,26,27].
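To make the definition of \(p_k\) concrete, here is a minimal sketch that computes the empirical degree distribution of a graph. The adjacency-dictionary representation and the function name are assumptions of this example; fitting the exponent of a power law is a separate step that is not shown here.

```python
from collections import Counter


def degree_distribution(adj):
    """Return {k: p_k}, the fraction of nodes that have degree k."""
    degrees = [len(neighbours) for neighbours in adj.values()]
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


# Hypothetical star graph: one hub (node 0) connected to four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(degree_distribution(star))
```

On a heavy-tailed network, plotting these values on log-log axes would show an approximately straight line, which is the usual visual signature of a power law.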
The state of the art of generation models
There are basically two ways to propose a model for network generations:

The first considers a set of observed properties as essential, and then randomly samples objects among the ones which have these properties. Proceeding this way yields a typical object with the concerned properties [28,29,30]. It is then possible to determine whether the retained set of properties is sufficient (do the random objects produced by the model fit the real one well?) and to study the expected behavior of the object of interest. The relevance of the set of properties is generally checked using other known properties or behaviors of the object.

The second defines a construction process inspired by the way the object is really constructed [2, 31,32,33]. This construction process is generally iterated from an initial state, and eventually leads to an appropriate object. The analysis then concerns the properties induced by the construction process: do they fit real-world properties?
For more details, the reader can refer to [34] for a broad overview of models of simple networks. As with monolayer networks, most of the models for the generation of multilayer networks can also be divided into two classes:

Growing multilayer network models, in which the number of nodes grows and there is a generalized preferential attachment rule [35,36,37]. These models explain multilayer network evolution starting from simple and fundamental rules for their dynamics.

Multilayer network ensembles, which are ensembles of networks with N nodes in each layer satisfying a certain set of structural constraints [38,39,40]. These ensembles are able to generate multilayer networks with fully controlled set of degree–degree correlations and of overlap.
In [33], Meleu and Melatagia proposed a network generation model based on the formation of cliques to reproduce collaboration networks. At each step of the model, a clique of \(\lambda \eta\) existing nodes and \((1-\lambda )\eta\) new nodes is created and added to the network; P is the distribution of the number of nodes per collaboration, \(\eta\) is the mathematical expectation of P and \(\lambda\) is the proportion of old nodes per clique. The old nodes are selected according to preferential attachment. The main difference between the model of Zhang et al. [41] and that of Meleu et al. is that Zhang et al. consider only one new node, while Meleu et al. define a parameter \(\lambda\) that controls the proportion of old and new nodes. The model of Meleu et al. is thus a generalization of the model of Zhang et al. (see [33] for details).
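One step of such a clique-based growth process can be sketched as follows. This is only an illustration under simplifying assumptions, not the implementation of [33]: the clique size is passed in directly instead of being drawn from P, nodes are integers numbered from 0, and the function name `clique_step` is hypothetical.

```python
import random


def clique_step(adj, n_old, n_new, rng):
    """Add one clique made of n_old existing nodes and n_new new nodes."""
    # Preferential attachment: draw old nodes with probability proportional
    # to their degree, without replacement.
    pool = list(adj)
    chosen = []
    while pool and len(chosen) < n_old:
        weights = [len(adj[v]) for v in pool]
        node = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(node)
        chosen.append(node)
    # Create the new nodes (assumes node ids are 0 .. len(adj) - 1).
    first_id = len(adj)
    new_nodes = list(range(first_id, first_id + n_new))
    for v in new_nodes:
        adj[v] = set()
    # Connect every pair of members: the collaboration forms a clique.
    members = chosen + new_nodes
    for u in members:
        for v in members:
            if u != v:
                adj[u].add(v)
    return members


# Seed network: a triangle, so that every existing node has a positive degree.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
eta, lam = 4, 0.5
n_old = round(lam * eta)
print(clique_step(adj, n_old, eta - n_old, random.Random(0)))
```

Starting from a seed clique avoids zero-degree nodes, for which degree-proportional sampling would be undefined.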
In [35, 36] a growing multiplex model has been proposed: the network has a dynamics dictated by growth, and generalized preferential attachment. Starting at time \(t = 0\) from a duplex network with \(n_0\) nodes (with a replica in each of the two layers) connected by \(m_0 > m\) links in each layer, the model proceeds as follows:

Growth: At each time \(t \ge 1\) a node with a replica node in each of the two layers is added to the multiplex. Each newly added replica node is connected to the other nodes of the same layer by m links.

Generalized preferential attachment: The new link in layers \(\alpha = 1, 2\) is attached to node i with probability \(\Pi _i^\alpha\) proportional to a linear combination of the degree \(k_i^{[1]}\) of node i in layer 1 and \(k_i^{[2]}\) of node i in layer 2.
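The generalized preferential attachment rule can be illustrated by the short sketch below, which computes the attachment probabilities in one layer as a normalized linear combination of the degrees in the two layers. The coefficient names `c1` and `c2` and the function name are assumptions of this example; the exact coefficients used in [35, 36] are not given in the text above.

```python
def attachment_probabilities(k1, k2, c1, c2):
    """Return the probability of attaching to each node i,
    proportional to c1 * k1[i] + c2 * k2[i]."""
    scores = [c1 * a + c2 * b for a, b in zip(k1, k2)]
    total = sum(scores)
    return [s / total for s in scores]


# Degrees of three nodes in layer 1 and in layer 2 (hypothetical values).
k1 = [1, 2, 3]
k2 = [3, 2, 1]
print(attachment_probabilities(k1, k2, 1.0, 0.0))  # depends on layer 1 only
print(attachment_probabilities(k1, k2, 0.5, 0.5))  # both layers weighted equally
```

With `c2 = 0` the rule reduces to ordinary preferential attachment within layer 1, while equal coefficients make the choice depend on the total degree across both layers.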
Growing multiplex network models have been proposed in [20, 42], where the multiplex network grows by the addition of an entire new layer at each step. In [20], two nodes i and j in the new layer are linked with a probability \(p_{ij}\) that depends on the quantity there called node multiplexity \(Q_{ij}\). In particular, \(p_{ij}\) can be either positively or negatively correlated with \(Q_{ij}\). In the first case, two nodes that are active at the same time in many layers are more likely to be connected in the new layer; in the second case, two nodes that are active at the same time in many layers have a small probability of being connected in the new layer. In [42], instead, every node i of the new layer will be active with a probability \(P_i\) that grows linearly with the activity \(B_i\) of the node: \(P_i \propto A + B_i(t)\), where A is a parameter of the model. This model enforces a sort of “preferential attachment” of the new layers to nodes of high activity \(B_i\), and a power-law distribution \(P(B_i)\) of the activities of the nodes.
The simplest way to obtain a static generative model for multiplex networks is to generalize the existing methods for single-layer ones [1]. Fixing the degree sequence in each layer, one can use a configuration model to obtain a particular realization of the given set of connectivities. In [43] the authors have made the choice to add interlayer links arbitrarily. A different approach is to keep using a configuration model, but to specify the edges between the layers by means of a joint-degree distribution [44,45,46].
A similar method is to specify the degree sequences together with a probability matrix whose element (i, j) is the fraction of interlayer links between layers i and j. The actual link placement is still achieved via uniform random choice [47]. A generalization of this approach has been proposed in [48]; the authors impose the degree correlations within and between layers by means of a set of matrices that specify the fraction of edges between nodes of given degrees in given layers.
Co-publications multilayer networks
By browsing the headers of scientific papers, we can represent and analyze three networks, namely the network of authors, the network of laboratories and the network of institutions. By the header of a paper we mean the title of the paper, the names of the authors and their affiliations (Fig. 2 shows an example of a paper’s header). The networks are defined as follows:

1.
The network of authors. A node represents an author and, if an author i coauthored a paper with author j, the graph contains an undirected edge from i to j. If the paper is coauthored by k authors this generates a completely connected (sub)graph on k nodes.

2.
The network of laboratories. A node represents a laboratory in which at least one author published a paper; an edge links two laboratories if there exists at least one paper coauthored by authors of these two laboratories.

3.
The network of institutions. A node represents an institution of the authors who have published at least one paper; an edge links two institutions if there exists at least one paper coauthored by authors from two laboratories, each related to one of these institutions. The term institution is used to refer to a university, a research center or a research institution.
Because collaboration is reciprocal, these networks are undirected. They can be weighted by the number of publications between two entities or left unweighted, according to the goal of the study. In this study, they are unweighted. Note that, for each type of node (author, laboratory or institution) considered, network construction amounts to the successive creation of cliques. The three generated networks are interdependent because of the affiliation relationship between their entities. So, in addition to the above description, we add new edges that represent affiliation relations between authors and laboratories and between laboratories and institutions.
We then say that these networks are multilayered (see Fig. 1) and are deduced from collaboration between authors. Indeed, the actors involved in copublication are the authors. In these networks, we have two types of relationship: collaboration (at the same level) and affiliation (between two levels).
The studied networks can be formally represented by \({\mathcal {M}} = ({\mathcal {G}}, {\mathcal {C}})\), where \({\mathcal {G}} = \{G_\alpha ; \alpha \in \{1, 2 , 3 \}\}\) is the set of layers of \({\mathcal {M}}\), with \(G_1\) the network of researchers, \(G_2\) the network of laboratories and \(G_3\) the network of institutions. Each \(G_\alpha\), \(\alpha \in \{1, 2 , 3 \}\), is a collaboration network and \({\mathcal {C}}\) is the set of affiliation edges.
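The construction of the three unweighted layers and of the affiliation edges can be sketched as follows. The input format is an assumption of this example: each paper header is given as a list of (author, laboratory, institution) triples, which is not necessarily how the HAL data are structured, and the function names are hypothetical.

```python
from itertools import combinations


def clique_edges(nodes):
    """Return the undirected edges of the clique formed by the distinct nodes."""
    return {tuple(sorted(pair)) for pair in combinations(set(nodes), 2)}


def build_layers(papers):
    """Build the collaboration edges of each layer and the affiliation edges."""
    edges = {"authors": set(), "laboratories": set(), "institutions": set()}
    affiliations = set()
    for header in papers:
        # Each paper creates one clique per layer among the entities it involves.
        edges["authors"] |= clique_edges(a for a, _, _ in header)
        edges["laboratories"] |= clique_edges(lab for _, lab, _ in header)
        edges["institutions"] |= clique_edges(inst for _, _, inst in header)
        # Interlayer edges: author -> laboratory and laboratory -> institution.
        for author, lab, inst in header:
            affiliations.add((author, lab))
            affiliations.add((lab, inst))
    return edges, affiliations


# Two hypothetical paper headers.
papers = [
    [("A", "L1", "I1"), ("B", "L1", "I1"), ("C", "L2", "I2")],
    [("C", "L2", "I2"), ("D", "L3", "I2")],
]
edges, affiliations = build_layers(papers)
print(edges["laboratories"], edges["institutions"])
```

Because edges are stored in sets, repeated collaborations between the same pair do not create duplicates, which matches the unweighted setting. Entities that never collaborate with another entity of the same type have no edge and would need to be tracked separately if isolated nodes matter.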
We have built multilayer networks from the publications deposited in the open archive HAL between 2006 and 2016. HAL is an open archive where authors can deposit scholarly documents from all academic fields. The total number of papers used in this dataset is 70224, organized in eight research fields.
We have analyzed the average number of entities affiliated to an organization. Precisely, we looked at the average number of researchers affiliated to a laboratory of a given degree and the number of laboratories affiliated to an institution of a given degree (see Fig. 3). Using a linear regression model, we approximated the relation between the average number of entities affiliated to an organization of a given degree and the degree of the organization. So the number of nodes at layer \(\alpha\) affiliated to an organization at layer \(\alpha +1\) with collaboration degree k can be defined as follows:
We have compared the degree distributions of the three layers (Fig. 4). We found that all three layers have a degree distribution that follows a power law. Looking at the densities \(\delta _\alpha\) and the average distances \(l_\alpha\), \(\alpha \in \{1, 2, 3\}\) (Table 1), we found that, for all the fields of the HAL dataset:
As shown in Table 1, each of the three network layers has a high clustering coefficient (\(C\approx 0.83\)) and a low average distance (\(l\approx 7\)); they are small-world networks.
From all the observations made by analyzing the hierarchical network of the HAL dataset, we designed a network generation model that reproduces the main properties of this type of network.
The hierarchical network generation model
Collaboration and affiliation algorithms
Consider that each actor (an element involved in a collaboration) of our model is affiliated to at least one organization. An organization is also affiliated to at least one higher-level organization. For simplification purposes, we suppose that each actor or organization is affiliated to only one organization, and the mobility of the actors (such as authors moving between laboratories) is not considered. In this context, a node \(x^\alpha = (id, aff)\) at layer \(\alpha\) is represented by its ID id and the ID aff of its affiliation \(y^{\alpha +1}\) at layer \(\alpha +1\). The actors belong to layer 0. The affiliations of actors are in layer 1. The affiliations of organizations at layer 1 are in layer 2, and so on.
To generate edges between nodes in the same layer, we propose a growth model for the multilayer collaboration network similar to that of Meleu et al. [33] (Algorithm 1). It is an iterative model that simulates at each step a collaboration between actors (nodes of layer 0) and creates relationships in the networks. The collaboration at any layer \(\alpha \ge 1\) is deduced from the affiliations of the nodes at layer \(\alpha -1\).
In Algorithm 1, the selection of old nodes is made according to preferential attachment: an old actor i of in-layer degree \(k_i\) is selected with probability \(P_{i}= k_i / \sum _{j}k_j\). To create edges in the other layers, we proceed recursively: from level 0, we select in layer 1 the affiliations of the nodes in the collaboration, then we create a clique with these nodes at level 1. We then create edges in level 2 by selecting the affiliations at layer 2 of these level-1 nodes, and so on. This process is given in Algorithm 2.
Let us define the affiliation vector as the set \({\mathcal {V}}=\{ \lambda _0, \lambda _1, \ldots , \lambda _{M-1}\}\), where \(\lambda _\alpha\), \(\alpha >0\), is the probability that a new node at level \(\alpha -1\) is affiliated to an old node at level \(\alpha\), and \(\lambda _0\) is the proportion of old nodes per collaboration at level 0. We can observe that, if \(M=1\), this model (represented by Algorithm 2) is the same as the model of Meleu et al. [33]. So the network at layer 0 has the properties described in [33].
When we create a node at level 0, we decide, using the affiliation vector, whether to affiliate this node to an old or a new node at level 1. In the case of affiliation to a new node, we create a node at level 1 and then decide (using the affiliation vector) whether to affiliate this node to an old or a new node at level 2. This is done recursively for the upper layers. We propose in Algorithm 3 a process to create nodes and affiliate them to their organizations. The affiliation of nodes to organizations follows preferential attachment.
A generated multilayer network is a pair \({\mathcal {M}} = ({\mathcal {G}}, {\mathcal {C}})\), where

\({\mathcal {G}} = \{G_\alpha ; \alpha \in \{0, 1, \ldots , M-1\}\}\) is a family of collaboration graphs \(G_\alpha = (X_\alpha , E_\alpha )\). Each of them is either the collaboration network of actors or the collaboration network at one organization level.

\({\mathcal {C}} = \{E_{\alpha \beta } \subseteq X_\alpha \times X_\beta ; \alpha , \beta \in \{0, 1, \ldots , M-1 \}, \alpha \ne \beta \}\) is the set of affiliations between actors and organizations or between organizations and sub-organization \(G_\alpha\).
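The generation process described in this section can be sketched in a few dozen lines. This is a simplified illustration in the spirit of Algorithms 1 to 3, not the authors' Java implementation, and it relies on several assumptions: the clique size `eta` is fixed instead of being drawn from a distribution, preferential attachment uses the weight degree + 1 so that organizations without edges can still be selected, the network starts empty, and all function names are hypothetical.

```python
import random
from itertools import combinations


def pick_preferential(adj, k, rng):
    """Pick up to k distinct nodes with probability proportional to degree + 1."""
    pool, picked = list(adj), []
    while pool and len(picked) < k:
        weights = [len(adj[v]) + 1 for v in pool]
        node = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(node)
        picked.append(node)
    return picked


def new_node(layer, adj, aff, lam, rng):
    """Create a node at `layer` and affiliate it to an old or new node above."""
    node = len(adj[layer])
    adj[layer][node] = set()
    upper = layer + 1
    if upper < len(adj):
        # With probability lam[upper], affiliate to an existing organization.
        old = pick_preferential(adj[upper], 1, rng) if rng.random() < lam[upper] else []
        aff[layer][node] = old[0] if old else new_node(upper, adj, aff, lam, rng)
    return node


def add_clique(adj_layer, members):
    """Connect every pair of distinct members."""
    for u, v in combinations(sorted(set(members)), 2):
        adj_layer[u].add(v)
        adj_layer[v].add(u)


def generate(steps, eta, lam, seed=0):
    """Generate len(lam) layers; lam is the affiliation vector."""
    rng = random.Random(seed)
    n_layers = len(lam)
    adj = [dict() for _ in range(n_layers)]      # adj[a][v] = neighbours of v
    aff = [dict() for _ in range(n_layers - 1)]  # aff[a][v] = organization of v
    for _ in range(steps):
        # Collaboration at layer 0: old nodes first, then new nodes.
        members = pick_preferential(adj[0], round(lam[0] * eta), rng)
        members += [new_node(0, adj, aff, lam, rng) for _ in range(eta - len(members))]
        add_clique(adj[0], members)
        # Induced collaborations between the affiliations in the upper layers.
        for layer in range(1, n_layers):
            members = {aff[layer - 1][v] for v in members}
            add_clique(adj[layer], members)
    return adj, aff


adj, aff = generate(steps=200, eta=4, lam=[0.5, 0.6, 0.7], seed=1)
print([len(layer) for layer in adj])
```

The recursion in `new_node` mirrors the description above: a new organization is itself a new node of its layer and is affiliated upward by the same rule.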
Properties of the generated networks
Let \({\mathcal {M}} = ({\mathcal {G}}, {\mathcal {C}})\) be a multilayer network generated by our model.
Proposition 1
The number of nodes in \(G_\alpha =(X_\alpha , E_\alpha )\) is:
$$\begin{aligned} |X_\alpha | = t\,\eta \prod _{i=0}^{\alpha }(1-\lambda _i), \end{aligned}$$
where t is the number of collaborations generated.
Proof 1
We proceed by induction.

1
At layer 0, the network is the same as the networks generated by the model of Meleu et al. [33], so:
$$\begin{aligned} |X_0| =t(1-\lambda _0)\eta . \end{aligned}$$
2
Consider that, at layer \(\alpha\), \(0\le \alpha < M-1\), we have
$$\begin{aligned} |X_\alpha | =t(1-\lambda _0)(1-\lambda _1)\dots (1-\lambda _\alpha )\eta . \end{aligned}$$
We will show that we have \(|X_{\alpha +1}|=t(1-\lambda _0)(1-\lambda _1)\dots (1-\lambda _{\alpha +1})\eta\) at layer \(\alpha +1\).
In fact, at each step

\((1-\lambda _0)\eta\) new nodes are added at layer 0.

Those nodes generate \((1-\lambda _0)(1-\lambda _1)\eta\) new nodes at layer 1 by affiliating new nodes of layer 0 with new nodes in layer 1, since each new node in layer 0 has probability \(1-\lambda _1\) (using Algorithm 3 and the affiliation vector) of being affiliated to a new node in layer 1.

By recurrence, at layer \(\alpha\), \((1-\lambda _0)(1-\lambda _1)\dots (1-\lambda _\alpha )\eta\) new nodes are added as affiliations of the \((1-\lambda _0)(1-\lambda _1)\dots (1-\lambda _{\alpha -1})\eta\) new nodes of layer \(\alpha -1\). Those \((1-\lambda _0)(1-\lambda _1)\dots (1-\lambda _\alpha )\eta\) new nodes of layer \(\alpha\) will generate \((1-\lambda _0)(1-\lambda _1)\dots (1-\lambda _\alpha )(1-\lambda _{\alpha +1})\eta\) new nodes in layer \(\alpha +1\).
Hence, for t steps, we find:
$$\begin{aligned} |X_{\alpha +1}|=t(1-\lambda _0)(1-\lambda _1)\dots (1-\lambda _{\alpha +1})\eta . \end{aligned}$$
\(\square\)
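As a quick numerical illustration, reading the closed form obtained in the proof as \(t\,\eta \prod _{i=0}^{\alpha }(1-\lambda _i)\), the sketch below evaluates the expected number of nodes per layer. The parameter values and the function name are hypothetical and chosen only for the example.

```python
from math import prod


def expected_nodes(t, eta, lam, layer):
    """Expected number of nodes at `layer` after t collaborations of mean size eta,
    for the affiliation vector lam = [lambda_0, ..., lambda_{M-1}]."""
    return t * eta * prod(1 - l for l in lam[: layer + 1])


lam = [0.5, 0.6, 0.7]  # hypothetical affiliation vector
for layer in range(len(lam)):
    print(layer, expected_nodes(1000, 4, lam, layer))
```

With these values, the layers are expected to contain about 2000, 800 and 240 nodes respectively: each additional layer shrinks the count by the factor \(1-\lambda _\alpha\).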
Proposition 2
The number of edges \(|E_\alpha |\) in the network is:
where t is the number of collaborations generated.
Proof 2
While selecting and/or creating \(\eta\) actors per collaboration in layer 0, it is possible that all of them will be affiliated to different organizations in all the other layers \(\alpha \ge 1\). Thus, the maximum number of edges created by a collaboration in each layer \(\alpha \ge 1\) is that of a clique of \(\eta\) nodes:
$$\begin{aligned} \frac{\eta (\eta -1)}{2}. \end{aligned}$$
On the other hand, at each step let us consider that all the old nodes of layer 0 involved in the clique creation are affiliated to the same organization, i.e., \((1-\lambda _0)\eta\) actors are affiliated to the same node x at layer 1. At this level (1), the number of new edges will be equal to the number of edges that link the organization x to all the new organizations added by the creation of a new node at level 0 (i.e., affiliation of new actors). This number is:
We have shown that, at layer \(\alpha\), \(0\le \alpha \le M-1\), we have \(\prod _{i=0}^{\alpha }(1-\lambda _i)\) new nodes. These new nodes generate \(\prod _{i=0}^{\alpha +1}(1-\lambda _i)\) new nodes at layer \(\alpha +1\), and we have assumed that old nodes are affiliated to the same node at layer \(\alpha +1\); thus, the edges added at layer \(\alpha +1\) are the edges of the clique of \(1+\eta \prod _{i=0}^{\alpha }(1-\lambda _i)\) nodes, whose number is:
The result is deduced by considering t steps. \(\square\)
Proposition 3
The average degree \({\bar{d}}_\alpha\) of \(G_{\alpha }=(X_{\alpha }, E_{\alpha })\) is:
Proof 3
By definition:
$$\begin{aligned} {\bar{d}}_\alpha = \frac{2|E_\alpha |}{|X_\alpha |}. \end{aligned}$$
The proposition is deduced from Eqs. (3) and (4). \(\square\)
Theorem 1
If the average degree at layer \(\alpha\) is \({\bar{d}}_\alpha =\left( \eta -1 \right) /\left( \prod \limits _{j=0}^{\alpha }(1-\lambda _j)\right)\), the degree distribution in layer \(\alpha\) of a generated multilayer network follows a power law of parameter \(\gamma _\alpha\) as:
Proof 4
For \(\alpha =0\), the result is shown in Ref. [33]. For \(\alpha > 0\), according to the hypothesis on the average degree, the probability that a new node at layer \(\alpha\) will connect to a node of layer \(\alpha +1\) is proportional to the degree of this last node in layer \(\alpha +1\).
Since we assume Eq. (1), the probability that a node \(x^{\alpha -1}\) is affiliated to an organization \(y^\alpha\) of degree \(k_y^\alpha\) is proportional to the degree of this node, i.e.:
The variation of the nodes of degree k in layer \(\alpha\) is impacted by:

the selection of nodes in layer \(\alpha -1\) affiliated to a node of layer \(\alpha\) having degree k.

the selection of old nodes of degree k to affiliate new nodes of layer \(\alpha -1\) using the affiliation vector.
It follows that the number of nodes of degree k at step t in layer \(\alpha\) that gain an edge when the algorithm creates a new collaboration is:
Using Eq. (7) we obtain
If we denote by \(p_{k, t}\) the value of \(p_k\) when the network has \(n_t\) vertices, then the variation in \(n_t p_k\) per \(\prod _{i=0}^{\alpha }(1-\lambda _i)\eta\) vertices added is:
Looking for stationary solutions \(p_{k, t+1} = p_{k, t} = p_ k\), the variation of the number of nodes of degree k at layer \(\alpha\) is then:
with:
By simplifying in Eq. (9) we obtain:
where \(c=P_{\eta -1}\) and \(B(a, b) = \frac{\Gamma (a)\Gamma (b)}{\Gamma (a + b)}\) is Legendre’s beta function, which goes asymptotically as \(a^{-b}\) for large a and fixed b, and hence
\(\square\)
Theorem 2
If the average degree is \(1+\eta \prod _{i=0}^{\alpha }(1-\lambda _i)\), the degree distribution in layer \(\alpha >0\) of a generated multilayer network follows a power law of parameter \(\gamma _\alpha\) as:
Proof 5
Let \(\theta =\prod \limits _{j=0}^{\alpha }(1-\lambda _j)\). Similarly to the previous case, and looking for stationary solutions \(p_{k, t+1} = p_{k, t} = p_ k\), the variation of the number of nodes of degree k at layer \(\alpha\) is:
with:
Thus:
By simplifying in Eq. (13), we obtain
where \(c=P_{\eta \prod \limits _{j=0}^{\alpha }(1-\lambda _j)}\) and \(B(a, b) = \frac{\Gamma (a)\Gamma (b)}{\Gamma (a + b)}\) is Legendre’s beta function, which goes asymptotically as \(a^{-b}\) for large a and fixed b, and hence:
\(\square\)
Simulations
We have implemented our model using a custom-written Java program. The outputs of this program were used by a custom program written in Python and based on the libraries NetworkX, Pymnet and powerlaw [49] to make most of the measurements on the generated networks. To generate the networks, we extracted parameters such as the number of collaborations to generate, the affiliation vector and the distribution of the number of actors per collaboration from the different fields of the HAL dataset.
The first behavior that we wanted to observe on the simulated networks is the correlation between the in-layer degree of an organization and the number of members affiliated to it. Using linear regressions, we compared this correlation between the real networks (Fig. 3) and the generated ones (Fig. 5), and it appears that their values are close.
Figure 6 shows the comparison of degree–degree correlation between consecutive layers. Layer 0 is the collaboration network of researchers, layer 1 is the collaboration network of laboratories and layer 2 is the collaboration network of institutions. We can see that, on both real networks and simulations, the curves can be correctly approximated using linear regression. The positive slopes of the obtained regression lines mean that researchers with a high number of collaborations are affiliated to organizations with a strong capacity for cooperation. This is easily explained on the simulations by the fact that, since a node is affiliated to a single organization, each time this node participates in a collaboration, it induces the participation of the organizations it is affiliated to.
In Fig. 7, we have depicted the degree distributions of the different layers. We found that the simulations reproduce the power-law distributions observed on the three layers of the HAL dataset. Indeed, the exponents (\(\gamma\)) of the power-law degree distributions in Tables 1, 2 and 3 are close. From Table 3 we observe that the simulated layers have very high clustering coefficients (\(C\approx 0.80\)) and high transitivities (\(T\approx 0.2\)). This behavior contributes to a higher density and a higher average degree in the affiliation layers in comparison with the current layer. The density thus varies inversely with the average distance. We can conclude that the simulated network layers are small-world networks.
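The clustering coefficient C and the transitivity T are two different measures, which helps explain why their values can differ substantially on the same layer. The sketch below computes both on a small graph in plain Python. It follows the usual definitions (mean of the local clustering coefficients, with nodes of degree below 2 counted as 0, and ratio of closed to connected triples), which are assumed here to match those used for the measurements; the function names are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of neighbours of v that are themselves connected."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def transitivity(adj):
    """Closed triples divided by connected triples (3 * triangles / triples)."""
    closed = sum(1 for v in adj for a, b in combinations(adj[v], 2) if b in adj[a])
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return closed / triples if triples else 0.0


# Hypothetical graph: a triangle (0, 1, 2) with a pendant node 3 attached to 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_clustering(graph), transitivity(graph))
```

The average clustering gives every node the same weight, whereas transitivity is dominated by high-degree nodes, so a network with many small cliques attached to a few hubs can show a high C together with a much lower T.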
Conclusion
In this article, we have shown that the headers of scientific papers can be used to build copublication networks, which are multilayer networks of entities involved in research, namely authors, laboratories, and institutions. Indeed, in addition to the collaborative relationships that exist between entities of each type, there are affiliation relationships of authors to laboratories and of laboratories to institutions. We have analyzed the properties of such networks built from data extracted from the HAL platform, which is a free archive of scientific publications.
Following the observations made on the properties of copublication multilayer networks, we generalized these networks to a system of actors and organizations in which an actor is affiliated to an organization and each organization is affiliated to a higher-level organization. The resulting graphs are hierarchical and are deduced from the collaborations between the actors: the actors collaborate together, and the relationships in the different layers are deduced from these collaborations and from the affiliation relationships. We proposed an algorithmic model to build graphs presenting such properties. It is an iterative model that builds a collaboration clique and the related affiliations at each step. We showed that the degree distribution in the different layers follows a power law, and the simulations carried out showed that the studied properties of the generated layers are close to those of the real-world network built from the HAL dataset.
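The iterative principle summarized above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the model as specified in the article: the fixed clique size, the probabilities `p_new` and `p_new_org`, the uniform choice of an existing organization and the attachment weight `degree + 1` are all hypothetical choices made to keep the example short.

```python
import random
from itertools import combinations

def generate(steps, clique_size=3, p_new=0.5, p_new_org=0.3, n_layers=3, seed=0):
    """Grow a hierarchy of layers by adding one collaboration clique per step."""
    rng = random.Random(seed)
    layers = [set() for _ in range(n_layers)]        # edge sets, layer 0 = actors
    parent = [dict() for _ in range(n_layers - 1)]   # parent[l][node] = organization at layer l + 1
    counts = [0] * n_layers                          # nodes created so far in each layer
    degree = {}                                      # layer-0 degrees

    def new_node(layer):
        node = (layer, counts[layer])
        counts[layer] += 1
        if layer + 1 < n_layers:
            # Assumption: affiliate to a fresh organization with probability
            # p_new_org, otherwise to an existing one chosen uniformly.
            if counts[layer + 1] == 0 or rng.random() < p_new_org:
                org = new_node(layer + 1)
            else:
                org = (layer + 1, rng.randrange(counts[layer + 1]))
            parent[layer][node] = org
        return node

    for _ in range(steps):
        members = set()
        while len(members) < clique_size:
            if not degree or rng.random() < p_new:
                actor = new_node(0)
                degree[actor] = 0
            else:
                # Preferential attachment (assumed weight: degree + 1)
                pool = list(degree)
                weights = [degree[a] + 1 for a in pool]
                actor = rng.choices(pool, weights=weights)[0]
            members.add(actor)

        current = members
        for l in range(n_layers):
            # Clique among the participants of this layer
            for u, v in combinations(sorted(current), 2):
                if (u, v) not in layers[l]:
                    layers[l].add((u, v))
                    if l == 0:
                        degree[u] += 1
                        degree[v] += 1
            if l + 1 < n_layers:
                # The organizations of the participants collaborate one layer up
                current = {parent[l][n] for n in current}
    return layers, parent

layers, parent = generate(steps=200)
for l, layer_edges in enumerate(layers):
    print(f"layer {l}: {len(layer_edges)} edges")
```

Each step adds a clique at layer 0 and the cliques it induces among the corresponding organizations, so every upper-layer edge can be traced back to at least one collaboration between actors.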
In the future, we plan to explore the structure and dynamics of communities in such copublication networks. Indeed, the high clustering coefficient and high transitivity in these graphs suggest the existence of many communities. Before doing this, we will verify the robustness of our model by using other archives of scientific publications and by performing a more accurate evaluation of the gaps between the values of the properties of the generated networks and those of real-world networks.
Data availability statement
The datasets analyzed during the current study were extracted from the open archive HAL: https://hal.archives-ouvertes.fr/.
References
Boccaletti S, Bianconi G, Criado R, Del Genio CI, Gómez-Gardeñes J, Romance M, Sendiña-Nadal I, Wang Z, Zanin M. The structure and dynamics of multilayer networks. Phys Rep. 2014;544(1):1–122.
Lattanzi S, Sivakumar D. Affiliation networks. In: Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing, ACM; 2009. p. 427–434.
Zitnik M, Leskovec J. Predicting multicellular function through multilayer tissue networks. Bioinformatics. 2017;33(14):190–8.
Gallotti R, Barthelemy M. Anatomy and efficiency of urban multimodal mobility. Sci Rep. 2014;4:6911.
Cardillo A, Zanin M, Gómez-Gardeñes J, Romance M, del Amo AJG, Boccaletti S. Modeling the multilayer nature of the European air transport network: Resilience and passengers rescheduling under random failures. Eur Phys J Spec Top. 2013;215(1):23–33.
Battiston F, Nicosia V, Latora V. Structural measures for multiplex networks. Phys Rev E. 2014;89(3):032804.
Hristova D, Noulas A, Brown C, Musolesi M, Mascolo C. A multilayer approach to multiplexity and link prediction in online geosocial networks. EPJ Data Sci. 2016;5(1):24.
Kurucz M, Benczur A, Csalogány K, Lukács L. Spectral clustering in telephone call graphs. In: Proceedings of the 9th WebKDD and 1st SNAKDD 2007 Workshop on Web Mining and Social Network Analysis, ACM; 2007. p. 82–91.
Berge C. Hypergraphs: Combinatorics of Finite Sets, vol. 45. New York: Elsevier; 1984.
Kamga V, Tchuente M, Viennet E. Prévision de liens dans les graphes bipartites avec attributs. In: AAFD; 2012. p. 57–70.
Ngonmang B, Viennet E, Tchuente M, Kamga V. Community analysis and link prediction in dynamic social networks. In: Computing in Research and Development in Africa, Springer; 2015. p. 83–101.
De Domenico M, Solé-Ribalta A, Cozzo E, Kivelä M, Moreno Y, Porter MA, Gómez S, Arenas A. Mathematical formulation of multilayer networks. Phys Rev X. 2013;3(4):041022.
Kivelä M, Arenas A, Barthelemy M, Gleeson JP, Moreno Y, Porter MA. Multilayer networks. J Compl Netw. 2014;2(3):203–71.
Solá L, Romance M, Criado R, Flores J, García del Amo A, Boccaletti S. Eigenvector centrality of nodes in multiplex networks. Chaos. 2013;23(3):033131.
Holme P, Saramäki J. Temporal networks. Phys Rep. 2012;519(3):97–125.
Donges J, Schultz H, Marwan N, Zou Y, Kurths J. Investigating the topology of interacting networks. Eur Phys J B. 2011;4(84):635–51.
Berlingerio M, Coscia M, Giannotti F, Monreale A, Pedreschi D. Multidimensional networks: foundations of structural analysis. World Wide Web. 2013;16(5–6):567–93.
Coscia M, Rossetti G, Pennacchioli D, Ceccarelli D, Giannotti F. “You know because I know”: A multidimensional network approach to human resources problem. In: Advances in Social Networks Analysis and Mining (ASONAM), 2013 IEEE/ACM International Conference On, IEEE; 2013. p. 434–441.
Gao J, Buldyrev SV, Stanley HE, Havlin S. Networks formed from interdependent networks. Nat Phys. 2012;8(1):40–8.
Criado R, Flores J, García del Amo A, Gómez-Gardeñes J, Romance M. A mathematical model for networks with structures in the mesoscale. Int J Comput Math. 2012;89(3):291–309.
Berlingerio M, Coscia M, Giannotti F, Monreale A, Pedreschi D. Foundations of multidimensional network analysis. In: Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference On, IEEE; 2011. p. 485–489.
Watts DJ. The dynamics of networks between order and randomness. Small Worlds; 1999.
Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393(6684):440–2.
Milgram S. The small world problem. Psychol Today. 1967;2(1):60–7.
De Stefano D, Giordano G, Vitale MP. Issues in the analysis of coauthorship networks. Qual Quant. 2011;45(5):1091–107.
De Stefano D, Fuccella V, Vitale MP, Zaccarin S. The use of different data sources in the analysis of coauthorship networks and scientific performance. Soc Netw. 2013;35(3):370–81.
Mali F, Kronegger L, Doreian P, Ferligoj A. Dynamic scientific coauthorship networks. In: Models of Science Dynamics, Springer; 2012. p. 195–232.
Erdos P, Rényi A. On the evolution of random graphs. Publ Math Inst Hung Acad Sci. 1960;5(1):17–60.
Molloy M, Reed B. The size of the giant component of a random graph with a given degree sequence. Combin Prob Comput. 1998;7(03):295–305.
Guillaume JL, Latapy M. Bipartite graphs as models of complex networks. Physica A. 2006;371(2):795–813.
Price DJS. A general theory of bibliometric and other cumulative advantage processes. J Am Soc Inform Sci. 1976;293:1.
Barabási AL, Jeong H, Néda Z, Ravasz E, Schubert A, Vicsek T. Evolution of the social network of scientific collaborations. Physica A. 2002;311(3):590–614.
Meleu GR, Melatagia Yonta P. Growth model for collaboration networks. Revue Africaine de la Recherche en Informatique et Mathématiques Appliqués. 2017;24:1–21.
Newman ME. The structure and function of complex networks. SIAM Rev. 2003;45(2):167–256.
Nicosia V, Bianconi G, Latora V, Barthelemy M. Growing multiplex networks. Phys Rev Lett. 2013;111(5):058701.
Kim JY, Goh KI. Coevolution and correlated multiplexity in multiplex networks. Phys Rev Lett. 2013;111(5):058702.
Magnani M, Rossi L. The ml-model for multi-layer social networks. In: Advances in Social Networks Analysis and Mining (ASONAM), 2011 International Conference On, IEEE; 2011. p. 5–12.
Bianconi G. Statistical mechanics of multiplex networks: Entropy and overlap. Phys Rev E. 2013;87(6):062806.
Pattison P, Wasserman S. Logit models and logistic regressions for social networks. Br J Math Stat Psychol. 1999;52(2):169–93.
Wang P, Robins G, Pattison P, Lazega E. Exponential random graph models for multilevel networks. Soc Netw. 2013;35(1):96–115.
Zhang PP, Chen K, He Y, Zhou T, Su BB, Jin Y, Chang H, Zhou YP, Sun LC, Wang BH, et al. Model and empirical study on some collaboration networks. Physica A. 2006;360(2):599–616.
Nicosia V, Latora V. Measuring and modeling correlations in multiplex networks. Phys Rev E. 2015;92(3):032805.
Min B, Do Yi S, Lee KM, Goh KI. Network robustness of multiplex networks with interlayer degree correlations. Phys Rev E. 2014;89(4):042811.
Lee KM, Kim JY, Cho WK, Goh KI, Kim I. Correlated multiplexity and connectivity of multiplex random networks. New J Phys. 2012;14(3):033027.
Funk S, Jansen VA. Interacting epidemics on overlay networks. Phys Rev E. 2010;81(3):036118.
Marceau V, Noël PA, Hébert-Dufresne L, Allard A, Dubé LJ. Modeling the dynamical interaction between epidemics on overlay networks. Phys Rev E. 2011;84(2):026105.
Söderberg B. Properties of random graphs with hidden color. Phys Rev E. 2003;68(2):026107.
Melnik S, Porter MA, Mucha PJ, Gleeson JP. Dynamics on modular networks with heterogeneous correlations. Chaos. 2014;24(2):023106.
Alstott J, Bullmore E, Plenz D. Powerlaw: a Python package for analysis of heavy-tailed distributions. PLoS ONE. 2014;9(1):85777.
Acknowledgements
We would like to sincerely thank the IDASCO members who provided constructive comments.
Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Author information
Contributions
GRM contributed to the conception or design of the models, data collection, data analysis and interpretation, drafting of the article, critical revision of the article, and final approval of the version to be published. PYM contributed to the conception or design of the models, critical revision of the article, and final approval of the version to be published. Both authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Meleu, G.R., Melatagia, P.Y. The structure of copublications multilayer network. Comput Soc Netw 8, 8 (2021). https://doi.org/10.1186/s40649-021-00089-w