In this section, we first provide a formal definition of a PA model and then describe two existing PA models as examples. This is followed by an analysis of the complexity of generating networks from PA models.
Definitions
In this section, we provide a framework for representing general preferential attachment models. Note that the idea of a general PA model is not new to this work and that the formulation presented here is only used to facilitate algorithmic analysis. For a detailed treatment of general preferential attachment, please see ‘The Organization of Random Growing Networks’ by Krapivsky and Redner [13].
Let \(G_{t} = \left (V_{t},E_{t}\right)\) be the network that results from t iterations of a PA simulation. \(V_{t}\) is the set of vertices (or nodes) within the network, and \(E_{t}\) is the set of edges between elements of \(V_{t}\). Let \(T(G_{t})\) be the worst-case time complexity of generating \(G_{t}\); that is, the worst-case time complexity of a preferential attachment simulation of t iterations.
As discussed below, the number of iterations required to generate a network with |V| nodes via PA is Θ(|V|). Accordingly, we omit t and frame our discussion of the complexity T(G) in terms of |V|.
Let \(A=\{a_{1},a_{2},\ldots,a_{|A|}\}\) be a set of attributes that can be defined on a network node. Let \(X_{v} = \{x_{{va}_{1}}, x_{{va}_{2}}, \ldots, x_{{va}_{|A|}}\} \in \mathbb {R}^{|A|}\) be a setting of A for node v∈V, and let \(\lambda _{{va}_{i}} \in \mathbb {R}\) be the fitness of node v for attribute \(a_{i}\). Let:
$$f = \left\{f_{a_{i}}(x_{{va}_{i}},\lambda_{{va}_{i}}): \mathbb{R} \times \mathbb{R} \rightarrow \mathbb{R}^{+} \:|\: a_{i} \in A\right\} $$
be a set of functions, where \(f_{a_{i}} \in f\) maps \(x_{{va}_{i}} \in \mathbb {R}\) and \(\lambda _{{va}_{i}} \in \mathbb {R}\) to a preference mass \(\mu _{{va}_{i}} \in \mathbb {R}^{+}\). The ‘preference mass’ \(\mu _{{va}_{i}}\) is a nonnegative real value that is proportional to the probability of selecting v by attribute \(a_{i}\) under the PA model. We will refer to the elements of f as the ‘preference functions’ of the PA model. Note that, in this work, we restrict our attention to the set of degree-related attributes D (i.e., in-degree, out-degree, and total degree) with settings \(x_{vd} \in \mathbb {N} \:\forall \: d \in D\). This implies that the elements of f are defined over the natural numbers:
$$f = \left\{f_{d}(x_{vd},\lambda_{vd}): \mathbb{N} \times \mathbb{R} \rightarrow \mathbb{R}^{+} \:|\: d \in D\right\} $$
The restriction is purely elective; any attribute with real-valued settings could be specified.
A PA model has one or more preference functions. Price’s model, for example, has a single linear preference function. Krapivsky’s model has two: one for in-degree and another for out-degree. A ‘linear preferential attachment model’ only admits linear preference functions of the form \(g(x,\lambda)=c_{1}x+\lambda\), a ‘quadratic preferential attachment model’ only admits quadratic preference functions of the form \(g(x,\lambda)=c_{2}x^{2}+c_{1}x+\lambda\), and so on.
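As an illustration, preference functions can be represented directly as callables. The following is a minimal Python sketch; the function names and constants are ours and purely illustrative, not part of any of the models above:

```python
# Minimal sketch of preference functions as plain Python callables.
# Names and default constants are illustrative assumptions.

def linear_preference(x, lam, c1=1.0):
    """Linear preference: mass proportional to c1 * x + lam."""
    return c1 * x + lam

def quadratic_preference(x, lam, c1=1.0, c2=1.0):
    """Quadratic preference: mass proportional to c2 * x^2 + c1 * x + lam."""
    return c2 * x ** 2 + c1 * x + lam

# A PA model supplies one preference function per attribute; for example,
# Krapivsky's model pairs one function for in-degree with one for out-degree:
f = {"in_degree": linear_preference, "out_degree": linear_preference}
```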
Description of considered models
Price’s model
Figure 1 describes Price’s algorithm. Briefly, at each time-step, a node is sampled from the network with probability proportional to its in-degree, a new node is introduced to the network, and a directed edge is added from the new node to the sampled node. Notice that a node is added at each time-step, so that the generation of a network with |V| nodes takes |V| steps.
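To make the per-step logic concrete, here is a minimal Python sketch of one step of Price’s model. The function and variable names are ours, the sampling is naive O(|V|) for clarity rather than efficiency, and we use the common convention of sampling by in-degree plus one so that nodes with in-degree zero remain selectable:

```python
import random

def price_step(in_degree, edges, next_id):
    """One step of Price's model (naive sketch): sample an existing node
    with probability proportional to in-degree + 1, then introduce a new
    node with a directed edge to the sampled node."""
    nodes = list(in_degree)
    target = random.choices(nodes, weights=[in_degree[v] + 1 for v in nodes])[0]
    in_degree[target] += 1
    in_degree[next_id] = 0            # introduce the new node
    edges.append((next_id, target))   # directed edge: new node -> target
    return next_id + 1
```

Starting from a single seed node (in_degree = {0: 0}), |V|−1 applications of price_step yield a network with |V| nodes, one node per step, as noted above.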
Krapivsky’s model
Figure 1 also describes the algorithm of Krapivsky et al. At each step, the algorithm of Price’s model is followed with probability p, and a ‘preferential edge step’ is taken with probability 1−p. During a preferential edge step, two nodes, \(n_{o}\) and \(n_{i}\), are sampled from the network by out- and in-degree, respectively, and an edge is added from \(n_{o}\) to \(n_{i}\). Note that a node is no longer added at every step; rather, a node is added at a given step with probability p. This implies that the number of iterations required to generate a network with |V| nodes is a random variable with expected value |V|/p. Since |V|/p is Θ(|V|) for any fixed p>0, this is asymptotically no different from Price’s model. More generally, the number of iterations required to generate a network with |V| nodes via a PA model is Θ(|V|).
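In the same illustrative style as the Price sketch above (naive O(|V|) sampling; the names and the add-one smoothing are our own assumptions), one step of this model might look like:

```python
import random

def krapivsky_step(in_degree, out_degree, edges, next_id, p):
    """One step of the model of Krapivsky et al. (naive sketch): with
    probability p take a node step as in Price's model; otherwise take a
    preferential edge step between two existing nodes."""
    nodes = list(in_degree)
    if random.random() < p:
        # Node step: a new node attaches to a node sampled by in-degree.
        target = random.choices(nodes, weights=[in_degree[v] + 1 for v in nodes])[0]
        in_degree[target] += 1
        in_degree[next_id], out_degree[next_id] = 0, 1
        edges.append((next_id, target))
        return next_id + 1
    # Preferential edge step: sample n_o by out-degree and n_i by in-degree.
    n_o = random.choices(nodes, weights=[out_degree[v] + 1 for v in nodes])[0]
    n_i = random.choices(nodes, weights=[in_degree[v] + 1 for v in nodes])[0]
    out_degree[n_o] += 1
    in_degree[n_i] += 1
    edges.append((n_o, n_i))
    return next_id
```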
Generation complexity
We obtain a trivial lower bound on T(G) by noting that, in order to generate G, we must at the very least output |V| nodes, so T(G)=Ω(|V|).
A discussion of the upper bound follows. Recall that the salient problem in generating networks from a PA model is indexing the network’s nodes in such a way that sampling, insertion, and incrementation can be accomplished efficiently. Tonelli et al. [14] provide a clever method for accomplishing all three tasks in constant time, provided that the preference function is linear and the fitness is both uniform across all nodes and constant. Given constant insertion and sampling times, the generation of a network with |V| nodes takes O(|V|) time. Considering that the lower bound is Ω(|V|), we have the asymptotically tight bound of T(G)=Θ(|V|).
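For intuition, a well-known constant-time realization of the linear, uniform-fitness case (a construction in the same spirit, though not necessarily Tonelli et al.’s exact method) keeps a flat list in which each node appears once per unit of preference mass, so that a uniform draw from the list is a preferential draw:

```python
import random

# Sketch: constant-time sampling for linear preference with uniform,
# constant fitness. Each node appears in `mass_list` once per unit of
# preference mass, so uniform sampling from the list is preferential.
mass_list = [0]  # seed network: node 0 with one unit of mass

def linear_pa_step(mass_list, next_id):
    """Sample a target preferentially, then attach a new node to it.
    Sampling, mass incrementation, and insertion are all O(1)."""
    target = random.choice(mass_list)  # O(1) preferential sample
    mass_list.append(target)           # increment the target's mass
    mass_list.append(next_id)          # insert the new node with unit mass
    return next_id + 1
```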
However, this method does not extend to nonlinear preferential attachment (see ‘Related work’ section for details). In the general case, we can still do better than naive linear-time sampling by shifting to data structures that provide O(log|V|) insertion, sampling, and incrementation, giving an overall complexity of T(G)=O(|V|log|V|).
We accomplish this with a set of augmented tree structures. Each tree supports a preference function of the model by indexing the preference mass assigned to each node in the network by that preference function. Each item in the tree indexes a node in the network. The tree items are annotated with the preference mass of the network node under the preference function and the subtree mass, which is the total preference mass of the subtree that has the item as root; see Figure 2. Note that we refer to ‘items’ in the tree rather than the more typical ‘nodes’; this is to avoid confusion between elements of the tree and elements of the network. We can sample from such a structure by recursively comparing the properly normalized subtree mass of a given item and its children to a uniform random draw; see Figures 2 and 3.
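A minimal Python sketch of this sampling procedure follows; the item layout and names are ours (Figures 2 and 3 give the authors’ version), and we assume subtree masses are kept up to date by the insertion and incrementation routines:

```python
import random

class Item:
    """Tree item indexing one network node, annotated with that node's
    preference mass and the total mass of the subtree rooted here."""
    def __init__(self, node, mass):
        self.node, self.mass = node, mass
        self.subtree_mass = mass
        self.left = self.right = None

def sample(item):
    """Sample a network node with probability proportional to its mass.
    One root-to-item descent: O(height), i.e., O(log |V|) if balanced."""
    r = random.uniform(0, item.subtree_mass)
    if r < item.mass:
        return item.node          # stop here with prob. mass / subtree_mass
    r -= item.mass
    if item.left is not None and r < item.left.subtree_mass:
        return sample(item.left)  # descend left with prob. proportional to its mass
    return sample(item.right)     # otherwise descend right
```

Because a fresh uniform draw is made at each level, the probability of returning a given node is the product of the branch probabilities along its path times its stopping probability, which telescopes to its mass divided by the root’s subtree mass, as required.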
Note that, at each iteration of a standard PA simulation, we must sample a node, update that node’s mass, and insert a new node. In what follows, we show that each of these steps can be accomplished in asymptotically logarithmic time.