- Research
- Open Access

# Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations

- Konstantin Avrachenkov^{1}
- Vivek S. Borkar^{2}
- Arun Kadavankandy^{3}
- Jithin K. Sreedharan^{4}

**Received:** 2 March 2017 **Accepted:** 10 February 2018 **Published:** 19 March 2018

## Abstract

### Background

In the framework of network sampling, random walk (RW) based estimation techniques provide many pragmatic solutions while uncovering the unknown network as little as possible. Despite several theoretical advances in this area, RW based sampling techniques usually make a strong assumption that the samples are in stationary regime, and hence are impelled to leave out the samples collected during the burn-in period.

### Methods

This work proposes two sampling schemes without burn-in time constraint to estimate the average of an arbitrary function defined on the network nodes, for example, the average age of users in a social network. The central idea of the algorithms lies in exploiting regeneration of RWs at revisits to an aggregated super-node or to a set of nodes, and in strategies to enhance the frequency of such regenerations either by contracting the graph or by making the hitting set larger. Our first algorithm, which is based on reinforcement learning (RL), uses stochastic approximation to derive an estimator. This method can be seen as intermediate between purely stochastic Markov chain Monte Carlo iterations and deterministic relative value iterations. The second algorithm, which we call the Ratio with Tours (RT)-estimator, is a modified form of respondent-driven sampling (RDS) that accommodates the idea of regeneration.

### Results

We study the methods via simulations on real networks. We observe that the trajectories of RL-estimator are much more stable than those of standard random walk based estimation procedures, and its error performance is comparable to that of respondent-driven sampling (RDS) which has a smaller asymptotic variance than many other estimators. Simulation studies also show that the mean squared error of RT-estimator decays much faster than that of RDS with time.

### Conclusion

The newly developed RW based estimators (RL- and RT-estimators) make it possible to avoid the burn-in period, provide better control of stability along the sample path, and overall reduce the estimation time. Our estimators can be applied in social and complex networks.

## Keywords

- Network sampling
- Random walks on graph
- Reinforcement learning
- Respondent-driven sampling

## Background

The prohibitive sizes of most social networks make graph-processing that requires complete knowledge of the graph impractical. For instance, social networks like Facebook or Twitter have billions of edges and nodes. In such a situation, we address the problem of estimating global properties of a large network to a given degree of accuracy. Some examples of potentially interesting properties include the size of the support base of a certain political party, the average age of users in an online social network (OSN), the proportion of male–female connections with respect to the number of female–female connections in an OSN, and many others. Naturally, since graphs can be used to represent data in a myriad of disciplines and scenarios, the questions we can ask are endless.

Graph sampling is a possible solution to the above problem. To collect information from an OSN, the sampler issues an Application Programming Interface (API) query for a particular user, which returns the user's one-hop neighborhood and published contents. Though some OSNs (for instance, Twitter) allow access to the complete database at additional expense, we focus here on the typical case when a sampler can obtain information only about the neighbors of a particular user by means of API queries. There are several ways to collect representative samples in a network. One straightforward way is to collect independent samples via uniform node or edge sampling. However, uniform sampling is not efficient because we do not know the user ID space beforehand; consequently, the sampler wastes many queries on invalid IDs, resulting in an inefficient and costly data collection method. Moreover, OSNs typically impose rate limitations on API queries, e.g., Twitter with 313 million active users enforces a limit of 15 requests in a 15-min time window for most APIs.^{1} With this limitation, a crawler would need around 610 years to crawl the whole of Twitter. Therefore, if only a standard API is available to us, we inevitably need to use some sampling technique, and RW based techniques appear as a good option.
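The 610-year figure above follows from simple arithmetic. A back-of-envelope check (assuming one API request per user and the quoted rate limit; the exact number depends on these assumptions):

```python
# Back-of-envelope crawl time for querying every Twitter user once,
# assuming the quoted limit of 15 API requests per 15-minute window,
# i.e., effectively 1 request per minute.
users = 313_000_000            # active users quoted above
requests_per_minute = 15 / 15  # rate limit: 15 requests per 15 minutes
minutes_needed = users / requests_per_minute
years_needed = minutes_needed / (60 * 24 * 365)
print(f"{years_needed:.0f} years")  # on the order of 600 years
```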

### Important notation and problem formulation

Let \(G=(V,E)\) be an undirected labeled network, where *V* is the set of vertices and \(E \subseteq V \times V\) is the set of edges. Although the graph is undirected, in later use it is more convenient to represent edges by ordered pairs (*u*, *v*). Of course, if \((u,v) \in E,\) it holds that \((v,u) \in E,\) since *G* is undirected. With a slight abuse of notation, the total number of undirected edges \(|E|/2\) is denoted as \(|E|.\)

The goal is to estimate the average of a function \(g: V \rightarrow \mathbb {R}\) over the network nodes,

\[ \nu (G) = \frac{1}{|V|} \sum _{u \in V} g(u), \quad (1) \]

using samples gathered by a random walk. We denote by \(\widehat{\nu }^{(n)}_{\mathrm{XY}}(G)\) the estimate of \(\nu (G)\) obtained from *n* samples using the scheme XY. We will occasionally drop *n* and the scheme if they are clear from the context.

A *simple RW* (SRW) on a graph offers a viable solution to this problem that respects the above constraints. From an initial node, a simple RW proceeds by choosing one of the neighbors uniformly randomly and repeating the same process at the next node and so on. In general, a RW need not sample the neighbors uniformly and can take any transition probability compliant with the underlying graph, an example being the Metropolis–Hastings schemes [1]. Random walk techniques are well known (see for instance [2–11] and references therein). A drawback of random walk techniques is that they all suffer from the problem of initial burn-in, i.e., a number of initial samples need to be discarded to get samples from a desired probability distribution. The burn-in period (or mixing time) of a random walk is the time period after which the RW produces almost stationary samples irrespective of the initial distribution. This poses serious limitations, especially in view of the stringent constraints on the number of samples imposed by API query rates. In addition, subsequent samples of a RW are obviously not independent. To get independent samples, it is customary to drop intermediate samples. *In this work, we focus on RW based algorithms that bypass this burn-in time barrier*. We focus on two approaches: reinforcement learning and tour-based ratio estimator.
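A minimal sketch of a simple random walk on an adjacency-list graph (the toy graph and node labels are illustrative; in a live crawl each neighbor list would come from an API query):

```python
import random

def simple_random_walk(adj, start, steps, rng=None):
    """SRW: from the current node, move to a uniformly chosen neighbor."""
    rng = rng or random.Random(0)
    path = [start]
    v = start
    for _ in range(steps):
        v = rng.choice(adj[v])  # uniform choice among neighbors
        path.append(v)
    return path

# Toy undirected graph as adjacency lists.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
walk = simple_random_walk(adj, start=0, steps=10)
```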

### Related work and contributions

Many random walk based sampling techniques have been introduced and studied in detail recently. The estimation techniques in [2–5] avoid the burn-in time drawback of random walks, similar to our aim. The works [2–4] are based on the idea of a random walk tour, which is a sample path of a random walk starting and ending at a fixed node. Massoulié et al. [3] estimate the size of a network based on the return times of RW tours. Cooper et al. [2] estimate the number of triangles, network size, and subgraph counts from weighted random walk tours using the results of Aldous and Fill [12, Chapters 2 and 3]. The work in [4] extends these results to edge functions, provides real-time Bayesian guarantees for the performance of the estimator, and introduces some hypothesis tests using the estimator. Instead of the estimators for sum functions of the form \(\sum _{u \in V} g(u)\) proposed in these previous works, here we study the average function (1). Walk-estimate, proposed in [5], aims to reduce the overhead of the burn-in period by considering short random walks and then using acceptance–rejection sampling to adjust the sampling probability of a node with respect to its stationary distribution. That work requires an estimate of the probability of hitting a node at time *t*, which introduces a computational overhead; it also needs an estimate of the graph diameter to work correctly. Our algorithms are completely local and do not require these global inputs.

There are also specific random walk methods tailored for certain forms of function *g*(*v*) or criterion, for instance, in [10] the authors developed an efficient estimation technique for estimating the average degree, and Frontier sampling in [11] introduced dependent multiple random walks in order to reduce estimation error.

Two well-known techniques for estimating network averages \(\nu (G)\) are the Metropolis–Hastings MCMC (MH-MCMC) scheme [1, 9, 13, 14] and Respondent-Driven sampling (RDS) [6–8].

In our work, we first present a theoretical comparison of the mean squared errors of the MH-MCMC and RDS estimators. It has been observed that in many practical cases RDS outperforms MH-MCMC in terms of asymptotic error; we confirm this observation here using theoretical expressions for the asymptotic mean squared errors of the two estimators. Then, we introduce a novel estimator for the network average based on reinforcement learning (RL). By way of simulations on real networks, we demonstrate that, with a good choice of cooling schedule, RL can achieve asymptotic error performance similar to RDS while its trajectories exhibit smaller fluctuations.

Finally, we extend RDS to accommodate the idea of regeneration during revisits to a node or to a ‘super-node,’ formed by aggregating several nodes, and propose the RT-estimator, which does not suffer from burn-in period constraints and significantly outperforms the RDS estimator.

### Notational conventions

Expectation w.r.t. the initial distribution \(\eta \) of a RW is denoted by \(\mathbb E_{\eta }\), and if the distribution degenerates at a particular node *j*, the expectation is \( \mathbb {E} _j\). Matrices in \(\mathbb R^{n \times n}\) are denoted by boldface uppercase letters, e.g., \(\varvec{A}\), and vectors in \(\mathbb R^{n \times 1}\) are denoted by lowercase boldface letters, e.g., \(\varvec{x}\), whereas their respective elements are denoted by non-bold letters, e.g., \(A_{ij},x_i.\) Convergence in distribution is denoted by \(\xrightarrow {D}.\) By \(\mathcal L(X)\) we mean the law or the probability distribution of a random variable. We use \(\mathcal {N}(\mu ,\sigma ^2)\) to denote a Gaussian random variable with mean \(\mu \) and variance \(\sigma ^2.\) Let us define the fundamental matrix of a Markov chain as \(\varvec{Z} := (\varvec{I} - \varvec{P} + \mathbf{1}{\mathbf {\pi }}^{\intercal })^{-1}.\) For two functions \(f,g: \mathcal {V}\rightarrow \mathbb {R},\) we define \(\sigma ^2_{ff} := 2 \langle \varvec{f}, \varvec{Z} \varvec{f} \rangle _{\pi } - \langle f,f \rangle _{\pi } - \langle f,\mathbf{1}{\mathbf {\pi }}^{\intercal }f \rangle _{\pi },\) and \(\sigma ^2_{fg} := \langle \varvec{f}, \varvec{Z} \varvec{g} \rangle _{\pi } + \langle \varvec{g}, \varvec{Z} \varvec{f} \rangle _{\pi } - \langle f,g \rangle _{\pi } - \langle f,\mathbf{1}{\mathbf {\pi }}^{\intercal } g \rangle _{\pi },\) where \(\langle \varvec{x},\varvec{y} \rangle _{\pi } := \sum _{i} x_i y_i \pi _i,\) for any two vectors \(\varvec{x},\varvec{y} \in \mathbb {R}^{|\mathcal {V}| \times 1}, \pi \) being the stationary distribution of the Markov chain.

### Organization

The rest of the paper is organized as follows: In “MH-MCMC and RDS estimators” section, we discuss MH-MCMC as well as RDS providing both known and new material on the asymptotic variance of the estimators. “Network sampling with reinforcement learning (RL-technique)” section introduces the reinforcement learning based technique for sampling and estimation. It also details a modification for an easy implementation with SRW. Then, in “Ratio with tours estimator (RT-estimator)” section, we introduce a modification of the RDS based on the ideas of super-nodes and tours. This modification also does not have a burn-in period and significantly outperforms the plain RDS. “Numerical results” section contains numerical simulations performed on real-world networks and makes several observations about the new algorithms. We conclude the article with “Conclusions” section.

## MH-MCMC and RDS estimators

The utility of RW based methods comes from the fact that, as time progresses, the sample distribution of the RW at time *t*, irrespective of the initial distribution, starts to resemble a fixed distribution, which we call the stationary distribution of the RW, denoted by \(\pi .\)

We study the mean squared error and asymptotic variance of random walk based estimators in this paper. For this purpose, the following extension of the central limit theorem for Markov chains plays a significant role:

### Theorem 1

*Let* \(f: V \mapsto \mathbb {R}\) *be a real-valued function with* \( \mathbb {E} _{\pi }[f^2(X_0)] < \infty .\) *For a finite irreducible Markov chain* \(\{X_n\}\) *with stationary distribution* \(\pi,\) *irrespective of the initial distribution,*

\[ \sqrt{n} \left( \frac{1}{n} \sum _{k=1}^{n} f(X_k) - \mathbb {E} _{\pi }[f(X_0)] \right) \xrightarrow {D} \mathcal {N}(0, \sigma ^2_f), \quad (2) \]

*where* \( \sigma ^2_f = \lim _{n \rightarrow \infty } \frac{1}{n} \, \mathrm{Var} \left( \sum _{k=1}^{n} f(X_k) \right) \) *is the asymptotic variance.*

Note that the above also holds for finite periodic chains (with the existence of a unique solution to \({\mathbf {\pi }}^{\intercal } \varvec{P} = {\mathbf {\pi }}^{\intercal }\)).

By [13, Theorem 6.5] \(\sigma ^2_f\) in Theorem 1 is the same as \(\sigma ^2_{ff}.\) We will also need the following theorem.

### Theorem 2

*If f, g are two functions defined on the states of a random walk, define the vector sequence* \(\mathbf {Z}_k = \left[ \begin{matrix} f(X_k) \\ g(X_k) \end{matrix} \right] .\) *Then the following central limit theorem holds:*

\[ \sqrt{n} \left( \frac{1}{n} \sum _{k=1}^{n} \mathbf {Z}_k - \mathbb {E} _{\pi }[\mathbf {Z}_0] \right) \xrightarrow {D} \mathcal {N}(\mathbf {0}, \mathbf {\Sigma }), \]

*where* \(\mathbf {\Sigma }\) *is a* \(2 \times 2\) *matrix such that* \(\mathbf {\Sigma }_{11} = \sigma ^2_{ff}, \mathbf {\Sigma }_{22} = \sigma ^2_{gg}\) *and* \(\mathbf {\Sigma }_{12} = \mathbf {\Sigma }_{21}= \sigma ^2_{fg}.\)

The burn-in period of a RW is closely related to its *mixing time*, defined as \( t_{\mathrm{mix}}(\varepsilon ) := \min \{ n : \max _{i} \Vert P^n(i, \cdot ) - \pi \Vert _{\mathrm{TV}} \le \varepsilon \},\) where \(\Vert \cdot \Vert _{\mathrm{TV}}\) denotes the total variation distance.

### Function average from RWs

The SRW is biased towards higher degree nodes, and by Theorem 1 the sample averages converge to the stationary average. Hence, if the aim is to estimate the average function (1), the RW needs to have a uniform stationary distribution, or, alternatively, it should be possible to unbias the samples locally. In order to obtain the average, we modify the function *g* by normalizing it by the stationary probabilities to get \(g'(u) = g(u)/\pi _{u},\) where \(\pi _u = d_u/(2|E|).\) Since \(\pi _u\) contains \(|E|\) and the knowledge of \(|E|\) is not available to us initially, it also needs to be estimated. To overcome this problem, we consider the following modifications of the SRW-based estimator.
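The effect of this reweighting can be checked exactly on a toy graph: with \(\pi _u \propto d_u\), the ratio \(\mathbb {E}_{\pi }[g(X)/d(X)] / \mathbb {E}_{\pi }[1/d(X)]\) recovers the plain node average. A sketch with an arbitrary illustrative function *g*:

```python
# Verify on a small graph that dividing g by the degree cancels the
# SRW's bias toward high-degree nodes: the ratio of stationary
# expectations E[g/d]/E[1/d] equals the uniform node average nu(G).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
deg = {u: len(nbrs) for u, nbrs in adj.items()}
two_E = sum(deg.values())                      # 2|E|
pi = {u: deg[u] / two_E for u in adj}          # SRW stationary distribution
g = {0: 4.0, 1: 8.0, 2: 6.0, 3: 2.0}           # arbitrary node function

nu = sum(g.values()) / len(g)                  # true average: 5.0
num = sum(pi[u] * g[u] / deg[u] for u in adj)  # E_pi[g/d]
den = sum(pi[u] / deg[u] for u in adj)         # E_pi[1/d]
reweighted = num / den                         # equals nu exactly
```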

### Metropolis–Hastings random walk

The Metropolis–Hastings random walk works as follows. From the current state *i*, it chooses a candidate state *j* according to a proposal transition probability \(p_{ij}\). It then jumps to this state with probability \(q_{ij}\) or remains in the current state *i* with probability \(1 - q_{ij},\) where, for a target stationary distribution \(\pi,\) the acceptance probability \(q_{ij}\) is given by

\[ q_{ij} = \min \left( 1, \frac{\pi _j \, p_{ji}}{\pi _i \, p_{ij}} \right) . \]

Hence, the probability of transition from state *i* to state *j* is \(q_{ij}p_{ij}\) when \(i \ne j.\) It follows then that such a process represents a Markov chain with transition matrix \({\varvec{P}}^{\text {MH}}\) given by \(P^{\text {MH}}_{ij} = p_{ij} q_{ij}\) for \(i \ne j\) and \(P^{\text {MH}}_{ii} = 1 - \sum _{j \ne i} p_{ij} q_{ij}.\) In particular, for the SRW proposal \(p_{ij} = 1/d_i\) and a uniform target distribution, \(q_{ij} = \min (1, d_i/d_j).\)

### Proposition 1

*For MCMC with uniform target stationary distribution it holds that*

\[ \sqrt{n} \left( \widehat{\nu }^{(n)}_{\mathrm{MH}}(G) - \nu (G) \right) \xrightarrow {D} \mathcal {N}(0, \sigma ^2_{\mathrm{MH}}) \]

*as* \(n \rightarrow \infty ,\) *where* \(\sigma ^2_{\mathrm{MH}} = \sigma ^2_{gg} =\frac{2}{|V|} {\mathbf {g}}^{\intercal } \varvec{Z}^{\mathrm{MH}} \mathbf {g} - \frac{1}{|V|} {\mathbf {g}}^{\intercal } \mathbf {g} - \left( \frac{1}{|V|} {\mathbf {g}}^{\intercal } \mathbf{1}\right) ^2\) *and* \(\varvec{Z}^{\mathrm{MH}} = (\varvec{I} - \varvec{P}^{\mathrm{MH}} + \frac{1}{|V|}\mathbf{1}\mathbf{1}^{\intercal })^{-1}.\)
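To make the uniform-target claim concrete, a small numerical check (on an illustrative toy graph) that the Metropolis–Hastings chain built from the SRW proposal \(p_{ij}=1/d_i\) with the standard acceptance rule \(q_{ij}=\min (1, d_i/d_j)\) has the uniform distribution as stationary:

```python
# Build P_MH for the SRW proposal p_ij = 1/d_i with uniform target
# (acceptance q_ij = min(1, d_i/d_j)) and check that the uniform
# distribution satisfies pi P = pi.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
n = len(adj)
deg = {u: len(nbrs) for u, nbrs in adj.items()}

P = [[0.0] * n for _ in range(n)]
for i in adj:
    for j in adj[i]:
        P[i][j] = (1.0 / deg[i]) * min(1.0, deg[i] / deg[j])
    P[i][i] = 1.0 - sum(P[i][j] for j in adj[i])  # rejections stay put

uniform = [1.0 / n] * n
uniform_after_step = [sum(uniform[i] * P[i][j] for i in range(n))
                      for j in range(n)]
```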

### Respondent-driven sampling technique (RDS-technique)

The RDS-technique uses the SRW but corrects the estimate for the degree bias of the stationary distribution, without requiring knowledge of |*E*|. The RDS estimator from *n* samples is the ratio

\[ \widehat{\nu }^{(n)}_{\mathrm{RDS}}(G) = \frac{\sum _{k=1}^{n} g(X_k)/d(X_k)}{\sum _{k=1}^{n} 1/d(X_k)}. \]

Its asymptotic unbiasedness follows from the Ergodic Theorem and is also a consequence of the CLT given below.
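A runnable sketch of the RDS idea: run the SRW and take the ratio of degree-weighted sums (the toy graph and function *g* are illustrative; note that the ratio needs neither |V| nor |E|):

```python
import random

def rds_estimate(adj, g, start, n_samples, seed=0):
    """RDS: ratio of the sum of g(X_k)/d(X_k) to the sum of 1/d(X_k)
    along a simple random walk trajectory."""
    rng = random.Random(seed)
    num = den = 0.0
    v = start
    for _ in range(n_samples):
        d = len(adj[v])
        num += g[v] / d
        den += 1.0 / d
        v = rng.choice(adj[v])
    return num / den

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
g = {0: 4.0, 1: 8.0, 2: 6.0, 3: 2.0}   # true average is 5.0
estimate = rds_estimate(adj, g, start=0, n_samples=50000)
```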

Now we have the following CLT for the RDS Estimator.

### Proposition 2

*The RDS estimator* \(\widehat{\nu }^{(n)}_{\mathrm{RDS}}(G)\) *satisfies the central limit theorem*

\[ \sqrt{n} \left( \widehat{\nu }^{(n)}_{\mathrm{RDS}}(G) - \nu (G) \right) \xrightarrow {D} \mathcal {N}(0, \sigma ^2_{\mathrm{RDS}}), \]

*where* \(\sigma ^2_{\mathrm{RDS}}\) *is given by*

\[ \sigma ^2_{\mathrm{RDS}} = d_{\mathrm{av}}^2 \left( \sigma _1^2 + \nu (G)^2 \, \sigma _2^2 - 2\, \nu (G)\, \sigma _{12}^2 \right) , \]

*with* \(\sigma _1^2 = \frac{1}{|E|}\sum _{i,j \in V} g_i Z_{ij}g_j/d_j - \frac{1}{2|E|}\sum _{i\in V} \frac{g_i}{d_i} - \left( \frac{1}{2|E|} \sum _{i \in V}g_i \right) ^2,\) \(\sigma _2^2 = \frac{1}{|E|} \sum _{i,j \in V} Z_{ij}/d_j - \frac{1}{2|E|}\sum _{i \in V} \frac{1}{d_i} - \left( \frac{1}{d_{\mathrm{av}}} \right) ^2,\) *and* \(\sigma _{12}^2 = \frac{1}{2|E|}\sum _{i,j\in V}g_i Z_{ij}/d_j + \frac{1}{2|E|} \sum _{i,j \in V} Z_{ij}/d_i - \frac{1}{2|E| \, d_{\mathrm{av}}}\sum _{i \in V} g_i. \)


### Comparing random walk techniques

## Network sampling with reinforcement learning (RL-technique)

We now introduce a reinforcement learning approach based on stochastic approximation to estimate \(\nu (G)\). The approach builds on the notion of tours and regeneration introduced in [2–4]. We will compare the mean squared error of the new estimator with those of MH-MCMC and RDS, and see how the stability of the sample paths can be controlled.

### Estimator

Let \(V_0 \subset V\) with \(|V_0| \ll |V|\). We assume that the nodes inside \(V_0\) are known beforehand. Consider a simple random walk \(\{X_n\}\) on *G* with transition probabilities \(p(j|i) = 1/d(i)\) if \((i,j) \in E\) and zero otherwise. A random walk tour is the sequence of nodes visited by the random walk between successive returns to the set \(V_0\). Let \(\tau _k\) denote the successive times at which the walk visits \(V_0\), and let \(\xi _k := \tau _k-\tau _{k-1}\). We denote the nodes visited in the *k*th tour by \(X_1^{(k)}, X_2^{(k)},\ldots , X_{\xi _k}^{(k)}\). Note that considering a set \(V_0\) helps to tackle a disconnected graph^{2} with RW theory and makes tours shorter. Moreover, the tours are independent of each other and thus admit a *massively parallel implementation*. The estimators derived below and later in “Ratio with tours estimator (RT-estimator)” section exploit the independence of the tours and the result that the expected sum of a function over the nodes visited in a tour is proportional to \(\sum _{u \in V} g(u)\) [4, Lemma 3].
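A sketch of the tour construction (the toy graph and the choice \(V_0=\{0,3\}\) are illustrative): each tour starts from a node of \(V_0\) and records the nodes visited until the walk next enters \(V_0\).

```python
import random

def sample_tour(adj, V0, rng):
    """One regenerative tour: start at a uniformly chosen node of V0 and
    record the visited nodes until the walk next enters V0 (excluded)."""
    v = rng.choice(sorted(V0))
    tour = [v]
    while True:
        v = rng.choice(adj[v])
        if v in V0:
            return tour      # regeneration: the next tour starts afresh
        tour.append(v)

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
V0 = {0, 3}
rng = random.Random(0)
tours = [sample_tour(adj, V0, rng) for _ in range(100)]  # independent tours
```

Because each tour ends by re-entering \(V_0\) and the next one restarts from scratch, tours can be collected in parallel by independent crawlers.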

The quantity we seek, the stationary average of *g*, \( \mathbb {E} _{\pi }[g(X_1)]\), can be characterized as the solution of a Poisson equation (6) [18, Theorem 7.6]. In the following, we provide numerical ways to solve (6). This could be achieved using classical MDP methods like relative value iteration; instead, we look for solutions from reinforcement learning, in which knowledge of the transition probabilities \([p_Y(j|i)]\) is not needed. Stochastic approximation provides a simple and easily tunable solution as follows. The relative value iteration algorithm to solve (6) is given in (7), and its stochastic approximation version is the update (8).

In (8), \(a(n)\) is a decreasing step-size sequence, and \(i_0\) is a prescribed element of \(V_0\). One can use other normalizations in place of \(\mathcal {V}_n(i_0)\), such as \(\frac{1}{|V_0|}\sum _j \mathcal {V}_n(j)\) or \(\min _i \mathcal {V}_n(i)\); see, e.g., [20]. This normalizing term (\(\mathcal {V}_n(i_0)\) in (8)) converges to \(\beta ^* = \mathbb {E} _{\pi }[g(X_1)]\) as *n* increases to \(\infty .\)

Taking expectations on both sides of (8), we obtain a deterministic iteration that can be viewed as an incremental version of the relative value iteration (7) with a suitably scaled stepsize \(\tilde{a}(n) := \frac{a(n)}{|V|}\). This can be analyzed in the same way as the stochastic approximation scheme, with the same o.d.e. limit and therefore the same (deterministic) asymptotic limit. This establishes the asymptotic unbiasedness of the RL estimator.

The normalizing term used in (8) (\(\mathcal {V}_n(i_0)\), \(\frac{1}{|V_0|}\sum _j \mathcal {V}_n(j)\), or \(\min _i \mathcal {V}_n(i)\)), along with the Metropolis–Hastings chain as the underlying random walk, forms our estimator \(\widehat{\nu }_{\text {RL}}(G)\) in the RL-based approach. The iteration (8) is the stochastic approximation analog of (7): it replaces the conditional expectation w.r.t. the transition probabilities with an actual sample and then makes an incremental correction based on it, with a slowly decreasing stepsize that ensures averaging. The latter is a standard aspect of stochastic approximation theory. The smaller the stepsize, the smaller the fluctuations but the slower the convergence; thus, there is a trade-off between the two.

RL methods can be thought of as a cross between a pure deterministic iteration such as the relative value iteration above and pure MCMC, trading off variance against per iterate computation. The gain is significant if the number of neighbors of a node is much smaller than the number of nodes, because we are essentially replacing averaging over the latter by averaging over neighbors. The \(\mathcal {V}\)-dependent terms can be thought of as control variates to reduce variance.
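The averaging role of the decreasing stepsize can be seen in isolation with a generic one-dimensional stochastic-approximation iteration (a sketch of the principle only, not the authors' update (8)):

```python
import random

def sa_iterate(samples, a):
    """Stochastic approximation x <- x + a(n) (Y_n - x).  With the
    stepsize a(n) = 1/n this reproduces the running sample mean exactly;
    slower-decaying stepsizes track changes faster but fluctuate more."""
    x = 0.0
    for n, y in enumerate(samples, start=1):
        x += a(n) * (y - x)
    return x

rng = random.Random(0)
samples = [rng.uniform(0.0, 10.0) for _ in range(5000)]
x_sa = sa_iterate(samples, a=lambda n: 1.0 / n)
sample_mean = sum(samples) / len(samples)   # identical to x_sa
```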

### Extension of RL-technique to uniform stationary average case

The transition probability *p* can belong to any random walk having a uniform stationary distribution, provided \(q(\cdot |\cdot ) >0 \) whenever \(p(\cdot |\cdot ) >0\). One example is to use *p* as the transition probability of the Metropolis–Hastings algorithm with uniform target stationary distribution and *q* as the transition probability of a lazy version of the simple random walk, i.e., with transition probability matrix \((\varvec{I}+\varvec{P}_{\scriptscriptstyle \text {SRW}})/2\). In comparison with basic Metropolis–Hastings sampling, such importance sampling avoids the API requests needed for probing the degrees of all neighboring nodes; instead, it requires only one such request, viz., that of the sampled node. Note that self-loops, wherein the chain re-visits a node immediately, are not wasted transitions, because a self-loop amounts to re-application of a map to the earlier iterate, which is distinct from a single application.

The reinforcement learning scheme introduced above is the semi-Markov version of the scheme proposed in [20] and [21].

### Advantages

1. It does not need to wait until burn-in time to collect samples;
2. Comparison with [4]: the super-node in [4] is a single node, an amalgamation of the node set \(V_0\). But such a construction assumes that the contribution of all the edges inside the subgraph induced by \(V_0\) to \({\nu }(G)\) is completely known. This assumption could be avoided if we could make use of techniques for partitioning the state space of a Markov chain (called *lumpability* in [22]). The conditions stated in [22, Theorem 6.3.2] are not satisfied here, and hence we cannot invoke such techniques. But the RL-technique, without using *lumpability* arguments, need not know the edge functions of the subgraph induced by \(V_0\);
3. The RL-technique, along with the extension in “Extension of RL-technique to uniform stationary average case” section, can further be extended to directed graphs provided the graph is strongly connected. In contrast, estimators from other RW based sampling schemes require knowledge of the stationary distribution to unbias the samples and thus to form the estimator. In many cases, including the SRW on directed graphs, the stationary distribution does not have a closed form expression, unlike in the undirected case, and this poses a big challenge for the design of simple random walk based estimators;
4. As explained before, the main advantage of the RL-estimator is its ability to control the stability of sample paths and its position as a cross between a deterministic iteration and MCMC. We will see more about this in the numerical section.

## Ratio with tours estimator (RT-estimator)

Given a budget *B* on the total number of samples, the RT-estimator is given by

\[ \widehat{\nu }_{\mathrm{RT}}(G) = \frac{\sum _{k=1}^{m(B)} \sum _{t=1}^{\xi _k} g\big( X_t^{(k)} \big) / d\big( X_t^{(k)} \big) }{\sum _{k=1}^{m(B)} \sum _{t=1}^{\xi _k} 1 / d\big( X_t^{(k)} \big) }, \]

where *m*(*B*) is the number of tours completed within the budget *B*.
This estimator is very close to RDS, explained in “Respondent-driven sampling technique (RDS-technique)” section, except that we discard the last \(B-\sum _{k=1}^{m(B)} \xi _k\) samples from the estimate and we use a super-node to shorten the tours. Namely, we can leverage all the advantages of the super-node mentioned in [4, Section 2], and we expect this to improve the performance considerably. We show this via numerical simulations in the next section; theoretical properties will be studied in future work.

Note that the formation of the super-node is different from the set \(V_0\) considered in the RL-technique, where the RW tours can start from any uniformly selected node inside \(V_0\) and end when they hit the set \(V_0\). The super-node, in contrast, is formed from a set \(S_n\) of *n* nodes of *V* that is treated as a single node (removing all the edges between the nodes in \(S_n\)), which contracts the original graph *G*. Both formulations have advantages of their own: the super-node and its estimator are easy to form and compute, but one needs to know all the edges between the nodes in \(S_n\), i.e., the subgraph induced by \(S_n\) must be known a priori. The set \(V_0\) in the RL-technique does not demand this.
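A runnable sketch of the tour-based ratio idea in the simplest setting of a single-node super-node (so no induced subgraph needs to be known a priori). The graph, function, and budget are illustrative, and an incomplete final tour is discarded as described above:

```python
import random

def rt_estimate(adj, g, supernode, budget, seed=0):
    """Ratio-with-tours sketch: accumulate g/d and 1/d over complete RW
    tours regenerating at `supernode`; a tour that would exceed the
    sample budget is discarded."""
    rng = random.Random(seed)
    num = den = 0.0
    used = 0
    while True:
        v = supernode
        tour = [v]
        while True:
            v = rng.choice(adj[v])
            if v == supernode:
                break                # tour complete: walk returned
            tour.append(v)
        if used + len(tour) > budget:
            return num / den         # drop the unfinished budget remainder
        used += len(tour)
        for u in tour:
            num += g[u] / len(adj[u])
            den += 1.0 / len(adj[u])

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
g = {0: 4.0, 1: 8.0, 2: 6.0, 3: 2.0}   # true average is 5.0
estimate = rt_estimate(adj, g, supernode=0, budget=50000)
```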

## Numerical results

In all the figures, the *x*-axis represents the budget *B*, i.e., the number of allowed samples, which is the same for all the techniques. For comparison we use the normalized root mean squared error (NRMSE) for a given *B*, defined as

\[ \text {NRMSE} := \frac{\sqrt{\text {MSE}}}{\nu (G)}, \qquad \text {MSE} = \mathbb {E} \left[ \left( \widehat{\nu }(G) - \nu (G) \right) ^2 \right] . \]
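As a minimal helper, the NRMSE can be computed from repeated runs like this (the three estimate values below are hypothetical):

```python
def nrmse(estimates, true_value):
    """Normalized root mean squared error: sqrt(MSE) / true value."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return mse ** 0.5 / true_value

# Hypothetical estimates of nu(G) = 5.0 from three independent runs.
err = nrmse([4.8, 5.1, 5.3], 5.0)
```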

### Datasets

We use the following datasets. All the datasets are publicly available.^{3}

###
*Les Misérables network*

In Les Misérables network, nodes are the characters of the novel and edges are formed if two characters appear in the same chapter in the novel. The number of nodes is 77 and number of edges is 254. We have chosen this rather small network in order to compare the methods in terms of theoretical limiting variance (which is difficult to compute for large networks).

###
*Friendster network*

We consider a larger graph here, a connected subgraph of an online social network called Friendster with \(64,\!600\) nodes and \(1,\!246,\!479\) edges. The nodes in Friendster are individuals and edges indicate friendship.

### Enron email network

The nodes in this network are the email IDs of the employees in Enron, and the edges are formed when two employees communicated through email. We take the largest connected component with 33,696 nodes and 180,811 edges.

### Choice of demonstrative functions

For the Les Misérables network, one of the demonstrative functions we consider is calculating \({\nu }(G)\) as the average clustering coefficient, i.e., the average of the local clustering coefficient

\[ c(v) = \frac{2\, t(v)}{d(v) \left( d(v)-1 \right) }, \quad (10) \]

with *t*(*v*) the number of triangles that contain node *v*; here *g*(*v*) is taken as *c*(*v*) itself. In the Friendster network, we consider the functions a) \(g(v)=\mathbb {I}\{d(v)>50 \}\) and b) \(g(v) = c(v)\) [see (10)] used to estimate the average clustering coefficient.
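The local clustering coefficient can be computed per node from the adjacency structure alone (a sketch; the toy graph is illustrative, and \(c(v)\) is taken as 0 when \(d(v) < 2\)):

```python
def local_clustering(adj, v):
    """c(v) = 2 t(v) / (d(v)(d(v)-1)), where t(v) counts the triangles
    containing v, i.e., the linked pairs among v's neighbors."""
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0
    t = sum(1 for i in range(d) for j in range(i + 1, d)
            if nbrs[j] in adj[nbrs[i]])
    return 2.0 * t / (d * (d - 1))

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
c = {v: local_clustering(adj, v) for v in adj}
```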

In the case of the Enron network, we aim to estimate the proportion of nodes in a community, assuming each node knows which community it belongs to. We consider the community with index 1, which has 6,225 nodes, i.e., \(g(v) = \mathbb {I}\{ \text {Comm}(v) = 1 \}\), where \(\text {Comm}(v)\) denotes the community that node *v* belongs to. We expect such a function to have an impact on the performance of RDS and MH-MCMC. The communities are recovered using the algorithm described in [23]. We also consider \(g(v)=d(v)\) for the Enron network.

### NRMSE comparison of RDS, MHMC, and RL-technique

For the RL-technique, we choose the initial set \(V_0\) by uniformly sampling nodes assuming the size of \(V_0\) is given a priori.

The average in the MSE is calculated from multiple runs of the simulations. The simulations on the Les Misérables network are shown in Fig. 1 with \(a(n)=1/\lceil \frac{n}{10} \rceil \) and \(|V_0| = 25\). The plot in Fig. 2 shows the results for the Friendster graph with \(|V_0| = 1000\); here the sequence *a*(*n*) is taken as \(1/\lceil \frac{n}{25} \rceil \). Figure 3 presents the results for the Enron network using \(a(n) = 1/\lceil \frac{n}{20} \rceil \) and \(V_0\) with 1000 randomly selected nodes.

### Study of asymptotic variance of RDS, MHMC, and RL-technique

The mean squared error \(\text {MSE} = \text {Var}[\widehat{{\nu }}^{(n)}(G)]+ {\Big ( \mathbb {E} [\widehat{{\nu }}^{(n)}(G)]-{\nu }(G) \Big )}^2\). Here we study the asymptotic variance \(\sigma ^2_g\) [see (2)] of the estimators in terms of \(n \times \text {MSE}\), since the bias \( | \mathbb {E} [\widehat{{\nu }}^{(n)}(G)]-{\nu }(G) |\rightarrow 0 \) as \(n \rightarrow \infty \).

### Stability of RL-technique

In this subsection, we study *single* sample path properties of the algorithms; hence, the numerator of the NRMSE becomes the absolute error. Figure 5a shows the effect of increasing the size of the initial set \(V_0\) while fixing the step size *a*(*n*), and Fig. 5b shows the effect of changing *a*(*n*) when \(V_0\) is fixed. In both cases, the green curve of the RL-technique is much more stable than those of the other techniques.

### Designing tips for the RL-technique

1. With respect to the limiting variance, RDS always outperforms the other two methods tested. However, with a good choice of parameters, the performance of RL is not far from that of RDS;
2. In the RL-technique, we find that the normalizing term \(1/|V_0| \sum _j \mathcal {V}_n(j)\) converges much faster than the other two options, \(\mathcal {V}_n(i_0)\) and \(\min _i \mathcal {V}_n(i)\);
3. When the size of the super-node decreases, the RL-technique requires a smaller step size *a*(*n*). For instance, in the case of the Les Misérables network, if the initial set \(V_0\) has fewer than 10 nodes, the RL-technique does not converge with \(a(n)=1/(\lceil \frac{n}{50} \rceil +1)\) and requires \(a(n)=1/\lceil \frac{n}{5} \rceil \);
4. If the step size *a*(*n*) decreases or the super-node size increases, RL fluctuates less but converges more slowly. In general, RL has fewer fluctuations than MH-MCMC or RDS.

### NRMSE comparison of RDS and RT-estimator

Here, we compare the RDS and RT estimators. The choice of RDS for comparison is motivated by the results of the previous section, which show that it outperforms the other sampling schemes considered in this paper so far in terms of asymptotic variance and mean squared error. Moreover, the RT-estimator can be regarded as a natural modification of RDS making use of the ideas of tours and the super-node.

## Conclusions

We addressed a critical issue in the RW based sampling methods on graphs: the burn-in period. Our ideas are based on exploiting the tours (regenerations) and on the best use of the given seed nodes by making only short tours. These short tours or crawls, which start and return to the seed node set, are independent and can be implemented in a massively parallel way. The idea of regeneration allows us to construct estimators that are not marred by the burn-in requirement. We proposed two estimators based on this general idea. The first, the RL-estimator, uses reinforcement learning and stochastic approximation to build a stable estimator by observing random walks returning to the seed set. We then proposed the RT-estimator, which is a modification of the classical respondent-driven sampling, making use of the idea of short crawls and super-node. These two schemes have advantages of their own: the reinforcement learning scheme offers more control on the stability of the sample path with varying error performance, and the modified RDS scheme based on short crawls is simple and has superior performance compared to the classical RDS.

In the future, our aim is to study the theoretical performance of our algorithms in depth. We have also left open the selection process for the initial seed set or the super-node, which suggests an interesting research direction to explore.

^{2} The underlying Markov chain of the RW needs to be irreducible in order to apply many RW results, and this is satisfied when the graph is connected. In the case of a disconnected graph, taking at least one seed node from each component to form \(V_0\) helps to achieve this.

^{3} See the public repositories SNAP (https://snap.stanford.edu/data/) and KONECT (http://konect.uni-koblenz.de/).

## Declarations

### Authors’ contributions

KA and JS contributed to all parts of the paper; in particular, they did most of the writing. KA and JS also designed the experiments, and JS wrote the code for the experiments and carried them out. KA contributed the RT-estimator, VB contributed the RL-estimator, and AK contributed Proposition 2. All authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Availability of data and materials

### Consent for publication

Not applicable.

### Ethics approval and consent to participate

Not applicable.

### Funding

This work was supported by CEFIPRA Grants 5100-IT1 and IFC/DST-Inria-2016-01/448, Inria Nokia Bell Labs ADR “Network Science,” and by the French Government (National Research Agency, ANR) through the “Investments for the Future” Program Reference ANR-11-LABX-0031-01.

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## References

- Robert C, Casella G. Monte Carlo statistical methods. New York: Springer; 2013.
- Cooper C, Radzik T, Siantos Y. Fast low-cost estimation of network properties using random walks. In: Proceedings of workshop on algorithms and models for the web-graph (WAW), Cambridge. 2013.
- Massoulié L, Le Merrer E, Kermarrec A-M, Ganesh A. Peer counting and sampling in overlay networks: random walk methods. In: Proceedings of ACM annual symposium on principles of distributed computing (PODC), Denver. 2006.
- Avrachenkov K, Ribeiro B, Sreedharan JK. Inference in OSNs via lightweight partial crawls. ACM SIGMETRICS Perform Eval Rev. 2016;44(1):165–77.
- Nazi A, Zhou Z, Thirumuruganathan S, Zhang N, Das G. Walk, not wait: faster sampling over online social networks. In: Proceedings of the VLDB Endowment, vol. 8, no. 6. 2015. p. 678–89.
- Goel S, Salganik MJ. Respondent-driven sampling as Markov chain Monte Carlo. Stat Med. 2009;28(17):2202–29.
- Salganik MJ, Heckathorn DD. Sampling and estimation in hidden populations using respondent-driven sampling. Sociol Methodol. 2004;34(1):193–240.
- Volz E, Heckathorn DD. Probability based estimation theory for respondent driven sampling. J Off Stat. 2008;24(1):79.
- Gjoka M, Kurant M, Butts CT, Markopoulou A. Walking in Facebook: a case study of unbiased sampling of OSNs. In: Proceedings of IEEE INFOCOM. 2010. p. 1–9.
- Dasgupta A, Kumar R, Sarlos T. On estimating the average degree. In: Proceedings of WWW. 2014. p. 795–806.
- Ribeiro B, Towsley D. Estimating and sampling graphs with multidimensional random walks. In: Proceedings of ACM SIGCOMM internet measurement conference (IMC), Melbourne. 2010.
- Aldous D, Fill JA. Reversible Markov chains and random walks on graphs. Unfinished monograph, recompiled 2014. http://www.stat.berkeley.edu/~aldous/RWG/book.html. 2002.
- Brémaud P. Markov chains: Gibbs fields, Monte Carlo simulation, and queues. New York: Springer; 1999.
- Nummelin E. MC’s for MCMC’ists. Int Stat Rev. 2002;70(2):215–40.
- Roberts GO, Rosenthal JS. General state space Markov chains and MCMC algorithms. Probab Surv. 2004;1:20–71.
- Billingsley P. Probability and measure. 3rd ed. New Jersey: Wiley; 2008.
- Lee C-H, Xu X, Eun DY. Beyond random walk and Metropolis–Hastings samplers: why you should not backtrack for unbiased graph sampling. In: Proceedings of ACM SIGMETRICS/PERFORMANCE joint international conference on measurement and modeling of computer systems, London. 2012.
- Ross SM. Applied probability models with optimization applications. New York: Dover Publications; 1992.
- Abounadi J, Bertsekas D, Borkar VS. Learning algorithms for Markov decision processes with average cost. SIAM J Control Optim. 2001;40(3):681–98.
- Borkar VS, Makhijani R, Sundaresan R. Asynchronous gossip for averaging and spectral ranking. IEEE J Sel Top Signal Process. 2014;8(4):703–16.
- Borkar VS. Reinforcement learning: a bridge between numerical methods and Monte Carlo. In: Perspectives in mathematical sciences I: probability and statistics. 2009. p. 71–91.
- Kemeny JG, Snell JL. Finite Markov chains. New York: Springer; 1983.
- Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech Theory Exp. 2008;2008(10):P10008.