Revisiting random walk based sampling in networks: evasion of burn-in period and frequent regenerations

Background In the framework of network sampling, random walk (RW) based estimation techniques provide many pragmatic solutions while uncovering the unknown network as little as possible. Despite several theoretical advances in this area, RW based sampling techniques usually make a strong assumption that the samples are in stationary regime, and hence are impelled to leave out the samples collected during the burn-in period. Methods This work proposes two sampling schemes without burn-in time constraint to estimate the average of an arbitrary function defined on the network nodes, for example, the average age of users in a social network. The central idea of the algorithms lies in exploiting regeneration of RWs at revisits to an aggregated super-node or to a set of nodes, and in strategies to enhance the frequency of such regenerations either by contracting the graph or by making the hitting set larger. Our first algorithm, which is based on reinforcement learning (RL), uses stochastic approximation to derive an estimator. This method can be seen as intermediate between purely stochastic Markov chain Monte Carlo iterations and deterministic relative value iterations. The second algorithm, which we call the Ratio with Tours (RT)-estimator, is a modified form of respondent-driven sampling (RDS) that accommodates the idea of regeneration. Results We study the methods via simulations on real networks. We observe that the trajectories of RL-estimator are much more stable than those of standard random walk based estimation procedures, and its error performance is comparable to that of respondent-driven sampling (RDS) which has a smaller asymptotic variance than many other estimators. Simulation studies also show that the mean squared error of RT-estimator decays much faster than that of RDS with time. Conclusion The newly developed RW based estimators (RL- and RT-estimators) allow to avoid burn-in period, provide better control of stability along the sample path, and overall reduce the estimation time. Our estimators can be applied in social and complex networks.


Background
The prohibitive sizes of most social networks make graph-processing that requires complete knowledge of the graph impractical. For instance, social networks like Facebook or Twitter have billions of edges and nodes. In such a situation, we address the problem of estimating global properties of a large network to some degree of accuracy. Some examples of potentially interesting properties include the size of the support base of a certain political party, the average age of users in an online social network (OSN), the proportion of male-female connections with respect to the number of female-female connections in an OSN, and many others. Naturally, since graphs can be used to represent data in myriad of disciplines and scenarios, the questions we can ask are endless.
Graph sampling is a possible solution to address the above problem. To collect information from an OSN, the sampler issues an Application Programming Interface (API) query for a particular user, which returns its one-hop neighborhood and the contents published. Though some OSNs (for instance Twitter) allow access to the complete database with additional expense, we focus here on the typical case when a sampler can get information only about the neighbors of a particular user by means of API queries. There are several ways to collect representative samples in a network. One straightforward way is to collect independent samples via uniform node or edge sampling. However, uniform sampling is not efficient because we do not know the user ID space beforehand. Consequently, the sampler wastes many samples issuing invalid IDs resulting in an inefficient and costly data collection method. Moreover, OSNs typically impose rate limitations on the API queries, e.g., Twitter with 313 million active users enforces a limit of 15 on requests in a 15-min time window for most of APIs. 1 With this limitation, the crawler will need 610 years to crawl the whole Twitter. Therefore, if only a standard API is available to us, we inevitably need to use some sampling technique, and RW based techniques appear as a good option.

Important notation and problem formulation
Let G = (V , E) be an undirected labeled network, where V is the set of vertices and E ⊆ V × V is the set of edges. Although the graph is undirected, in later use it would be more convenient to represent edges by ordered pairs (u, v). Of course, if (u, v) ∈ E, it holds that (v, u) ∈ E, since G is undirected. With a slight abuse of notation, the total number of undirected edges |E|/2 is denoted as |E|.
Both edges and nodes can have function values defined on them. For instance, in an OSN, the node function can be the age or number of friends and the edge function can be an indicator function when the terminal nodes of the edge are of same gender. Let us denote by g : V → R, where R is the real number space, a function on the vertices of the graph. We aim to estimate the following network function average: 1 https://dev.twitter.com/rest/public/rate-limits.
The constraint on the estimator is that it does not know the whole graph, and can only issue API requests. Each API request furnishes the function value g(·) at the node queried and the list of its neighbors. Let ν (n) XY (G) be our estimate of ν(G) formed from n samples using the scheme XY. We will occasionally drop n and the scheme if it is clear from the context.
A simple RW (SRW) on a graph offers a viable solution to this problem that respects the above constraints. From an initial node, a simple RW proceeds by choosing one of the neighbors uniformly randomly and repeating the same process at the next node and so on. In general, a RW need not sample the neighbors uniformly and can take any transition probability compliant with the underlying graph, an example being the Metropolis-Hastings schemes [1]. Random walk techniques are well known (see for instance [2][3][4][5][6][7][8][9][10][11] and references therein). A drawback of random walk techniques is that they all suffer from the problem of initial burn-in, i.e., a number of initial samples need to be discarded to get samples from a desired probability distribution. The burn-in period (or mixing time) of a random walk is the time period after which the RW produces almost stationary samples irrespective of the initial distribution. This poses serious limitations, especially in view of the stringent constraints on the number of samples imposed by API query rates. In addition, subsequent samples of a RW are obviously not independent. To get independent samples, it is customary to drop intermediate samples. In this work, we focus on RW based algorithms that bypass this burn-in time barrier. We focus on two approaches: reinforcement learning and tour-based ratio estimator.

Related work and contributions
Many random walk based sampling techniques have been introduced and studied in detail recently. The estimation techniques in [2][3][4][5] avoid the burn-in time drawback of random walks, similar to our aim. The works [2][3][4] are based on the idea of a random walk tour, which is a sample path of a random walk starting and ending at a fixed node. Massoulié et al. [3] estimate the size of a network based on the return times of RW tours. Cooper et al. [2] estimate the number of triangles, network size, and subgraph counts from weighted random walk tours using the results of Aldous and Fill [12,Chapters 2 and 3]. The work in [4] extends these results to edge functions, provides real-time Bayesian guarantees for the performance of the estimator, and introduced some hypothesis tests using the estimator. Instead of the estimators for the sum function of the form u∈V g(u) proposed in these previous works; here we study the average function (1). Walk-estimate proposed in [5] aimed to reduce the overhead of burn-in period by considering short random walks and then using acceptance-rejection sampling to adjust the sampling probability of a node with respect to its stationary distribution. This work requires an estimate of probability of hitting a node at time t, which introduces a computational overhead. It also needs an estimate of the graph diameter to work correctly. Our algorithms are completely local and do not require these global inputs.
There are also specific random walk methods tailored for certain forms of function g(v) or criterion, for instance, in [10] the authors developed an efficient estimation technique for estimating the average degree, and Frontier sampling in [11] introduced dependent multiple random walks in order to reduce estimation error.
In our work, we first present a theoretical comparison of the mean-squared error of MH-MCMC and RDS estimators. It was observed that in many practical cases RDS outperforms MH-MCMC in terms of asymptotic error. We confirm this observation here using theoretical expressions for the asymptotic mean-squared errors of the two estimators. Then, we introduce a novel estimator for the network average based on reinforcement learning (RL). By way of simulations on real networks, we demonstrate that, with a good choice of cooling schedule, RL can achieve similar asymptotic error performance to RDS but its trajectories have smaller fluctuations.
Finally, we extend RDS to accommodate the idea of regeneration during revisits to a node or to a 'super-node, ' formed by aggregating several nodes, and propose the RT-estimator, which does not suffer from burn-in period constraints and significantly outperforms the RDS estimator.

Notational conventions
Expectation w.r.t. the initial distribution η of a RW is denoted by E η , and if the distribution degenerates at a particular node j, the expectation is E j . Matrices in R n×n are denoted by boldface uppercase letters, e.g., A, and vectors in R n×1 are denoted by lowercase boldface letters, e.g., x, whereas their respective elements are denoted by non-bold letters, e.g., A ij , x i . Convergence in distribution is denoted by D − → . By L(X) we mean the law or the probability distribution of a random variable. We use N (µ, σ 2 ) to denote a Gaussian random variable with mean µ and variance σ 2 . Let us define the fundamental matrix of a Markov chain as Z := (I − P + 1π ⊺ ) −1 . For two functions f , g : V → R, we define σ 2 ff := 2�f , Zf � π − �f , f � π − �f , 1π ⊺ f � π , and σ 2 fg := �f , Zg� π + �g, Zf � π − �f , g� π − �f , 1π ⊺ g� π , where �x, y� π := i x i y i π i , for any two vectors x, y ∈ R |V|×1 , π being the stationary distribution of the Markov chain.

Organization
The rest of the paper is organized as follows: In "MH-MCMC and RDS estimators" section, we discuss MH-MCMC as well as RDS providing both known and new material on the asymptotic variance of the estimators. "Network sampling with reinforcement learning (RL-technique)" section introduces the reinforcement learning based technique for sampling and estimation. It also details a modification for an easy implementation with SRW. Then, in "Ratio with tours estimator (RT-estimator)" section, we introduce a modification of the RDS based on the ideas of super-nodes and tours. This modification also does not have a burn-in period and significantly outperforms the plain RDS. "Numerical results" section contains numerical simulations performed on real-world networks and makes several observations about the new algorithms. We conclude the article with "Conclusions" section.

MH-MCMC and RDS estimators
The utility of RW based methods comes from the fact that for any initial distribution ν, as time progresses, the sample distribution of the RW at time t starts to resemble a fixed distribution, which we call the stationary distribution of the RW, denoted by π.
We will study mean squared error and asymptotic variance of random walk based estimators in this paper. For this purpose, following extension of the central limit theorem for Markov chains plays a significant role:

irrespective of the initial distribution, where
Note that, the above also holds for finite periodic chains (with the existence of unique solution to π ⊺ P = π ⊺ ).
By [13,Theorem 6.5] σ 2 f in Theorem 1 is the same as σ 2 ff . We will also need the following theorem.

Theorem 2 ([14], Theorem 3) If f, g are two functions defined on the states of a random walk, define the vector sequence
The time required by a random walk or Markov chain to reach stationarity is measured by a parameter called mixing time defined as where �ξ 1 − ξ 2 � TV := max A⊂V |ξ 1 (A) − ξ 2 (A)| is the total variational distance between the probability distributions ξ 1 and ξ 2 . If the mixing time is known, then as many samples are omitted in any RW based algorithm to ensure that the samples are in stationary regime. Since it is difficult to calculate the mixing time accurately, practitioners often use a prediction called burn-in period which is much larger than the mixing time.

Function average from RWs
The SRW is biased towards higher degree nodes and by Theorem 1, the sample averages converge to the stationary average. Hence if the aim is to estimate an average function (1), the RW needs to have uniform stationary distribution. Alternatively, the RW should be able to unbias it locally. In order to obtain the average, we modify the function g by normalizing it by the vertex degrees to get g ′ (u) = g(u)/π u , where π u = d u /(2|E|). Since π(u) contains |E| and the knowledge of |E| is not available to us initially, it also needs to be estimated. To overcome this problem, we consider the following modifications of the SRW-based estimator.

Metropolis-Hastings random walk
We review here the Metropolis-Hastings MCMC (MH-MCMC) algorithm. When the chain is in state i, it chooses the next state j according to transition probability p ij . It then jumps to this state with probability q ij or remains in the current state i with probability 1 − q ij , where q ij is given as below: Therefore, the effective jump probability from state i to state j is q ij p ij , when i � = j. It follows then that such a process represents a Markov chain with the following transition matrix P MH This chain is reversible with stationary distribution π(i) = 1/n ∀i ∈ V . Therefore, the following estimate for ν(G) using MH-MCMC, {X n } being MH-MCMC samples, is asymptotically consistent.
By using Theorem 1, we can show the following central limit theorem for MH-MCMC.

Respondent-driven sampling technique (RDS-technique)
The estimator with respondent-driven sampling uses the SRW on graphs but applies a correction to the estimator to compensate for the non-uniform stationary distribution, i.e., Note that this estimator does not require |E|. The asymptotic unbiasedness derives from the Ergodic Theorem and also as a consequence of the CLT given below. Now we have the following CLT for the RDS Estimator.

Proposition 2 The RDS estimator ν (n) RDS (G) satisfies a central limit theorem given below
where σ 2 RDS is given by , and let z n = √ n 1 n n t=1 z t − E π (z t ) . Then by Theorem 2, z n D − → Normal(0, ), where is the correlation matrix, whose formula given in Theorem 2. Let z n = ( z 1 n , z 2 n ). Then we have is a term that goes to zero in probability at least as fast as 1 √ n , and µ h nm , µ h dm are, respectively, E π (h nm ) and E π (h dm ). Then by Slutsky's lemma [16]. The result then follows since ( z 1 n , z 2 n ) converges to jointly Gaussian random variable, and by continuous mapping theorem.

Comparing random walk techniques
Random walks can be compared in many ways. Two prominent ways to compare RW estimators are based on their mixing times t mix and their asymptotic variances. Mixing time is relevant in the situations where the speed at which the RW approaches the stationary distribution matters. But many MCMC algorithms discard some initial samples (called burn-in period) to mitigate the dependence on the initial distribution and this amounts to the mixing time. After the burn-in period, the number of samples needed for achieving a certain estimation accuracy can be determined from the Gaussian approximation given by the central limit theorem (see Theorem 1). Hence, another measure for comparison of the random walks is the asymptotic variance in the Gaussian approximation. The lower the asymptotic variance, the smaller the number of samples needed for a certain estimation accuracy. Many authors consider asymptotic unbiasedness as the principal parameter to compare RW based estimators. For instance, the authors in [17] prove that non-backtracking random walks perform better than the SRW and MH-MCMC methods in terms of the asymptotic variance of the estimators. The asymptotic variance can be related to the eigenvalues of P as follows: where �x, y� π = i∈V x i y i π i [13, Chapter 6]. When the interest is in the speed of convergence to equilibrium, then only the second-largest eigenvalue modulus matters. However, if the aim is to compute E π [f (X 0 )] as the ergodic mean lim n→∞ 1 n n k=1 f (X k ) , then all the eigenvalues become significant and this is captured when the quality of the ergodic estimator is measured by the asymptotic variance.

Network sampling with reinforcement learning (RL-technique)
We will now introduce a reinforcement learning approach based on stochastic approximation to estimate ν(G). The underlying idea relies on the idea of tours and regeneration introduced in [2][3][4]. We will compare the mean squared error of the new estimator with that of MH-MCMC and RDS, and see how the stability of the sample paths can be controlled.

Estimator
Let V 0 ⊂ V with |V 0 | << |V |. We assume that the nodes inside V 0 are known beforehand. Consider a simple random walk {X n } on G with transition probabilities p(j|i) = 1/d(i) if (i, j) ∈ E and zero otherwise. A random walk tour is defined as the sequence of nodes visited by the random walk during successive return to the set V 0 . Let τ n := successive times to visit V 0 and let ξ k := τ k − τ k−1 . We denote the nodes visited in the kth tour as X ξ k . Note that considering V 0 helps to tackle a disconnected graph 2 with RW theory and makes tours shorter. Moreover, the tours are independent of each other and can have massively parallel implementation. The estimators derived below and later in "Ratio with tours estimator (RT-estimator)" section exploit the independence of the tours and the result that expected sum of functions of nodes visited in a tour is proportional to u∈V g(u) [4, Lemma 3].
Define Y n := X τ n . Then {(Y n , τ n )} is a semi-Markov process on V 0 [18, Chapter 5]. In particular, {Y n } is a Markov chain on V 0 with transition probability matrix (say) [p Y (j|i)]. We have ξ 1 := min{n > 0 : X n ∈ V 0 }. For a prescribed g : V � → R, define Consider an average cost Markov decision problem (MDP), then the Poisson equation for the semi-Markov process {(Y n , τ n )} is [18,Chapter 7] which is to be solved for the pair (V, β), where V : V 0 � → R and β ∈ R. Under mild conditions, (6) has the solution (V * , β * ). The optimal β * is the average expected cost stationary average of g, E π [g(X 1 )] [18, Theorem 7.6]. In the following, we provide numerical ways to solve (6). This could be achieved using the classical MDP methods like relative value iteration; instead we look for solutions from reinforcement learning in which the knowledge of transition probability [p Y (j|i)] is not needed. Stochastic approximation provides a simple and easily tunable solution as follows. The relative value iteration algorithm to solve (6) is We can implement this using stochastic approximation as follows: let {Z n , n ≥ 1} be from the stationary distribution of the underlying RW conditioned on being in V 0 . Construct a tour for n ≥ 1 by starting a SRW X (n) i , i ≥ 0, with X (n) 0 = Z n and observing its sample path until it returns to V 0 .
A learning algorithm for (6) along the lines of [19] then is, for i ∈ V 0 , 2 The underlying Markov chain of the RW requires to be irreducible in order to apply many results of the RWs and this is satisfied when the graph is connected. In case of a disconnected graph, taking at least one seed node from each of the components to form V 0 helps to achieve this.
where a(n) > 0 are stepsizes satisfying n a(n) = ∞, n a(n) 2 < ∞. (One good choice is a(n) = 1/⌈ n N ⌉ for N = 50 or 100.) Here I{A} denotes indicator function for the set A. Also, i 0 is a prescribed element of V 0 . One can use other normalizations in place of V n (i 0 ) , such as 1 |V 0 | j V n (j) or min i V n (i), see e.g., [20]. Then this normalizing term (V n (i 0 ) in (8)) converges to β * , E π [g(X 1 )], as n increases to ∞.
Taking expectations on both sides of (8), we obtain a deterministic iteration that can be viewed as an incremental version of the relative value iteration (7) with suitably scaled stepsize ã(n) := a(n) |V | . This can be analyzed the same way as the stochastic approximation scheme with the same o.d.e. limit and therefore the same (deterministic) asymptotic limit. This establishes the asymptotic unbiasedness of the RL estimator.
The normalizing term used in (8) , along with the underlying random walk as the Metropolis-Hastings, forms our estimator ν RL (G) in RLbased approach. The iteration in (8) is the stochastic approximation analog of it which replaces conditional expectation w.r.t. transition probabilities with an actual sample and then makes an incremental correction based on it, with a slowly decreasing stepsize that ensures averaging. The latter is a standard aspect of stochastic approximation theory. The smaller the stepsize, the less the fluctuations but slower the speed; thus, there is a trade-off between the two. RL methods can be thought of as a cross between a pure deterministic iteration such as the relative value iteration above and pure MCMC, trading off variance against per iterate computation. The gain is significant if the number of neighbors of a node is much smaller than the number of nodes, because we are essentially replacing averaging over the latter by averaging over neighbors. The V-dependent terms can be thought of as control variates to reduce variance.

Extension of RL-technique to uniform stationary average case
The stochastic approximation iteration in (8) converges to β, which is E π [g(X 1 )], where π is the stationary distribution of the underlying walk. To make it converge to ν(G), we can use the Metropolis-Hastings random walk with uniform target distribution. However, we can avoid the use of Metropolis-Hastings algorithm by the following modification, motivated from importance sampling that achieves the convergence to ν(G) with the simple random walk (SRW). We propose where Here q(·|·) is the transition probability of the random walk with which we simulate the algorithm and p(·|·) corresponds to the transition probability of the random walk with respect to which we need the stationary average. The transition probability p can . belong to any random walk having uniform stationary distribution such that q(·|·) > 0 whenever p(·|·) > 0. One example is to use p as the transition probability of Metropolis-Hastings algorithm with target stationary distribution as uniform and q as the transition probability of a lazy version of simple random walk, i.e., with transition probability matrix (I + P SRW )/2. In comparison with basic Metropolis-Hastings sampling, such importance sampling avoids the API requests for probing the degree of all the neighboring nodes, instead requires only one such, viz., that of the sampled node. Note that the self-loops wherein the chain re-visits a node immediately are not wasted transitions, because it amounts to re-application of a map to the earlier iterate which is distinct from its single application.
The reinforcement learning scheme introduced above is the semi-Markov version of the scheme proposed in [20] and [21].

Advantages
The RL-technique extends the use of regeneration, tours and super-node introduced in [4] to the average function ν(G). Even though the RL-technique is not non-asymptotically unbiased unlike the algorithm in [4], it has the following advantages: 1. It does not need to wait until burn-in time to collect samples; 2. Comparison with [4]: The super-node in [4] is a single node, an amalgamation of the node set V 0 . But such a direction assumes that the contribution of all the edges inside the induced subgraph of V 0 to ν(G) completely known. It could have been avoided if we could make use of the techniques for partitioning state space of a Markov chain (called lumpability in [22]). The conditions stated in [22,Theorem 6.3.2] are not satisfied here and hence we can not invoke such techniques. But the RL-technique, without using the lumpability arguments, need not know the edge functions of the subgraph induced by V 0 ; 3. RL-technique along with the extension in "Extension of RL-technique to uniform stationary average case" section can further be extended to the directed graph case provided the graph is strongly connected. On the other hand, for estimators from other RW based sampling schemes, the estimator requires knowledge of the stationary distribution to unbias and thus to form the estimator. But in many cases including SRW on directed graphs, the stationary distribution does not have a closed form expression unlike in undirected case, and this poses a big challenge for design of simple random walk based estimators; 4. As explained before, the main advantage of RL-estimator is its ability to control the stability of sample paths and its position as a cross between deterministic and MCMC iteration. We will see more about this in the numerical section.

Ratio with tours estimator (RT-estimator)
In this section, we use of the idea of regeneration and tours introduced in [4] to estimate the average function ν(G). However, since the tour estimator only gives an unbiased estimator for network sums namely i∈V g(i), to find an estimate for ν(G) we use the same samples to get an estimate for |V |. Let I n be the set of initial nodes recruited for forming the super-node [4] and let S n be the single combined node corresponding to I n . We emphasize that while in RL-technique, the set of selected nodes I n stays intact, in the RTestimator case, we shrink all these nodes in one super-node S n . The estimator will compensate for network modification. With a sampling budget B, the RT-estimator is given by where m(B) is the number of tours until the budget B, This estimator is very close to RDS sampling, explained in "Respondent-driven sampling technique (RDS-technique)" section, except that we miss B − m(B) k=1 ξ k samples for the estimation purpose and we use super-node to shorten tours. Namely, we can leverage all the advantages of super-node mentioned in [4, Section 2] and we claim that this would highly improve the performance. We show this via numerical simulations in the next section, and theoretical properties will be studied in the future.
Note that the formation of super-node is different from V 0 considered in the RL-technique, where the RW tours can start from any uniformly selected node inside V 0 and the tours end when it hit the set V 0 . On the other hand, the super-node which is formed from n nodes in V is considered as a single node (removing all the edges in between the nodes in S n ) and this contracts the original graph G. Both the formulations have advantages of their own: Super-node and its estimator is easy to form and compute, but one needs to know all the edges between the nodes in S n , i.e., the induced subgraph from S n should be known a priori. The set V 0 in RL-technique does not demand this.

Numerical results
The algorithms RL-technique, RT-estimator, RDS, and MH-MCMC are compared in this section using simulations on three real-world networks. For the figures in this section, the x-axis represents the budget B which is the number of allowed samples, and is the same for all the techniques. We use the normalized root mean squared error (NRMSE) for comparison for a given B and is defined as

Datasets
We use the following datasets. All the datasets are publicly available. 3

Choice of demonstrative functions
Recall that the network function average to be estimated ν(G) = u∈V g(u)/|V | . In the Les Misérables network, we consider four demonstrative functions: a) where I{A} is the indicator function for set A, and d) for calculating ν(G) as the average clustering coefficient In case of the Enron network, we aim to estimate the proportion of number of elements in a community, assuming each node knows which community it associates with it. We look for community with index 1 with 6, 225 nodes, i.e., g(v) = I{Comm(v) = 1} , with Comm(v) denote the community node v part of. We expect such a function will have an impact on the performance of RDS and MH-MCMC. The communities are recovered using the algorithm described in [23]. We also consider g(v) = d(v) for the Enron network.

NRMSE comparison of RDS, MHMC, and RL-technique
For the RL-technique, we choose the initial set V 0 by uniformly sampling nodes assuming the size of V 0 is given a priori.
The average in MSE is calculated from multiple runs of the simulations. The simulations on Les Misérables network are shown in Fig. 1 with a(n) = 1/⌈ n 10 ⌉ and |V 0 | as 25. The plot in Fig. 2 shows the results for Friendster graph with |V 0 | = 1000. Here the sequence a(n) is taken as 1/⌈ n 25 ⌉. Figure 3 presents the results in Enron network using a(n) = 1/⌈ n 20 ⌉, and V 0 with 1000 nodes randomly selected. Among the three techniques compared on these figures, RDS always works better. The MSE of RL-technique is comparable to RDS, and in some cases very close to it.

Stability of RL-technique
Now we concentrate on single sample path properties of the algorithms. Hence, the numerator of NRMSE becomes absolute error. Figure 5a shows the effect of increasing initial set V 0 size while fixing step size a(n) and Fig. 5b shows the effect of changing a(n) when V 0 is fixed. In both the cases, the green curve of RL-technique shows much stability compared to the other techniques.

Designing tips for the RL-technique
Some observations from the numerical experiments performed above are as follows: 1. With respect to the limiting variance, RDS always outperforms the other two methods tested. However, with a good choice of parameters the performance of RL is not far from that of RDS; 2. In the RL-technique, we find that the normalizing term 1/|V 0 | j V n (j) converges much faster than the other two options, V t (i 0 ) and min i V t (i); 3. When the size of the super-node decreases, the RL-technique requires smaller step size a(n). For instance in case of Les Misérables network, if the initial set V 0 size is less than 10, RL-technique does not converge with a(n) = 1/(⌈ n 50 ⌉ + 1) and requires a(n) = 1/(⌈ n 5 ⌉); 4. If step size a(n) decreases or the super-node size increases, RL fluctuates less but with slower convergence. In general, RL has less fluctuations than MH-MCMC or RDS.

NRMSE comparison of RDS and RT-estimator
Here, we compare RDS and RT estimators. The choice of RDS for comparison is motivated by the results shown in the previous section that it outperforms other sampling schemes considered in this paper so far, in terms of asymptotic variance and mean squared error. Moreover, RT-estimator can be regarded as a natural modification of RDS making use of the ideas of tours and super-node. Figure 6 shows the results in Friendster network. It can be readily noticed that RTestimator outperforms RDS. Figure 7 presents the results for Enron email network. In both the cases, RT-estimator performs the best and we see this as a consequence of the introduction of super-node to overcome slow mixing.

Conclusions
We addressed a critical issue in the RW based sampling methods on graphs: the burnin period. Our ideas are based on exploiting the tours (regenerations) and on the best use of the given seed nodes by making only short tours. These short tours or crawls, which start and return to the seed node set, are independent and can be implemented in a massively parallel way. The idea of regeneration allows us to construct estimators that are not marred by the burn-in requirement. We proposed two estimators based on this general idea. The first, the RL-estimator, uses reinforcement learning and stochastic approximation to build a stable estimator by observing random walks returning to the seed set. We then proposed the RT-estimator, which is a modification of the classical respondent-driven sampling, making use of the idea of short crawls and super-node. These two schemes have advantages of their own: the reinforcement learning scheme offers more control on the stability of the sample path with varying error performance, and the modified RDS scheme based on short crawls is simple and has superior performance compared to the classical RDS.
In the future, our aim is to study deeply the theoretical performance of our algorithms. We have also left open the selection process for the initial seed set or the super-node, and this also suggests an interesting research topic to explore in the future.