Gumbel-softmax-based optimization: a simple general framework for optimization problems on graphs
Computational Social Networks volume 8, Article number: 5 (2021)
Abstract
In computer science, there exist a large number of optimization problems defined on graphs, in which one seeks the best node state configuration or network structure such that a designed objective function is optimized under some constraints. However, these problems are notoriously hard to solve, because most of them are NP-hard or NP-complete. Although traditional general methods such as simulated annealing (SA) and genetic algorithms (GA) have been devised for these hard problems, their accuracy and running time are not satisfactory in practice. In this work, we propose a simple, fast, and general algorithmic framework based on the automatic differentiation techniques provided by deep learning frameworks. By introducing the Gumbel-softmax technique, we can optimize the objective function directly by gradient descent regardless of the discrete nature of the variables. We also introduce evolution strategies into a parallel version of our algorithm. We test our algorithm on four representative optimization problems on graphs: modularity optimization from network science, the Sherrington–Kirkpatrick (SK) model from statistical physics, the maximum independent set (MIS) and minimum vertex cover (MVC) problems from combinatorial optimization on graphs, and the influence maximization problem from computational social science. High-quality solutions can be obtained in much less time than with the traditional approaches.
Introduction
In computer science, there exist a large number of optimization problems defined on graphs, e.g., the maximum independent set (MIS) and minimum vertex cover (MVC) problems [1]. In these problems, one is asked to find a largest (or smallest) subset of the nodes of a graph under some constraints. In statistical physics, finding the ground state configuration of a spin glass model, where the energy is minimized, is another type of optimization problem on specific graphs [2]. In network science, there are also a great number of optimization problems defined on graphs abstracted from real-world networks. For example, the modularity maximization problem [3] asks which community each node should be assigned to so that the modularity value is maximized. According to the definition given in [4], these optimization problems can be categorized as limited global optimization problems, since we want to find the global optimum of the objective function. In general, the space of possible solutions of these problems is very large and grows exponentially with system size, so it is impossible to solve them by exhaustive search.
There are many algorithms for optimization problems. The coordinate descent algorithm (CD), which is based on line search, is a classic algorithm that solves optimization problems by performing approximate minimization along coordinate directions or coordinate hyperplanes [5]. However, it does not use gradient information in the optimization process and can be unstable on non-smooth functions. Particle swarm optimization (PSO) is a biologically inspired algorithm that can be effective for optimizing a wide range of functions [6]. It is highly dependent on stochastic processes, and it does not take advantage of gradient information either. Other widely used methods such as simulated annealing (SA) [7], genetic algorithms (GA) [8], and extremal optimization (EO) [9] are capable of solving various kinds of problems. However, when it comes to combinatorial optimization problems on graphs, these methods usually suffer from slow convergence and are limited to system sizes of up to about a thousand nodes. Although there exist many other heuristic solvers such as local search [10], they are usually domain-specific and require special domain knowledge.
Fortunately, there are other optimization methods based on gradient descent that do not suffer from these drawbacks. However, these gradient-based methods require gradient calculations that have to be derived manually for each specific problem; therefore, they lack flexibility and generalizability.
Nowadays, with the automatic differentiation techniques [11] developed in the deep learning area, gradient descent-based methods have been revitalized. Based on computational graphs and tensor operations, this technique automatically calculates derivatives, so that back propagation can be applied more easily. Once the forward computational process is well defined, the automatic differentiation framework can automatically compute the gradients of the objective function with respect to all variables.
Nevertheless, there exist combinatorial optimization problems on graphs whose objective functions are non-differentiable and therefore cannot be solved using automatic differentiation directly. Some techniques developed in the reinforcement learning area seek to solve such problems directly, without separate training and testing stages. For example, the REINFORCE algorithm [12] is a typical gradient estimator for discrete optimization. More recently, the reparameterization trick, a competitive alternative to REINFORCE for estimating gradients, has been developed in the machine learning community. For example, Gumbel-softmax [13, 14] provides an approach for differentiable sampling: it allows gradients to be passed through the sampling process directly. It has been applied to various machine learning problems [13, 14].
With a reparameterization trick such as Gumbel-softmax, it is possible to treat many discrete optimization problems on graphs as continuous optimization problems [15] and apply a series of gradient descent-based algorithms [16]. Although these reinforcement learning and reparameterization techniques provide a new way to solve discrete problems, their performance on complicated combinatorial optimization problems on large graphs is not satisfactory, because they often get stuck in local optima.
Nowadays, a great number of hybrid algorithms that take advantage of both gradient descent and evolution strategies have shown their effectiveness on optimization problems [17, 18] such as function optimization. Other population-based algorithms [19] also show potential to work together with gradient-based methods to achieve better performance.
In this work, we present a novel general optimization framework based on automatic differentiation and Gumbel-softmax, including Gumbel-softmax optimization (GSO) [20] and evolutionary Gumbel-softmax optimization (Evo-GSO). The original GSO algorithm applies the Gumbel-softmax reparameterization trick to combinatorial problems on graphs directly, converting the original discrete problem into a continuous optimization problem so that gradient descent can be used. The batched version of GSO improves the results by searching for the best solution among a group of optimization variables that undergo gradient descent in parallel. Evo-GSO is a hybrid algorithm that combines the batched version of GSO with evolutionary computation methods. The key idea is to treat the batched optimization variables (the parameters) as a population, such that evolutionary operators, e.g., substitution, mutation, and crossover, can be applied. The introduction of evolutionary operators can significantly accelerate the optimization process.
We first introduce the method proposed in [20] and then the improved algorithm, Evo-GSO. Then, we give a brief description of four different optimization problems on graphs and specify our experimental configuration, followed by the main results on these problems compared with different benchmark algorithms. The results show that our framework achieves competitive solutions and also requires less time. Finally, we give some concluding remarks and discuss future work.
The proposed algorithm
In [20], we proposed Gumbel-softmax optimization (GSO), a novel general method for solving combinatorial optimization problems on graphs. Here, we briefly introduce the basic idea of GSO and then introduce our improvement: evolutionary Gumbel-softmax optimization (Evo-GSO).
Gumbel-softmax optimization (GSO)
Consider an optimization problem on a graph with N nodes, where each node can take K different values, e.g., selected or non-selected for \(K=2\). Our goal is to find a configuration \(\mathbf {s}=(s_1, s_2, \ldots , s_N)\) that minimizes the objective function. Suppose we can easily sample from the whole space of allowed solutions; we then want configurations with lower objective values to have higher probabilities \(p(\mathbf {s})\). Here, \(p(\mathbf {s})\) is the joint distribution of solutions, which is the key to the optimization.
There are many ways to specify the joint distribution, among which mean field factorization is the simplest. That is, we factorize the joint distribution of solutions into the product of N independent categorical distributions [21], which is also called naive mean field in physics:
\[ p(\mathbf {s}) = \prod _{i=1}^{N} p(s_i), \tag{1} \]
and the marginal probability \(p(s_i)\in [0,1]^K\) can be parameterized by a set of parameters \(\theta _i\), from which it is easily generated by a Sigmoid or softmax function.
It is easy to sample a possible solution \(\mathbf {s}\) according to Eq. 1 and then evaluate the objective function \(E(\mathbf {s};{\varvec{\theta }})\). However, due to the non-differentiable nature of sampling, we cannot estimate the gradients with respect to \({{\varvec{\theta }}}\) unless we resort to Monte Carlo gradient estimation techniques such as REINFORCE [12]. Gumbel-softmax [13], also known as the concrete distribution [14], provides an alternative approach to tackle the difficulty of non-differentiability. Consider a categorical variable \(s_i\) that can take discrete values \(s_i \in \{1,2,\ldots , K\}\). This variable can be parameterized by a K-dimensional vector \((p_1, p_2, \ldots , p_K)\), where \(p_r = p(s_i=r)\) is the probability that \(s_i\) takes the value \(r\), \(r=1, 2, \ldots , K\). Instead of sampling a hard one-hot vector, the Gumbel-softmax technique gives a K-dimensional sampled vector whose rth entry is:
\[ y_r = \frac{\exp \left( (\log p_r + g_r)/\tau \right) }{\sum _{r'=1}^{K} \exp \left( (\log p_{r'} + g_{r'})/\tau \right) }, \tag{2} \]
where \(g_r \sim \text {Gumbel}(0,1)\) is a random variable following the standard Gumbel distribution and \(\tau\) is the temperature parameter. Notice that as \(\tau \rightarrow 0\), the softmax function approximates the \(\text {argmax}\) function and the sampled vector approaches a one-hot vector. This one-hot vector can be regarded as a solution sampled according to the distribution \((p_1,p_2,\ldots ,p_K)\), because the unit entry appears at the \(r{\text{th}}\) position with probability \(p_r\); therefore, computing the Gumbel-softmax function simulates the sampling process. Furthermore, this technique allows us to pass gradients directly through the “sampling” process, because all the operations in Eq. 2 are differentiable. In practice, it is common to adopt an annealing schedule from a high temperature \(\tau\) to a low temperature.
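As an illustration of the sampling step described above, the following is a minimal sketch in plain Python. It assumes the standard Gumbel-softmax form (a softmax over log-probabilities perturbed by Gumbel noise and divided by \(\tau\)); the function name `gumbel_softmax_sample` and its arguments are hypothetical, and the probabilities are assumed to be strictly positive.

```python
import math
import random


def gumbel_softmax_sample(probs, tau, rng=random):
    """Return a relaxed one-hot sample for the categorical distribution `probs`.

    Assumes every entry of `probs` is strictly positive.
    """
    scaled = []
    for p in probs:
        # Standard Gumbel noise by inverse transform: g = -log(-log(u)), u ~ U(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
        g = -math.log(-math.log(u))
        scaled.append((math.log(p) + g) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(scaled)
    weights = [math.exp(v - top) for v in scaled]
    total = sum(weights)
    return [w / total for w in weights]
```

For any temperature, the position of the largest entry is distributed according to `probs`; lowering `tau` only pushes the returned vector closer to a one-hot vector.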
In short, we randomly initialize a set of learnable parameters \({{\varvec{\theta }}}\), which are the variables for optimization, and generate the probabilities \({{\varvec{p}}}\) by applying the Sigmoid function to these parameters. Then, we sample from \({{\varvec{p}}}\) with the Gumbel-softmax technique to get solutions and calculate the objective function. Finally, we run the back propagation algorithm to update the parameters \({{\varvec{\theta }}}\). The whole process is briefly demonstrated in Fig. 1.
Parallel version of GSO
We point out that our method can be implemented in parallel on a GPU: \(N_{\text {bs}}\) different sets of learnable parameters \(\varvec{\theta }\) form a group, which is called a batch. These parameters are initialized and optimized simultaneously. Therefore, we have \(N_{\text {bs}}\) candidate solutions in a batch instead of one. When the optimization procedure is finished, we select the solution with the best performance from this batch. In this way, we can take full advantage of GPU acceleration and are more likely to obtain better results.
The whole optimization procedure is presented in Algorithm 1.
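The batched procedure can be sketched as follows for the binary case \(K=2\). This is only an illustrative sketch under several assumptions, not the implementation used in the experiments: it runs on the CPU in plain Python, the gradient is written by hand instead of being obtained by automatic differentiation, the objective is an SK-type energy \(E = -\sum _{i<j} J_{ij}\sigma _i\sigma _j\) (the model introduced in the Experiments section), and the geometric annealing schedule, the default hyperparameters, and all function names are hypothetical.

```python
import math
import random


def random_sk_couplings(n, seed=0):
    """Symmetric couplings J_ij ~ N(0, 1/n) with zero diagonal."""
    rng = random.Random(seed)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            J[i][j] = J[j][i] = rng.gauss(0.0, 1.0 / math.sqrt(n))
    return J


def sk_energy(J, spins):
    """E = -sum_{i<j} J_ij s_i s_j."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))


def gumbel(rng):
    """Standard Gumbel noise by inverse transform sampling."""
    return -math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))


def gso_sk(J, batch_size=8, steps=300, lr=1.0, tau_start=5.0, tau_end=0.5, seed=0):
    """Batched GSO sketch for K = 2; returns (best_energy, best_spins)."""
    rng = random.Random(seed)
    n = len(J)
    # theta[i] is the logit of P(spin i = +1); one parameter vector per batch member.
    batch = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(batch_size)]
    for step in range(steps):
        # Geometric annealing from tau_start to tau_end (an assumed schedule).
        tau = tau_start * (tau_end / tau_start) ** (step / max(steps - 1, 1))
        for theta in batch:
            # For two categories, Gumbel-softmax reduces to a sigmoid of the noisy
            # logit; 0.5 * (1 + tanh(z / 2)) is a numerically stable sigmoid(z).
            y = [0.5 * (1.0 + math.tanh((t + gumbel(rng) - gumbel(rng)) / (2.0 * tau)))
                 for t in theta]
            sigma = [2.0 * v - 1.0 for v in y]  # relaxed spins in [-1, 1]
            for i in range(n):
                # Chain rule: dE/dtheta_i = dE/dsigma_i * dsigma_i/dy_i * dy_i/dtheta_i.
                dE_dsigma = -sum(J[i][j] * sigma[j] for j in range(n))
                theta[i] -= lr * dE_dsigma * 2.0 * y[i] * (1.0 - y[i]) / tau
    # Harden each member (spin = sign of the logit) and keep the lowest energy.
    candidates = [[1 if t > 0 else -1 for t in theta] for theta in batch]
    return min((sk_energy(J, s), s) for s in candidates)
```

A call such as `gso_sk(random_sk_couplings(12))` returns the best hard configuration found in the batch. In a GPU implementation, the loop over batch members would become a tensor dimension and the hand-written gradient would be replaced by back propagation.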
Evolutionary Gumbel-softmax optimization (Evo-GSO)
In parallelized GSO, simply selecting the result with the best performance from the batch does not take full advantage of the other candidates. Therefore, we propose an improved version of the algorithm, called evolutionary Gumbel-softmax optimization (Evo-GSO), which combines evolutionary operators with the Gumbel-softmax optimization method. The key idea is to treat a batch as a population, so that we can apply population-based evolution strategies [19] to improve the algorithm.
Evolution strategies and evolutionary programming [22] have shown their capability of solving many optimization problems; they bring diversity to the population and can potentially overcome the difficulty of local minima. In this work, we add two types of simple but effective operations to our original GSO algorithm: selective substitution, inspired by swarm intelligence, and evolutionary operators from genetic algorithms, including selection, crossover, and mutation.
Selective substitution
During gradient descent, we replace the parameters of the worst 1/u of the individuals with a set of alternative parameters every \(T_1\) steps. Here, the substitution ratio 1/u and the evolution cycle \(T_1\) are hyperparameters that vary with the specific problem. The alternative parameters can be the parameters with the best performance in the population, the best ones with a stochastic disturbance, or parameters randomly reinitialized in the problem domain [22]. This operation is particularly effective on populations with high deviation and on problems with severe local minima.
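One of the substitution variants described above (replacing the worst individuals by the best one with a stochastic disturbance) can be sketched as follows. The function name, the Gaussian form of the disturbance, and the default values of `u` and `noise` are assumptions made for illustration; lower objective values are treated as better.

```python
import random


def selective_substitution(population, energies, u=4, noise=0.01, rng=random):
    """Overwrite the worst 1/u of the population with perturbed copies of the best.

    `population` is a list of parameter vectors and `energies[k]` is the
    objective value of `population[k]` (lower is better).
    """
    n_replace = len(population) // u
    # Indices sorted from best (lowest energy) to worst (highest energy).
    order = sorted(range(len(population)), key=lambda k: energies[k])
    best = population[order[0]]
    for k in order[len(order) - n_replace:]:
        population[k] = [t + rng.gauss(0.0, noise) for t in best]
    return population
```

In a GSO loop, this function would be called once every \(T_1\) gradient steps, with the energies evaluated on the current hard solutions of each batch member.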
Selection, crossover, and mutation
When GSO reaches convergence, where no further improved solutions can be found, we apply these operators from the classic genetic algorithm to the population to increase diversity and preserve excellent genes (certain bits or segments of parameters). Here, we adopt roulette wheel selection, single-point crossover, and binary mutation, as well as an elitist preservation strategy [8]. Since this operation significantly changes the structure of the parameters, which works against gradient descent, good performance is achieved when the evolutionary operators are applied after each convergence, with a cycle \(T_2\) long enough for the population to converge.
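The four operators named above can be sketched in their textbook form as follows. This sketch assumes that individuals are 0/1 lists of equal length (at least two genes) and that `fitness` holds non-negative values where higher is better, so a minimized objective would first have to be converted, e.g., by subtracting it from its maximum over the population. All names are hypothetical, and how the operators are applied to the continuous parameters \({{\varvec{\theta }}}\) is not specified here.

```python
import random


def roulette_select(population, fitness, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    return rng.choices(population, weights=fitness, k=1)[0]


def single_point_crossover(a, b, rng=random):
    """Swap the tails of two equal-length parents at a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def bit_flip_mutation(bits, rate, rng=random):
    """Flip each 0/1 gene independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def next_generation(population, fitness, rate=0.01, rng=random):
    """Build a new population of the same size, keeping the best individual (elitism)."""
    elite = max(zip(fitness, population), key=lambda fp: fp[0])[1]
    children = [list(elite)]
    while len(children) < len(population):
        parent_a = roulette_select(population, fitness, rng)
        parent_b = roulette_select(population, fitness, rng)
        child_a, child_b = single_point_crossover(parent_a, parent_b, rng)
        children.append(bit_flip_mutation(child_a, rate, rng))
        if len(children) < len(population):
            children.append(bit_flip_mutation(child_b, rate, rng))
    return children
```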
We present our proposed method in Algorithm (2).
In Table 1, we compare our proposed methods with some of the optimization algorithms mentioned in the Introduction.
Experiments
A simple example
To show the importance and efficiency of combining evolutionary operators with gradient-based optimization, we first use a function optimization problem as an example. We test the hybrid of an evolutionary method and a gradient-based method on the Griewank and Rastrigin functions (Fig. 2). These are classic test functions for optimization algorithms, since they contain many local minima and the global minimum can be hard to find.
We run three different optimization algorithms on these functions: gradient descent (GD) with learning rate \(\eta\) = 0.01; GD with random re-initialization every T = 1000 steps; and a hybrid algorithm of GD and evolution strategy with population size \(N_{\text {bs}}\) = 64, evolution cycle T = 1000, and substitution ratio 1/u = 1/4 (see Fig. 3a). In plain gradient descent, candidates usually get stuck in local minima after convergence (see Fig. 3b). After we add the random re-initialization operation, candidates are able to jump out of these local minima and have a better chance of finding the global minimum (see Fig. 3c, d). However, this procedure is stochastic, and candidates are unable to share information with each other. Finally, we test the hybrid algorithm of GD and evolution strategy. We adopt the selective substitution operation inspired by swarm intelligence, in which candidates are able to communicate, so that good results can be preserved and inherited (see Fig. 3e). Figure 3 illustrates five key frames on the contour of the Griewank function during the optimization process of this hybrid algorithm, together with a bar graph comparing the number of times the global minimum is found by the different optimization algorithms in 100 instances. We can clearly see that the hybrid algorithm outperforms its two competitors and is more likely to find the global minimum.
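For reference, the two test functions can be written down as follows. The formulas are the standard textbook definitions, which are not reproduced in the text above, so the dimension and search domain used in the experiments remain unspecified; both functions have their global minimum of 0 at the origin.

```python
import math


def griewank(x):
    """Griewank function: 1 + sum(x_i^2) / 4000 - prod(cos(x_i / sqrt(i)))."""
    quadratic = sum(v * v for v in x) / 4000.0
    oscillation = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return 1.0 + quadratic - oscillation


def rastrigin(x):
    """Rastrigin function: 10 n + sum(x_i^2 - 10 cos(2 pi x_i))."""
    return 10.0 * len(x) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)
```

The cosine terms create a regular grid of local minima around the origin, which is what traps plain gradient descent.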
Combinatorial optimization problems on graphs
To further test the performance of our proposed algorithms, we conduct experiments on different optimization problems on graphs. We perform all experiments on a server with an Intel Xeon Gold 5218 CPU and NVIDIA GeForce RTX 2080Ti GPUs. For comparison, we mainly test three general optimization algorithms: extremal optimization (EO) [9], simulated annealing (SA) [7], and genetic algorithm (GA).
Modularity maximization
Modularity is a graph clustering index for detecting community structure in complex networks [23]. Suppose a graph \(\mathcal {G(V,E)}\) is partitioned into K communities; the objective is to maximize the following modularity function, such that the best partition of the nodes can be found:
\[ Q = \frac{1}{2|\mathcal {E}|}\sum _{ij}\left[ A_{ij} - \frac{k_i k_j}{2|\mathcal {E}|}\right] \delta (s_i,s_j), \]
where \(|\mathcal {E}|\) is the number of edges, \(k_i\) is the degree of node i, \(s_i\in \{0,1,\ldots ,K-1\}\) is a label denoting which community node i belongs to, and \(\delta (s_i,s_j)=1\) if \(s_i=s_j\) and 0 otherwise. \(A_{ij}\) is the adjacency matrix of the graph. Maximizing modularity in general graphs is an NP-hard problem [24].
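To make the objective concrete, the following sketch evaluates the modularity of a hard partition. It assumes the standard Newman form \(Q = \frac{1}{2m}\sum _{ij}[A_{ij} - k_i k_j/(2m)]\,\delta (s_i,s_j)\) with m the number of edges, rewritten as a sum over communities, and an undirected, unweighted graph without self-loops or duplicate edges; the function name and the edge-list representation are hypothetical.

```python
def modularity(edges, labels):
    """Modularity Q of a partition; `labels[node]` is the community of `node`."""
    m = len(edges)
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    intra = {}    # number of edges inside each community
    deg_sum = {}  # total degree of each community
    for a, b in edges:
        if labels[a] == labels[b]:
            intra[labels[a]] = intra.get(labels[a], 0) + 1
    for node, k in degree.items():
        deg_sum[labels[node]] = deg_sum.get(labels[node], 0) + k
    # Q = sum over communities c of  e_c / m - (d_c / (2 m))^2.
    return sum(intra.get(c, 0) / m - (d / (2.0 * m)) ** 2 for c, d in deg_sum.items())
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 5/14, whereas putting all nodes into one community gives Q = 0.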
We use real-world datasets that have been well studied in [3, 25, 26], namely Karate, Jazz, C. elegans, and Email, to test the algorithms. We run experiments on each dataset with the number of communities Ncoms ranging from 2 to 20, and run 10 instances for each fixed Ncoms. After optimizing the modularity for all values of Ncoms, we report the maximum modularity value Q and the corresponding Ncoms in Table 2. Our proposed methods achieve competitive modularity values on all datasets compared to hierarchical agglomeration [25] and EO [26].
Figure 4 further shows the modularity value for different numbers of communities on C. elegans and Email. Compared to GA and SA, our proposed methods achieve much higher modularity across different numbers of communities.
Sherrington–Kirkpatrick (SK) model
The SK model is a celebrated spin glass model defined on a complete graph [27]. Each node represents an Ising spin \(\sigma _i \in \{-1, +1\}\), and the interaction between spins \(\sigma _i\) and \(\sigma _j\) is \(J_{ij}\), sampled from a Gaussian distribution \(\mathcal {N}(0, 1/N)\), where N is the number of spins. We are asked to give an assignment of each spin, so that the objective function, or the ground state energy:
\[ E(\sigma _1, \sigma _2, \ldots , \sigma _N) = -\sum _{1 \le i < j \le N} J_{ij}\sigma _i \sigma _j \]
is minimized. It is also an NP-hard problem [2].
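For very small N, the ground state can be found by exhaustive search, which is useful for checking a solver. The sketch below assumes the usual energy \(E = -\sum _{i<j} J_{ij}\sigma _i\sigma _j\) with symmetric couplings; the function names are hypothetical, and the search visits all \(2^N\) configurations, so it is only practical for roughly N ≤ 20.

```python
import itertools
import math
import random


def sample_couplings(n, rng=random):
    """Symmetric couplings J_ij ~ N(0, 1/n) with zero diagonal."""
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            J[i][j] = J[j][i] = rng.gauss(0.0, 1.0 / math.sqrt(n))
    return J


def energy(J, spins):
    """E = -sum_{i<j} J_ij s_i s_j for spins in {-1, +1}."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))


def brute_force_ground_state(J):
    """Return (minimum energy, spin tuple) by enumerating all 2^N configurations."""
    n = len(J)
    return min((energy(J, s), s) for s in itertools.product((-1, 1), repeat=n))
```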
We test our algorithms on the SK model with sizes ranging from 256 to 8192. The state-of-the-art results are obtained by EO [9]. The results are shown in Tables 3 and 4. From Table 3, we see that although EO obtains lower ground state energies, it only reports results for system sizes up to \(N=1024\), because it is extremely time-consuming. In fact, the algorithmic cost of EO is \(\mathcal {O}(N^4)\). In the implementations of SA and GA, we set the time limit to 96 h, and both algorithms failed to finish for some N. Although the results of SA are much better than those of GA, they are still not satisfactory for larger N. For the SK model, we adopt only selective substitution in Evo-GSO.
We also compare the Gumbel-softmax-based algorithms with different batch sizes and Evo-GSO. From Table 4, we see that the parallel version improves the results greatly. In addition, Evo-GSO outperforms GSO for larger N.
Maximum independent set (MIS) and minimum vertex cover (MVC) problems
MIS and MVC problems are canonical NP-hard combinatorial optimization problems on graphs [1]. Given an undirected graph \(\mathcal {G(V,E)}\), the MIS problem asks for the largest subset \(\mathcal V^{\prime } \subseteq \mathcal V\), such that no two nodes in \(\mathcal V^{\prime }\) are connected by an edge in \(\mathcal E\). Similarly, the MVC problem asks for the smallest subset \(\mathcal V^{\prime } \subseteq \mathcal V\), such that every edge in \(\mathcal {E}\) is incident to a node in \(\mathcal {V^{\prime }}\). MIS and MVC are constrained optimization problems and cannot be optimized directly by our framework. Here, we adopt the penalty method and an Ising formulation to transform them into unconstrained problems.
We can place an Ising spin \(\sigma _i\) on each node and then define the binary variable \(x_i = (\sigma _i + 1)/2\). Here, \(x_i = 1\) means that node i belongs to the subset \(\mathcal {V^{\prime }}\) and \(x_i=0\) otherwise. Thus, the Ising Hamiltonian for the MIS problem is:
\[ H_{\text {MIS}} = -\sum _{i \in \mathcal {V}} x_i + \alpha \sum _{(i,j) \in \mathcal {E}} x_i x_j . \]
Similarly, the Ising Hamiltonian for MVC becomes:
\[ H_{\text {MVC}} = \sum _{i \in \mathcal {V}} x_i + \alpha \sum _{(i,j) \in \mathcal {E}} (1 - x_i)(1 - x_j), \]
where \(\alpha > 0\). The first term on the right-hand side counts the selected nodes (with a negative sign for MIS, where the subset should be as large as possible), and the second term adds a penalty whenever the selected nodes violate the constraint. \(\alpha\) is a penalty parameter, and its value is crucial to the performance of our framework. If \(\alpha\) is set too small, we may not find any feasible solutions. Conversely, if it is set too large, we may find many feasible solutions of unsatisfactory quality. In this work, we set \(\alpha\) to 3, which gives a good balance between the quality and the number of feasible solutions.
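The two penalized objectives can be evaluated on a hard 0/1 assignment as in the following sketch. It assumes the common penalty form in which the first term counts the selected nodes (negated for MIS) and the second term counts violated edges weighted by \(\alpha\); the function names and the edge-list representation are hypothetical.

```python
def mis_hamiltonian(edges, x, alpha=3.0):
    """-(number of selected nodes) + alpha * (edges with both endpoints selected)."""
    return -sum(x) + alpha * sum(x[a] * x[b] for a, b in edges)


def mvc_hamiltonian(edges, x, alpha=3.0):
    """(number of selected nodes) + alpha * (edges with no endpoint selected)."""
    return sum(x) + alpha * sum((1 - x[a]) * (1 - x[b]) for a, b in edges)
```

On the path 0-1-2-3, the independent set {0, 2} has \(H_{\text {MIS}} = -2\), while selecting the adjacent nodes {0, 1} costs \(-2 + \alpha = 1\). Likewise, the vertex cover {1, 2} has \(H_{\text {MVC}} = 2\), while selecting only node 1 leaves edge (2, 3) uncovered and costs \(1 + \alpha = 4\).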
We test our algorithms on three citation graphs: Cora, Citeseer, and PubMed. Beyond standard general algorithms such as genetic algorithm and simulated annealing, we also compare with deep learning-based algorithms, including (1) Structure2Vec Deep Q-learning (S2V-DQN) [29]: a reinforcement learning method for optimization problems over graphs, and (2) Graph Convolutional Networks with Guided Tree Search (GCN-GTS) [30]: a supervised learning method based on graph convolutional networks (GCN) [31], as well as with well-known greedy algorithms for the MIS and MVC problems, namely (3) the greedy algorithm (Greedy) and the minimum-degree greedy algorithm (MD-Greedy) [32]: a simple and well-studied method for finding independent sets in graphs.
We run 20 instances and report the results with the best performance. The results for the MIS and MVC problems are shown in Table 5. Our proposed algorithms obtain much better results than the classical general optimization methods, including Greedy and SA, on all three datasets. Although our methods cannot beat the MD-Greedy algorithm, they do not use any prior information about the graph, whereas MD-Greedy requires the degrees of all nodes in the graph. Furthermore, we do not report results for GA because, without heuristics and problem-specific design, the general GA failed to find any feasible solution, since MIS and MVC are constrained optimization problems.
It is necessary to emphasize the differences between our framework and deep learning-based algorithms such as S2V-DQN and GCN-GTS. These algorithms must be trained (by reinforcement or supervised learning) and thus solve problems in two stages: training the solver first, and then testing. Although relatively good solutions can be obtained efficiently, a great deal of time must be spent on training the solver, and the quality of the solutions depends heavily on the quality and the amount of the training data. These properties make it hard to extend such methods to large graphs. By comparison, our proposed framework is more direct and lightweight: it contains only an optimization stage. It requires no training and has no dependence on data or specific domain knowledge; therefore, it can easily be generalized and modified for different optimization problems.
Influence maximization problem
Influence maximization is one of the most representative and attractive problems in computational social science. There are classical models such as Independent Cascade (IC) and Linear Threshold (LT), as well as more recent models such as the bi-objective optimization model in [33]. However, these models often contain many non-differentiable operations in the propagation process, which makes it difficult for our proposed method to perform effectively. Therefore, we introduce a simple model to simulate influence propagation in which the whole process is differentiable and Markovian, and which is able to clearly demonstrate the performance of our method.
In this model, a node's value can be interpreted as how much it is influenced, or as the probability that it is activated in the IC or LT model. Its range is limited to between 0 and 1. Message passing, which occurs along the existing social network, is continuous. That is, every node may forward messages to its neighbors at each time step. Each node receives messages from and sends messages to its neighbors at the same time. The amount of message that a node sends is equal to its current value, and it is distributed equally among its neighbors.
With these assumptions, we can easily model the propagation process as a matrix multiplication of the state vector X and a transfer matrix T. Such a computation is clearly differentiable. Therefore, the state at the next time step is obtained by multiplying the current state vector by T.
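One possible reading of this propagation rule is sketched below; it is an interpretation, not the exact model used in the experiments. It assumes that the transfer matrix has entries \(T_{ij} = A_{ij}/k_j\), so that node j splits its current value equally among its \(k_j\) neighbors, and that a node's new value is the total amount it receives. Whether a node also keeps part of its own value, and how the values are kept within [0, 1], is not specified above and is left out. The function names are hypothetical, and the edge list is assumed to contain no duplicates.

```python
def transfer_matrix(n, edges):
    """Column-normalized adjacency matrix: T[i][j] = A[i][j] / degree[j]."""
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    T = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        T[a][b] = 1.0 / degree[b]  # share of b's value that reaches a
        T[b][a] = 1.0 / degree[a]  # share of a's value that reaches b
    return T


def propagate(T, x, steps=1):
    """Apply x <- T x for the given number of time steps."""
    n = len(x)
    for _ in range(steps):
        x = [sum(T[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x
```

On a star graph whose center holds value 1, one step sends 1/3 to each of the three leaves, and a second step returns the full value to the center. Since every operation is a multiplication or a sum, the result is differentiable with respect to the initial state.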
However, we still need a penalty function to restrict the number of initial nodes. Here, we simply use a quadratic function whose minimum point num is the number of initial nodes we want and whose coefficient \(\alpha\) is a hyperparameter that can be adjusted. The objective function is:
We test our algorithms on four networks and compare them with the SA and Greedy algorithms. The results are shown in Tables 6 and 7. Our method performs similarly to the SA and Greedy methods on small graphs. On the Karate network, our method obtains the global maximum, while Greedy fails to do so. On large graphs, our method usually does not perform as well as Greedy, but it is much faster, because it does not go through the whole propagation process on each attempt as Greedy does. These experiments on influence maximization do not aim to defeat other algorithms, but to show the potential of our method for solving various problems in computational social science, with considerably less time consumption and relatively satisfactory results.
Sensitivity analysis on hyperparameters
We also perform experiments to test how the hyperparameters of the evolution operations affect the performance of our algorithms. We try different population sizes \(N_{{\text{bs}}}\), evolution cycles \(T_1\), and substitution ratios 1/u on the SK model with 1024 and 8192 nodes. The default configuration is: initial \(\tau = 20\), final \(\tau = 1\), learning rate \(\eta\) = 1, \(N_{{\text{bs}}} = 128\), \(T_1 = 100\), and \(1/u = 1/8\); we then change one hyperparameter at a time. The results are shown in Fig. 5. We can see that our framework shows different sensitivity to each of these hyperparameters, and a relatively satisfactory combination of hyperparameters can be identified from this analysis.
Conclusion
In this work, we present a simple general framework for solving optimization problems on graphs. Our method is based on automatic differentiation and the Gumbel-softmax technique, which allows gradients to pass through sampling processes directly. We assume that all nodes in the network are independent, and thus, the joint distribution is factorized as a product of the distributions of the individual nodes. This makes the Gumbel-softmax sampling process efficient. Furthermore, we introduce evolution strategies into our framework, which brings diversity and improves the performance of our algorithm. Our experimental results show that our method performs well on all four tasks and also has an advantage in running time. Compared to traditional general optimization methods such as GA and SA, our framework can tackle large graphs easily and efficiently. Though not competitive with state-of-the-art deep learning-based methods, our framework has the advantage of requiring neither training of the solver nor specific domain knowledge. In general, it is an efficient, general, and lightweight optimization framework for solving optimization problems on graphs.
However, there is much room to improve the accuracy of our algorithm. In this paper, we take the mean field approximation as our basic assumption; however, the variables are not independent in most problems. Therefore, more sophisticated variational distributions can be considered in the future. Another way to improve accuracy is to combine our framework with other techniques such as local search. Since our framework is general and requires no specific domain knowledge, it should be tested on other complex optimization problems in the future.
Availability of data and materials
The dataset analyzed in this study is publicly available online at http://networkrepository.com/.
Abbreviations
GSO: Gumbel-softmax optimization
Evo-GSO: Evolutionary Gumbel-softmax optimization
References
 1.
Karp RM. Reducibility among combinatorial problems. Complexity of computer computations. Berlin: Springer; 1972. p. 85–103.
 2.
Mézard M, Parisi G, Virasoro M. Spin glass theory and beyond: an introduction to the replica method and its applications, vol. 9. Singapore: World Scientific Publishing Company; 1987.
 3.
Newman ME. Modularity and community structure in networks. Proc Natl Acad Sci. 2006;103(23):8577–82.
 4.
Galperin EA. Problem-method classification in optimization and control. Comput Math Appl. 1991;21(6–7):1–6.
 5.
Wright SJ. Coordinate descent algorithms. Math Prog. 2015;151(1):3–34.
 6.
Kennedy J, Eberhart RC. Particle swarm optimization. In: Proceedings of the IEEE international conference on neural networks; 1995. p. 1942–8.
 7.
Kirkpatrick S, Gelatt CD, Vecchi MP. Optimization by simulated annealing. Science. 1983;220(4598):671–80.
 8.
Davis L. Handbook of genetic algorithms; 1991.
 9.
Boettcher S, Percus A. Nature’s way of optimizing. Artif Intell. 2000;119(1–2):275–86.
 10.
Andrade DV, Resende MG, Werneck RF. Fast local search for the maximum independent set problem. J Heurist. 2012;18(4):525–47.
 11.
Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A. Automatic differentiation in pytorch. In: NIPSW; 2017.
 12.
Williams RJ. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn. 1992;8(3–4):229–56.
 13.
Jang E, Gu S, Poole B. Categorical reparameterization with Gumbel-softmax. In: 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, conference track proceedings. OpenReview.net; 2017. https://openreview.net/forum?id=rkE3y85ee.
 14.
Maddison CJ, Mnih A, Teh YW. The concrete distribution: A continuous relaxation of discrete random variables. In: 5th International conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, conference track proceedings; 2017. https://openreview.net/forum?id=S1jE5L5gl.
 15.
Andreasson N, Evgrafov A, Patriksson M. An introduction to continuous optimization: foundations and fundamental algorithms; 2007. p. 400.
 16.
Avraamidou S, Pistikopoulos EN. Optimization of complex systems: theory, models, algorithms and applications, vol. 991. Berlin: Springer; 2020. p. 579–588. https://doi.org/10.1007/978-3-030-21803-4.
 17.
Zidani H, Ellaia R, de Cursi ES. A hybrid simplex search for global optimization with representation formula and genetic algorithm. Advances in intelligent systems and computing, vol. 991. Berlin: Springer; 2020. p. 3–15.
 18.
Rocha AMA, Costa MFP, Fernandes EM. A population-based stochastic coordinate descent method. In: World congress on global optimization. Berlin: Springer; 2019. p. 16–25.
 19.
Yildiz AR. A comparative study of population-based optimization algorithms for turning operations. Inf Sci. 2012;210:81–8. https://doi.org/10.1016/j.ins.2012.03.005.
 20.
Liu J, Gao F, Zhang J. Gumbel-softmax optimization: A simple general framework for combinatorial optimization problems on graphs. In: International conference on complex networks and their applications. Berlin: Springer; 2019. p. 879–90.
 21.
Wainwright MJ, Jordan MI, et al. Graphical models, exponential families, and variational inference. Found Trends® Mach Learn. 2008;1(1–2):1–305.
 22.
Bäck T, Rudolph G, Schwefel HP. Evolutionary programming and evolution strategies: similarities and differences. In: Proceedings of the second annual conference on evolutionary programming. p. 11–22.
 23.
Fortunato S. Community detection in graphs. Phys Rep. 2010;486(3–5):75–174.
 24.
Brandes U, Delling D, Gaertler M, Görke R, Hoefer M, Nikoloski Z, Wagner D. On finding graph clusterings with maximum modularity. In: International workshop on graph-theoretic concepts in computer science. Berlin: Springer; 2007. p. 121–32.
 25.
Newman ME. Fast algorithm for detecting community structure in networks. Phys Rev E. 2004;69(6):066133.
 26.
Duch J, Arenas A. Community detection in complex networks using extremal optimization. Phys Rev E. 2005;72(2):027104.
 27.
Sherrington D, Kirkpatrick S. Solvable model of a spin-glass. Phys Rev Lett. 1975;35(26):1792.
 28.
Boettcher S. Extremal optimization for Sherrington-Kirkpatrick spin glasses. Eur Phys J B Condens Matter Complex Syst. 2005;46(4):501–5.
 29.
Khalil E, Dai H, Zhang Y, Dilkina B, Song L. Learning combinatorial optimization algorithms over graphs. In: Advances in neural information processing systems; 2017. p. 6348–58.
 30.
Li Z, Chen Q, Koltun V. Combinatorial optimization with graph convolutional networks and guided tree search. In: Advances in neural information processing systems; 2018. p. 539–48.
 31.
Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks; 2016. arXiv preprint arXiv:1609.02907.
 32.
Halldórsson MM, Radhakrishnan J. Greed is good: approximating independent sets in sparse and bounded-degree graphs. Algorithmica. 1997;18(1):145–63.
 33.
Agha Mohammad Ali Kermani M, Aliahmadi A, Hanneman R. Optimizing the choice of influential nodes for diffusion on a social network. Int J Commun Syst. 2016;29(7):1235–50.
Acknowledgements
This research is supported by the National Natural Science Foundation of China (NSFC) (no. 61673070) and the Fundamental Research Funds for the Central Universities (no. 2020KJZX004).
Funding
Not applicable.
Author information
Contributions
JZ, YL, and JL conceived and designed the research. YL, JL, and JZ designed the model structure. YL and JL developed the model. YL, JL, GL, YH, and MM performed the experiments. JZ, YL, and JL wrote the manuscript. JZ reviewed and revised the manuscript. JZ supervised the research. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, Y., Liu, J., Lin, G. et al. Gumbel-softmax-based optimization: a simple general framework for optimization problems on graphs. Comput Soc Netw 8, 5 (2021). https://doi.org/10.1186/s40649-021-00086-z
Keywords
 Optimization problems on graphs
 Gumbel-softmax
 Evolution strategy