CLC number: TP14
On-line Access: 2021-11-15
Received: 2020-11-08
Revision Accepted: 2021-02-15
Crosschecked: 2021-04-01
Cited: 0
Clicked: 5824
Citations: Bibtex RefMan EndNote GB/T7714
Bihao Sun, Jinhui Hu, Dawen Xia, Huaqing Li. A distributed stochastic optimization algorithm with gradient-tracking and distributed heavy-ball acceleration[J]. Frontiers of Information Technology & Electronic Engineering, 2021, 22(11): 1463-1476.
@article{title="A distributed stochastic optimization algorithm with gradient-tracking and distributed heavy-ball acceleration",
author="Bihao Sun, Jinhui Hu, Dawen Xia, Huaqing Li",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="22",
number="11",
pages="1463-1476",
year="2021",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2000615"
}
%0 Journal Article
%T A distributed stochastic optimization algorithm with gradient-tracking and distributed heavy-ball acceleration
%A Bihao Sun
%A Jinhui Hu
%A Dawen Xia
%A Huaqing Li
%J Frontiers of Information Technology & Electronic Engineering
%V 22
%N 11
%P 1463-1476
%@ 2095-9184
%D 2021
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2000615
TY - JOUR
T1 - A distributed stochastic optimization algorithm with gradient-tracking and distributed heavy-ball acceleration
A1 - Bihao Sun
A1 - Jinhui Hu
A1 - Dawen Xia
A1 - Huaqing Li
J0 - Frontiers of Information Technology & Electronic Engineering
VL - 22
IS - 11
SP - 1463
EP - 1476
%@ 2095-9184
Y1 - 2021
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/FITEE.2000615
Abstract: distributed optimization has been well developed in recent years due to its wide applications in machine learning and signal processing. In this paper, we focus on investigating distributed optimization to minimize a global objective. The objective is a sum of smooth and strongly convex local cost functions which are distributed over an undirected network of n nodes. In contrast to existing works, we apply a distributed heavy-ball term to improve the convergence performance of the proposed algorithm. To accelerate the convergence of existing distributed stochastic first-order gradient methods, a momentum term is combined with a gradient-tracking technique. It is shown that the proposed algorithm has better acceleration ability than GT-SAGA without increasing the complexity. Extensive experiments on real-world datasets verify the effectiveness and correctness of the proposed algorithm.
[1]Bertsekas D, Gafni E, 1983. Projected Newton methods and optimization of multicommodity flows. IEEE Trans Autom Contr, 28(12):1090-1096.
[2]Boyd S, Parikh N, Chu E, et al., 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends® Mach Learn, 3(1):1-122.
[3]Cheng B, Li ZK, 2019. Coordinated tracking control with asynchronous edge-based event-triggered communications. IEEE Trans Autom Contr, 64(10):4321-4328.
[4]Cheng S, Chen MY, Wai RJ, et al., 2014. Optimal placement of distributed generation units in distribution systems via an enhanced multi-objective particle swarm optimization algorithm. J Zhejiang Univ-Sci C (Comput & Electron), 15(4):300-311.
[5]Cohen K, Nedić A, Srikant R, 2017. Distributed learning algorithms for spectrum sharing in spatial random access wireless networks. IEEE Trans Autom Contr, 62(6):2854-2869.
[6]Defazio A, Bach F, Lacoste-Julien S, 2014. SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. Proc 27th Int Conf on Neural Information Processing Systems, p.1646-1654.
[7]Dua D, Graff C, 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
[8]Duchi JC, Agarwal A, Wainwright MJ, 2012. Dual averaging for distributed optimization: convergence analysis and network scaling. IEEE Trans Autom Contr, 57(3):592-606.
[9]Eisen M, Mokhtari A, Ribeiro A, 2017. Decentralized quasi-Newton methods. IEEE Trans Signal Process, 65(10):2613-2628.
[10]Erseghe T, Zennaro D, Dall’Anese E, et al., 2011. Fast consensus by the alternating direction multipliers method. IEEE Trans Signal Process, 59(11):5523-5537.
[11]Guan L, Sun T, Qiao LB, et al., 2020. An efficient parallel and distributed solution to nonconvex penalized linear SVMs. Front Inform Technol Electron Eng, 21(4):587-603.
[12]Han ZM, Lin ZY, Fu MY, et al., 2015. Distributed coordination in multi-agent systems: a graph Laplacian perspective. Front Inform Technol Electron Eng, 16(6):429-448.
[13]Hu JH, Yan Y, Li HQ, et al., 2021. Convergence of an accelerated distributed optimisation algorithm over time-varying directed networks. IET Contr Theory Appl, 15(1):24-39.
[14]Johnson R, Zhang T, 2013. Accelerating stochastic gradient descent using predictive variance reduction. Proc 26th Int Conf on Neural Information Processing Systems, p.315-323.
[15]Lan Q, Qiao LB, Wang YJ, 2018. Stochastic extra-gradient based alternating direction methods for graph-guided regularized minimization. Front Inform Technol Electron Eng, 19(6):755-762.
[16]Li HQ, Cheng HQ, Wang Z, et al., 2020. Distributed Nesterov gradient and heavy-ball double accelerated asynchronous optimization. IEEE Trans Neur Netw Learn Syst, in press.
[17]Li Z, Shi W, Yan M, 2019. A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates. IEEE Trans Signal Process, 67(17):4494-4506.
[18]Ling Q, Ribeiro A, 2014. Decentralized dynamic optimization through the alternating direction method of multipliers. IEEE Trans Signal Process, 62(5):1185-1197.
[19]Ling Q, Tian Z, 2010. Decentralized sparse signal recovery for compressive sleeping wireless sensor networks. IEEE Trans Signal Process, 58(7):3816-3827.
[20]Liu R, Sun WC, Hou T, et al., 2019. Block coordinate descentwith time perturbation for nonconvex nonsmooth problems in real-world studies. Front Inform Technol Electron Eng, 20(10):1390-1403.
[21]Lü QG, Liao XF, Li HQ, et al., 2020. A Nesterov-like gradient tracking algorithm for distributed optimization over directed networks. IEEE Trans Syst Man Cybern, in press.
[22]Mateos G, Bazerque JA, Giannakis GB, 2010. Distributed sparse linear regression. IEEE Trans Signal Process, 58(10):5262-5276.
[23]Matthews TP, Wang K, Li CP, et al., 2016. Nonlinear waveform inversion by use of the regularized dual averaging method for ultrasound computed tomography. Progress in Electromagnetic Research Symp, p.3948.
[24]McMahan B, Moore E, Ramage D, et al., 2017. Communication-efficient learning of deep networks from decentralized data. Proc 20th Int Conf on Artificial Intelligence and Statistics, p.1273-1282.
[25]Nedić A, Ozdaglar A, 2009. Distributed subgradient methods for multi-agent optimization. IEEE Trans Autom Contr, 54(1):48-61.
[26]Nedić A, Olshevsky A, Shi W, 2017a. Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM J Optim, 27(4):2597-2633.
[27]Nedić A, Olshevsky A, Shi W, et al., 2017b. Geometrically convergent distributed optimization with uncoordinated step-sizes. American Control Conf, p.3950-3955.
[28]Schmidt M, Le Roux N, Bach F, 2017. Minimizing finite sums with the stochastic average gradient. Math Program, 162(1-2):83-112.
[29]Tan CH, Ma SQ, Dai YH, et al., 2016. Barzilai-Borwein step size for stochastic gradient descent. Proc 30th Int Conf on Neural Information Processing Systems, p.685-693.
[30]Tsitsiklis J, Bertsekas D, Athans M, 1986. Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Trans Autom Contr, 31(9):803-812.
[31]Wang B, Jiang HY, Fang J, et al., 2018. A proximal ADMM for decentralized composite optimization. IEEE Signal Process Lett, 25(8):1121-1125.
[32]Wang Z, Li HQ, 2020. Edge-based stochastic gradient algorithm for distributed optimization. IEEE Trans Netw Sci Eng, 7(3):1421-1430.
[33]Wei EM, Ozdaglar A, Jadbabaie A, 2013. A distributed Newton method for network utility maximization—I: algorithm. IEEE Trans Autom Contr, 58(9):2162-2175.
[34]Xi CG, Khan UA, 2017. DEXTRA: a fast algorithm for optimization over directed graphs. IEEE Trans Autom Contr, 62(10):4980-4993.
[35]Xia YS, Wang J, 2004. A one-layer recurrent neural network for support vector machine learning. IEEE Trans Syst Man Cybern B, 34(2):1261-1269.
[36]Xin R, Khan UA, 2018. A linear algorithm for optimization over directed graphs with geometric convergence. IEEE Contr Syst Lett, 2(3):315-320.
[37]Xin R, Khan UA, 2020. Distributed heavy-ball: a generalization and acceleration of first-order methods with gradient tracking. IEEE Trans Autom Contr, 65(6):2627-2633.
[38]Xin R, Jakovetić D, Khan UA, 2019a. Distributed Nesterov gradient methods over arbitrary graphs. IEEE Signal Process Lett, 26(8):1247-1251.
[39]Xin R, Sahu AK, Khan UA, et al., 2019b. Distributed stochastic optimization with gradient tracking over strongly-connected networks. Proc IEEE 58th Conf on Decision and Control, p.8353-8358.
[40]Xin R, Xi CG, Khan UA, 2019c. FROST—fast row-stochastic optimization with uncoordinated step-sizes. EURASIP J Adv Signal Process, 2019(1):1.
[41]Xin R, Khan UA, Kar S, 2020. Variance-reduced decentralized stochastic optimization with accelerated convergence. IEEE Trans Signal Process, 68:6255-6271.
[42]Xu JM, Zhu SY, Soh YC, et al., 2015. Augmented distributed gradient methods for multi-agent optimization under uncoordinated constant stepsizes. Proc 54th IEEE Conf on Decision and Control, p.2055-2060.
[43]Yin R, Zhang Y, Yu GD, et al., 2010. Centralized and distributed resource allocation in OFDM based multi-relay system. J Zhejiang Univ-Sci C (Comput & Electron), 11(6):450-464.
[44]Yuan DM, Ma Q, Wang Z, 2013. Distributed dual averaging method for solving saddle-point problems over multi-agent networks. Proc 32nd Chinese Control Conf, p.6868-6872.
[45]Zhang CL, Ahmad M, Wang YQ, 2019. ADMM based privacy-preserving decentralized optimization. IEEE Trans Inform Forens Secur, 14(3):565-580.
[46]Zinkevich MA, Weimer M, Smola A, et al., 2010. Parallelized stochastic gradient descent. Proc 23th Int Conf on Neural Information Processing Systems, p.2595-2603.
Open peer comments: Debate/Discuss/Question/Opinion
<1>