JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

Accepted manuscript available online (unedited version)

Training time minimization for federated edge learning with optimized gradient quantization and bandwidth allocation

Author(s): Peixi LIU, Jiamo JIANG, Guangxu ZHU, Lei CHENG, Wei JIANG, Wu LUO, Ying DU, Zhiqin WANG
Affiliation(s): State Key Laboratory of Advanced Optical Communication Systems and Networks, Department of Electronics, Peking University, Beijing 100871, China
Corresponding email(s): jiangjiamo@caict.ac.cn, gxzhu@sribd.cn
Key Words: Federated edge learning; Quantization optimization; Bandwidth allocation; Training time minimization


Peixi LIU, Jiamo JIANG, Guangxu ZHU, Lei CHENG, Wei JIANG, Wu LUO, Ying DU, Zhiqin WANG. Training time minimization for federated edge learning with optimized gradient quantization and bandwidth allocation[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2100538

@article{Liu_FITEE_2100538,
title="Training time minimization for federated edge learning with optimized gradient quantization and bandwidth allocation",
author="Peixi LIU and Jiamo JIANG and Guangxu ZHU and Lei CHENG and Wei JIANG and Wu LUO and Ying DU and Zhiqin WANG",
journal="Frontiers of Information Technology & Electronic Engineering",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2100538"
}

%0 Journal Article
%T Training time minimization for federated edge learning with optimized gradient quantization and bandwidth allocation
%A Peixi LIU
%A Jiamo JIANG
%A Guangxu ZHU
%A Lei CHENG
%A Wei JIANG
%A Wu LUO
%A Ying DU
%A Zhiqin WANG
%J Frontiers of Information Technology & Electronic Engineering
%P 1247-1263
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2100538

TY  - JOUR
T1  - Training time minimization for federated edge learning with optimized gradient quantization and bandwidth allocation
A1  - Peixi LIU
A1  - Jiamo JIANG
A1  - Guangxu ZHU
A1  - Lei CHENG
A1  - Wei JIANG
A1  - Wu LUO
A1  - Ying DU
A1  - Zhiqin WANG
JO  - Frontiers of Information Technology & Electronic Engineering
SP  - 1247
EP  - 1263
SN  - 2095-9184
Y1  - in press
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2100538
ER  -


Abstract: Training a machine learning model with federated edge learning (FEEL) is typically time-consuming due to the constrained computation power of edge devices and the limited wireless resources in edge networks. In this study, the training time minimization problem is investigated in a quantized FEEL system, where heterogeneous edge devices send quantized gradients to the edge server via orthogonal channels. In particular, a stochastic quantization scheme is adopted to compress the uploaded gradients, which can reduce the per-round communication burden but may increase the number of communication rounds. The training time is modeled by taking into account the communication time, the computation time, and the number of communication rounds. Based on the proposed training time model, the intrinsic trade-off between the number of communication rounds and the per-round latency is characterized. Specifically, we analyze the convergence behavior of quantized FEEL in terms of the optimality gap. Furthermore, a joint data-and-model-driven fitting method is proposed to obtain an accurate optimality gap, from which closed-form expressions for the number of communication rounds and the total training time are obtained. Under a total bandwidth constraint, the training time minimization problem is formulated as a joint quantization level and bandwidth allocation optimization problem. To solve it, an algorithm based on alternating optimization is proposed, which alternately solves the quantization optimization subproblem through successive convex approximation and the bandwidth allocation subproblem through bisection search. Simulation results on different learning tasks and models validate our analysis and demonstrate the near-optimal performance of the proposed optimization algorithm.
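The stochastic gradient quantization step can be illustrated with a short sketch. It assumes a QSGD-style unbiased quantizer with s levels per coordinate, scaled by the gradient's Euclidean norm; the paper's exact quantizer and bit encoding may differ. The names `stochastic_quantize` and `payload_bits`, and the 32-bit norm plus sign-and-level-index encoding, are illustrative assumptions rather than the authors' implementation.

```python
import math
import random
from typing import List


def stochastic_quantize(v: List[float], s: int, rng: random.Random) -> List[float]:
    """QSGD-style unbiased stochastic quantizer with s levels (illustrative assumption).

    Each coordinate x is mapped to sign(x) * ||v|| * level / s, where level is the
    floor or ceiling of |x| / ||v|| * s, chosen at random so that E[Q(v)] = v.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    out = []
    for x in v:
        r = min(abs(x) / norm * s, float(s))  # position on the level grid [0, s]
        lower = math.floor(r)
        p_up = r - lower                      # probability of rounding up to lower + 1
        level = lower + 1 if rng.random() < p_up else lower
        out.append(math.copysign(norm * level / s, x))
    return out


def payload_bits(d: int, s: int, norm_bits: int = 32) -> int:
    """Rough upload size under an assumed encoding: one float norm, then a sign bit
    and ceil(log2(s + 1)) bits for the level index of each of the d entries."""
    return norm_bits + d * (1 + math.ceil(math.log2(s + 1)))
```

Under these assumptions, fewer levels s shrink each device's payload (and hence its per-round upload time) but raise the quantization variance, which is what can increase the number of communication rounds.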

Optimization strategy for gradient quantization and bandwidth allocation in federated edge learning

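Peixi LIU^1,3, Jiamo JIANG^2, Guangxu ZHU^3, Lei CHENG^4,5, Wei JIANG^1, Wu LUO^1, Ying DU^2, Zhiqin WANG^2
^1State Key Laboratory of Advanced Optical Communication Systems and Networks, School of Electronics, Peking University, Beijing 100871, China
^2China Academy of Information and Communications Technology, Beijing 100191, China
^3Shenzhen Research Institute of Big Data, Shenzhen 518172, China
^4College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
^5Zhejiang Provincial Key Laboratory of Information Processing, Communication and Networking, Hangzhou 310027, China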
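Abstract: Owing to the limited computing power of edge devices and the limited wireless resources of edge networks, training machine learning models with federated edge learning (FEEL) is usually very time-consuming. This paper studies the training time minimization problem in a quantized FEEL system, in which heterogeneous edge devices send quantized gradients to the edge server through orthogonal channels. Stochastic quantization is adopted to compress the uploaded gradients, which reduces the per-round communication overhead but may increase the number of communication rounds. The training time is modeled by jointly considering the communication time, the computation time, and the number of communication rounds. Based on the proposed training time model, the intrinsic trade-off between the number of communication rounds and the per-round latency is characterized. Specifically, the convergence of quantized FEEL is analyzed. A joint data-and-model-driven fitting method is proposed to obtain an accurate optimality gap, from which closed-form expressions for the number of communication rounds and the total training time are derived. Under the total bandwidth constraint, the training time minimization problem is formulated as a joint optimization of quantization levels and bandwidth allocation. This problem is solved by alternately solving the quantization optimization subproblem (via successive convex approximation) and the bandwidth allocation subproblem (via bisection search). Simulation results under different learning tasks and models demonstrate the validity of the analysis and the near-optimal performance of the proposed optimization algorithm.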

Key words: Federated edge learning; Quantization optimization; Bandwidth allocation; Training time minimization
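The bandwidth allocation subproblem, solved by bisection search, can be sketched as minimizing the slowest device's round latency under the total bandwidth budget. For illustration only, this sketch assumes that each device has a fixed per-round computation time and uploads a fixed number of bits over a channel with a fixed spectral efficiency (bit/s/Hz), so its upload time is inversely proportional to its allocated bandwidth; the paper's actual rate model and its per-device quantization levels may differ. The function name `min_round_latency` is hypothetical.

```python
from typing import List, Tuple


def min_round_latency(bits: List[float], t_cmp: List[float], spec_eff: List[float],
                      total_bw: float, tol: float = 1e-9) -> Tuple[float, List[float]]:
    """Hypothetical helper: minimize T = max_k (t_cmp[k] + bits[k] / (bw[k] * spec_eff[k]))
    subject to sum(bw) <= total_bw, and return T with the per-device bandwidth bw (Hz).

    Assumption: spec_eff[k] (bit/s/Hz) does not depend on bw[k], so device k's
    upload time is inversely proportional to its allocated bandwidth.
    """

    def bw_needed(T: float) -> List[float]:
        # Bandwidth device k needs to finish computing and uploading within T seconds.
        return [b / ((T - c) * e) for b, c, e in zip(bits, t_cmp, spec_eff)]

    lo = max(t_cmp)  # infeasible: the slowest-computing device has no time left to upload
    hi = lo + 1.0
    while sum(bw_needed(hi)) > total_bw:  # expand the bracket until T = hi is feasible
        hi = lo + 2.0 * (hi - lo)
    while hi - lo > tol * max(1.0, hi):   # bisection on the target round latency T
        mid = 0.5 * (lo + hi)
        if sum(bw_needed(mid)) <= total_bw:
            hi = mid
        else:
            lo = mid
    return hi, bw_needed(hi)
```

Bisection is valid here because the bandwidth each device needs decreases monotonically as the target latency T grows, so the feasibility test is monotone in T. Under this assumed model, the total training time would be the number of communication rounds multiplied by the returned per-round latency.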
