On-line Access: 2022-01-24

Received: 2021-07-05

Revision Accepted: 2022-04-22

Crosschecked: 2021-11-15

 ORCID:

Xiaoyu LIU

https://orcid.org/0000-0003-3293-0803

Chi XU

https://orcid.org/0000-0001-7389-5763

Haibin YU

https://orcid.org/0000-0002-1663-2956

Frontiers of Information Technology & Electronic Engineering  2022 Vol.23 No.1 P.47-60

http://doi.org/10.1631/FITEE.2100331


Multi-agent deep reinforcement learning for end–edge orchestrated resource allocation in industrial wireless networks


Author(s):  Xiaoyu LIU, Chi XU, Haibin YU, Peng ZENG

Affiliation(s):  State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China

Corresponding email(s):   liuxiaoyu1@sia.cn, xuchi@sia.cn, yhb@sia.cn, zp@sia.cn

Key Words:  Multi-agent deep reinforcement learning, End–edge orchestrated, Industrial wireless networks, Delay, Energy consumption


Xiaoyu LIU, Chi XU, Haibin YU, Peng ZENG. Multi-agent deep reinforcement learning for end–edge orchestrated resource allocation in industrial wireless networks[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(1): 47-60.

@article{FITEE2100331,
title="Multi-agent deep reinforcement learning for end–edge orchestrated resource allocation in industrial wireless networks",
author="Xiaoyu LIU and Chi XU and Haibin YU and Peng ZENG",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="23",
number="1",
pages="47-60",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2100331"
}

%0 Journal Article
%T Multi-agent deep reinforcement learning for end–edge orchestrated resource allocation in industrial wireless networks
%A Xiaoyu LIU
%A Chi XU
%A Haibin YU
%A Peng ZENG
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 1
%P 47-60
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2100331

TY - JOUR
T1 - Multi-agent deep reinforcement learning for end–edge orchestrated resource allocation in industrial wireless networks
A1 - Xiaoyu LIU
A1 - Chi XU
A1 - Haibin YU
A1 - Peng ZENG
JO - Frontiers of Information Technology & Electronic Engineering
VL - 23
IS - 1
SP - 47
EP - 60
SN - 2095-9184
Y1 - 2022
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2100331
ER -


Abstract: 
Edge artificial intelligence will empower the hitherto simple industrial wireless networks (IWNs) to support complex and dynamic tasks by collaboratively exploiting the computation and communication resources of both machine-type devices (MTDs) and edge servers. In this paper, we propose a multi-agent deep reinforcement learning based resource allocation (MADRL-RA) algorithm for end–edge orchestrated IWNs to support computation-intensive and delay-sensitive applications. First, we present the system model of IWNs, wherein each MTD is regarded as a self-learning agent. Then, we apply the Markov decision process to formulate a minimum system overhead problem that jointly optimizes delay and energy consumption. Next, we employ MADRL to cope with the state-space explosion and learn an effective resource allocation policy with respect to computing decision, computation capacity, and transmission power. To break the time correlation of training data while accelerating the learning process of MADRL-RA, we design a weighted experience replay that stores and samples experiences by category. Furthermore, we propose a step-by-step ε-greedy method to balance exploitation and exploration. Finally, we verify the effectiveness of MADRL-RA by comparing it with several benchmark algorithms in extensive experiments, which show that MADRL-RA converges quickly and learns an effective resource allocation policy that achieves the minimum system overhead.
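The abstract names three building blocks without giving their details: a system overhead that combines delay and energy consumption, a replay memory that stores and samples experiences by category, and an ε that is lowered step by step. The Python sketch below illustrates one plausible reading of each. It is not the authors' implementation: the weighted-sum form of the overhead, the reward-sign category rule, the fixed sampling ratio, and every name and default value (`system_overhead`, `stepwise_epsilon`, `WeightedReplay`, `good_fraction`, `interval`, and so on) are assumptions made for illustration only.

```python
import random
from collections import deque


def system_overhead(delay, energy, w_delay=0.5, w_energy=0.5):
    """Weighted sum of delay and energy; the weights are hypothetical."""
    return w_delay * delay + w_energy * energy


def stepwise_epsilon(step, eps_start=1.0, eps_min=0.05, drop=0.05, interval=100):
    """Lower epsilon by `drop` every `interval` steps, never below `eps_min`.

    This staircase schedule is an assumed form of a step-by-step epsilon-greedy rule.
    """
    return max(eps_min, eps_start - drop * (step // interval))


class WeightedReplay:
    """Two-category replay memory (assumed design, not taken from the paper).

    Experiences are filed by the sign of their reward and sampled at random
    from each category in a fixed ratio, which mixes experiences from
    different time steps within one batch.
    """

    def __init__(self, capacity=1000, good_fraction=0.7):
        self.good = deque(maxlen=capacity)   # experiences with reward >= 0
        self.bad = deque(maxlen=capacity)    # experiences with reward < 0
        self.good_fraction = good_fraction   # share of each batch drawn from `good`

    def store(self, state, action, reward, next_state):
        target = self.good if reward >= 0 else self.bad
        target.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Draw up to the configured share from each category; the batch may be
        # smaller than `batch_size` while a category is still short of samples.
        n_good = min(len(self.good), round(batch_size * self.good_fraction))
        n_bad = min(len(self.bad), batch_size - n_good)
        return rng.sample(list(self.good), n_good) + rng.sample(list(self.bad), n_bad)


if __name__ == "__main__":
    memory = WeightedReplay()
    for t in range(40):
        reward = 1.0 if t % 2 == 0 else -1.0
        memory.store(state=t, action=0, reward=reward, next_state=t + 1)
    batch = memory.sample(10)
    print(len(batch), stepwise_epsilon(step=250), system_overhead(2.0, 4.0))
```

In this sketch an agent would call `stepwise_epsilon` once per training step to decide whether to explore, and `WeightedReplay.sample` once per update to build a mini-batch. A larger `good_fraction` biases training toward experiences with non-negative reward, which is one way such a weighting could speed up learning.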

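Xiaoyu LIU1,2,3,4, Chi XU1,2,3, Haibin YU1,2,3, Peng ZENG1,2,3
1State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
2Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, China
3Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
4University of Chinese Academy of Sciences, Beijing 100049, China
Abstract (translated from the Chinese version): Edge artificial intelligence empowers industrial wireless networks to support complex and dynamic industrial tasks by collaboratively exploiting the limited network and computing resources on the device side and the edge side. For resource-constrained industrial wireless networks, we propose a multi-agent deep reinforcement learning based resource allocation (MADRL-RA) algorithm that realizes end–edge orchestrated resource allocation and supports computation-intensive and delay-sensitive industrial applications. First, an end–edge orchestrated system model of industrial wireless networks is established, in which industrial devices with sensing capability act as self-learning agents. Then, the Markov decision process is used to formalize the end–edge resource allocation problem as a minimum system overhead problem that jointly optimizes delay and energy consumption. Next, multi-agent deep reinforcement learning is used to overcome the curse of dimensionality of the state space while learning an effective resource allocation policy over computing decision, computation capacity allocation, and transmission power. To break the time correlation of training data while accelerating the learning process of MADRL-RA, an experience replay method with experience weights is designed to store and sample experiences by category. On this basis, a step-by-step ε-greedy method is proposed to balance the agents' exploitation of experience against exploration. Finally, extensive comparative experiments verify the effectiveness of MADRL-RA against several baseline algorithms. The results show that MADRL-RA converges quickly and learns an effective resource allocation policy that achieves the minimum system overhead.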


Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2022 Journal of Zhejiang University-SCIENCE