Publishing Service

Polishing & Checking

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Multi-agent deep reinforcement learning for end–edge orchestrated resource allocation in industrial wireless networks

Abstract: Edge artificial intelligence will empower the ever simple industrial wireless networks (IWNs) supporting complex and dynamic tasks by collaboratively exploiting the computation and communication resources of both machine-type devices (MTDs) and edge servers. In this paper, we propose a multi-agent deep reinforcement learning based resource allocation (MADRL-RA) algorithm for end–edge orchestrated IWNs to support computation-intensive and delay-sensitive applications. First, we present the system model of IWNs, wherein each MTD is regarded as a self-learning agent. Then, we apply the Markov decision process to formulate a minimum system overhead problem with joint optimization of delay and energy consumption. Next, we employ MADRL to defeat the explosive state space and learn an effective resource allocation policy with respect to computing decision, computation capacity, and transmission power. To break the time correlation of training data while accelerating the learning process of MADRL-RA, we design a weighted experience replay to store and sample experiences categorically. Furthermore, we propose a step-by-step ε-greedy method to balance exploitation and exploration. Finally, we verify the effectiveness of MADRL-RA by comparing it with some benchmark algorithms in many experiments, showing that MADRL-RA converges quickly and learns an effective resource allocation policy achieving the minimum system overhead.

Key words: Multi-agent deep reinforcement learning; End–edge orchestrated; Industrial wireless networks; Delay; Energy consumption

Chinese Summary  <34> 基于多智能体深度强化学习的工业无线网络端边协同资源分配

刘晓宇1,2,3,4,许驰1,2,3,于海斌1,2,3,曾鹏1,2,3
1中国科学院沈阳自动化研究所机器人学国家重点实验室,中国沈阳市,110016
2中国科学院网络化控制系统重点实验室,中国沈阳市,110016
3中国科学院机器人与智能制造创新研究院,中国沈阳市,110169
4中国科学院大学,中国北京市,100049
摘要:边缘人工智能通过协同利用设备侧和边缘侧有限的网络、计算资源,赋能工业无线网络以支持复杂和动态工业任务。面向资源受限的工业无线网络,我们提出一种基于多智能体深度强化学习的资源分配(MADRL-RA)算法,实现了端边协同资源分配,支持计算密集型、时延敏感型工业应用。首先,建立了端边协同的工业无线网络系统模型,将具有感知能力的工业设备作为自学习的智能代理。然后,采用马尔可夫决策过程对端边资源分配问题进行形式化描述,建立关于时延和能耗联合优化的最小系统开销问题。接着,利用多智能体深度强化学习克服状态空间维灾,同时学习关于计算决策、算力分配和传输功率的有效资源分配策略。为了打破训练数据的时间相关性,同时加速MADRL-RA学习过程,设计了一种带经验权重的经验回放方法,对经验进行分类存储和采样。在此基础上,提出步进的ε-贪婪方法来平衡智能代理对经验的利用与探索。最后,通过大量对比实验,验证了MADRL-RA算法相较于多种基线算法的有效性。实验结果表明,MADRL-RA收敛速度快,能够学习到有效资源分配策略以实现最小系统开销。

关键词组:多智能体深度强化学习;端边协同;工业无线网络;时延;能耗


Share this article to: More

Go to Contents

References:

<Show All>

Open peer comments: Debate/Discuss/Question/Opinion

<1>

Please provide your name, email address and a comment





DOI:

10.1631/FITEE.2100331

CLC number:

Download Full Text:

Click Here

Downloaded:

7326

Download summary:

<Click Here> 

Downloaded:

798

Clicked:

4300

Cited:

0

On-line Access:

2022-01-24

Received:

2021-07-05

Revision Accepted:

2022-04-22

Crosschecked:

2021-11-15

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE