JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

2022 Vol.23 No.1 P.47-60

Multi-agent deep reinforcement learning for end–edge orchestrated resource allocation in industrial wireless networks

Xiaoyu LIU, Chi XU, Haibin YU, Peng ZENG

State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China; Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, China; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China; University of Chinese Academy of Sciences, Beijing 100049, China

liuxiaoyu1@sia.cn, xuchi@sia.cn, yhb@sia.cn, zp@sia.cn

Abstract: Edge artificial intelligence will empower traditionally simple industrial wireless networks (IWNs) to support complex and dynamic tasks by collaboratively exploiting the computation and communication resources of both machine-type devices (MTDs) and edge servers. In this paper, we propose a multi-agent deep reinforcement learning based resource allocation (MADRL-RA) algorithm for end–edge orchestrated IWNs to support computation-intensive and delay-sensitive applications. First, we present the system model of IWNs, wherein each MTD is regarded as a self-learning agent. Then, we apply the Markov decision process to formulate a minimum system overhead problem with joint optimization of delay and energy consumption. Next, we employ MADRL to overcome the explosive state space and learn an effective resource allocation policy with respect to computing decision, computation capacity, and transmission power. To break the time correlation of training data while accelerating the learning process of MADRL-RA, we design a weighted experience replay to store and sample experiences categorically. Furthermore, we propose a step-by-step ε-greedy method to balance exploitation and exploration. Finally, we verify the effectiveness of MADRL-RA by comparing it with benchmark algorithms in extensive experiments, showing that MADRL-RA converges quickly and learns an effective resource allocation policy that achieves the minimum system overhead.
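The abstract names two training aids, a weighted experience replay that stores and samples experiences by category and a step-by-step ε-greedy exploration method, without giving their details. The sketch below is only one plausible reading of those ideas, not the authors' implementation. The function and class names, the two-category split by a reward threshold, the sampling fraction, and every numeric default are assumptions introduced for illustration.

```python
import random
from collections import deque


def stepwise_epsilon(episode, start=0.9, end=0.05, step_size=0.05,
                     episodes_per_step=100):
    """Exploration rate that drops by `step_size` every `episodes_per_step`
    episodes and never falls below `end`.

    All defaults are illustrative assumptions, not values from the paper.
    """
    drops = episode // episodes_per_step
    return max(end, start - drops * step_size)


class CategorizedReplay:
    """Replay buffer with two categories of experience.

    Assumption: experiences are split by a reward threshold, and each
    mini-batch draws a fixed fraction from the high-reward category.
    """

    def __init__(self, capacity=10000, threshold=0.0, high_fraction=0.7,
                 seed=None):
        self.high = deque(maxlen=capacity)  # reward >= threshold
        self.low = deque(maxlen=capacity)   # reward < threshold
        self.threshold = threshold
        self.high_fraction = high_fraction
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state):
        bucket = self.high if reward >= self.threshold else self.low
        bucket.append((state, action, reward, next_state))

    def __len__(self):
        return len(self.high) + len(self.low)

    def sample(self, batch_size):
        # Take the weighted share from the high-reward category, then
        # fill the rest of the batch from the low-reward category.
        n_high = min(len(self.high), round(batch_size * self.high_fraction))
        n_low = min(len(self.low), batch_size - n_high)
        # Sampling without replacement in random order breaks the
        # temporal correlation between consecutive transitions.
        batch = (self.rng.sample(list(self.high), n_high)
                 + self.rng.sample(list(self.low), n_low))
        self.rng.shuffle(batch)
        return batch
```

In a training loop under these assumptions, an agent would pick a random action with probability `stepwise_epsilon(episode)` and otherwise act greedily, calling `store` after each transition and `sample` before each network update.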

Key words: Multi-agent deep reinforcement learning; End–edge orchestrated; Industrial wireless networks; Delay; Energy consumption

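Chinese Summary <58> Multi-agent deep reinforcement learning based end–edge collaborative resource allocation for industrial wireless networks

Xiaoyu LIU^1,2,3,4, Chi XU^1,2,3, Haibin YU^1,2,3, Peng ZENG^1,2,3
¹State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
²Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, China
³Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
⁴University of Chinese Academy of Sciences, Beijing 100049, China
Summary: Edge artificial intelligence empowers industrial wireless networks to support complex and dynamic industrial tasks by collaboratively exploiting the limited network and computing resources on the device side and the edge side. For resource-constrained industrial wireless networks, we propose a multi-agent deep reinforcement learning based resource allocation (MADRL-RA) algorithm that realizes end–edge collaborative resource allocation and supports computation-intensive, delay-sensitive industrial applications. First, an end–edge collaborative system model of industrial wireless networks is established, in which industrial devices with sensing capability act as self-learning agents. Then, a Markov decision process is used to formalize the end–edge resource allocation problem as a minimum system overhead problem with joint optimization of delay and energy consumption. Next, multi-agent deep reinforcement learning is used to overcome the curse of dimensionality of the state space while learning an effective resource allocation policy over computing decision, computation capacity allocation, and transmission power. To break the time correlation of training data while accelerating the learning process of MADRL-RA, an experience replay method with experience weights is designed to store and sample experiences by category. On this basis, a step-by-step ε-greedy method is proposed to balance the agents' exploitation of experience and exploration. Finally, extensive comparative experiments verify the effectiveness of MADRL-RA relative to several baseline algorithms. The experimental results show that MADRL-RA converges quickly and learns an effective resource allocation policy that achieves the minimum system overhead.

Keywords: Multi-agent deep reinforcement learning; End–edge collaboration; Industrial wireless networks; Delay; Energy consumption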


DOI: 10.1631/FITEE.2100331


On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2021-11-15

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE
