JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

2024 Vol.25 No.6 P.869-886

Multi-agent reinforcement learning behavioral control for nonlinear second-order systems

Zhenyi ZHANG, Jie HUANG, Congjie PAN

College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China; 5G+ Industrial Internet Institute, Fuzhou University, Fuzhou 350108, China

jie.huang@fzu.edu.cn

Abstract: Reinforcement learning behavioral control (RLBC) is limited to an individual agent without any swarm mission, because it models behavior priority learning as a Markov decision process. In this paper, a novel multi-agent reinforcement learning behavioral control (MARLBC) method is proposed to overcome this limitation through joint learning. Specifically, a multi-agent reinforcement learning mission supervisor (MARLMS) is designed for a group of nonlinear second-order systems to assign behavior priorities at the decision layer. By modeling behavior priority switching as a cooperative Markov game, the MARLMS learns an optimal joint behavior priority, reducing dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers is designed to learn the optimal control policies that track position and velocity signals simultaneously. In particular, input saturation constraints are strictly enforced by a group of adaptive compensators. Numerical simulation results show that the proposed MARLBC has a lower switching frequency and control cost than finite-time behavioral control, fixed-time behavioral control, and RLBC methods.

Key words: Reinforcement learning; Behavioral control; Second-order systems; Mission supervisor
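
The idea of learning a joint behavior priority in a cooperative Markov game can be illustrated with a minimal sketch. This is not the paper's MARLMS: it is a toy centralized joint-action Q-learner, and the state names, priority names, reward function, and hyperparameters below are all hypothetical assumptions chosen only to keep the example small. Actions are sampled uniformly during training, which is permissible because Q-learning is off-policy.

```python
import itertools
import random

# Hypothetical behavior priority orderings each agent can choose from.
PRIORITIES = ("avoid_first", "track_first")
N_AGENTS = 2
# A joint action holds one priority index per agent.
JOINT_ACTIONS = list(itertools.product(range(len(PRIORITIES)), repeat=N_AGENTS))
# Hypothetical team-level states.
STATES = ("obstacle_near", "clear")


def shared_reward(state, joint_action):
    """Toy team reward (an assumption, not the paper's reward).

    Each agent that picks the priority suited to the state adds 1,
    and every agent receives the same total (cooperative setting).
    """
    wanted = 0 if state == "obstacle_near" else 1
    return float(sum(1 for a in joint_action if a == wanted))


def train(steps=5000, alpha=0.1, gamma=0.5, seed=0):
    """Tabular Q-learning over joint actions with uniform exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in JOINT_ACTIONS}
    state = rng.choice(STATES)
    for _ in range(steps):
        action = rng.choice(JOINT_ACTIONS)   # uniform behavior policy
        reward = shared_reward(state, action)
        next_state = rng.choice(STATES)      # toy dynamics: random next state
        best_next = max(q[(next_state, a)] for a in JOINT_ACTIONS)
        td_target = reward + gamma * best_next
        q[(state, action)] += alpha * (td_target - q[(state, action)])
        state = next_state
    return q


def greedy_joint_priority(q, state):
    """Return the learned joint priority (one name per agent) for a state."""
    best = max(JOINT_ACTIONS, key=lambda a: q[(state, a)])
    return tuple(PRIORITIES[i] for i in best)
```

Calling `greedy_joint_priority(train(), "obstacle_near")` returns one priority name per agent. Under this toy reward both agents should settle on `avoid_first` near an obstacle and on `track_first` otherwise. A centralized table grows exponentially with the number of agents, so this form is only practical for very small teams.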

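Chinese Summary (translated): Multi-agent reinforcement learning behavioral control for nonlinear second-order systems

Zhenyi ZHANG¹,², Jie HUANG¹,², Congjie PAN¹,²
¹College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China
²5G+ Industrial Internet Institute, Fuzhou University, Fuzhou 350108, China
Abstract: Reinforcement learning behavioral control is limited to a single agent without any swarm mission, because it models behavior priority learning as a Markov decision process. This paper proposes a novel multi-agent reinforcement learning behavioral control method that overcomes this shortcoming through joint learning. Specifically, for a group of nonlinear second-order systems, a multi-agent reinforcement learning mission supervisor is designed to assign behavior priorities at the mission layer. By modeling behavior priority switching as a cooperative Markov game, the mission supervisor learns the optimal joint behavior priority, reducing dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers is designed to learn the optimal control policies and track position and velocity signals simultaneously. In particular, a group of adaptive compensators is designed to guarantee the input saturation constraints. Numerical simulation results verify that the proposed method has a lower switching frequency and control cost than finite-time, fixed-time, and reinforcement learning behavioral control.

Key words: Reinforcement learning; Behavioral control; Second-order systems; Mission supervisor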

DOI: 10.1631/FITEE.2300394

CLC number: TP18

On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2023-11-16

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE
