JZUS - Journal of Zhejiang University SCIENCE

ENGINEERING Information Technology & Electronic Engineering

Accepted manuscript available online (unedited version)

Multi-agent reinforcement learning behavioral control for nonlinear second-order systems

Author(s): Zhenyi ZHANG, Jie HUANG, Congjie PAN
Affiliation(s): College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China; more
Corresponding email(s): jie.huang@fzu.edu.cn
Key Words: Reinforcement learning; Behavioral control; Second-order systems; Mission supervisor

Share this article to： More <<< Previous Paper \|Next Paper >>>

Zhenyi ZHANG, Jie HUANG, Congjie PAN. Multi-agent reinforcement learning behavioral control for nonlinear second-order systems[J]. Frontiers of Information Technology & Electronic Engineering,in press.https://doi.org/10.1631/FITEE.2300394

@article{title="Multi-agent reinforcement learning behavioral control for nonlinear second-order systems",
author="Zhenyi ZHANG, Jie HUANG, Congjie PAN",
journal="Frontiers of Information Technology & Electronic Engineering",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="https://doi.org/10.1631/FITEE.2300394"
}

%0 Journal Article
%T Multi-agent reinforcement learning behavioral control for nonlinear second-order systems
%A Zhenyi ZHANG
%A Jie HUANG
%A Congjie PAN
%J Frontiers of Information Technology & Electronic Engineering
%P 869-886
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
doi="https://doi.org/10.1631/FITEE.2300394"

TY - JOUR
T1 - Multi-agent reinforcement learning behavioral control for nonlinear second-order systems
A1 - Zhenyi ZHANG
A1 - Jie HUANG
A1 - Congjie PAN
J0 - Frontiers of Information Technology & Electronic Engineering
SP - 869
EP - 886
%@ 2095-9184
Y1 - in press
PB - Zhejiang University Press & Springer
ER -
doi="https://doi.org/10.1631/FITEE.2300394"

Abstract
Chinese Summary
Academic Network
Reviewer Comment

Abstract: Reinforcement learning behavioral control (RLBC) is limited to an individual agent without any swarm mission, because it models the behavior priority learning as a Markov decision process. In this paper, a novel multi-agent reinforcement learning behavioral control (MARLBC) method is proposed to overcome such limitations by implementing joint learning. Specifically, a multi-agent reinforcement learning mission supervisor (MARLMS) is designed for a group of nonlinear second-order systems to assign the behavior priorities at the decision layer. Through modeling behavior priority switching as a cooperative Markov game, the MARLMS learns an optimal joint behavior priority to reduce dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers are designed to learn the optimal control policies to track position and velocity signals simultaneously. In particular, input saturation constraints are strictly implemented via designing a group of adaptive compensators. Numerical simulation results show that the proposed MARLBC has a lower switching frequency and control cost than finite-time and fixed-time behavioral control and RLBC methods.

非线性二阶系统的多智能体强化学习行为控制

张祯毅^1,2，黄捷^1,2，潘聪捷^1,2
¹福州大学电气工程与自动化学院，中国福州市，350108
²福州大学5G+工业互联网研究院，中国福州市，350108
摘要：强化学习行为控制局限于没有群体任务的单个智能体，因为其将行为优先级学习建模为马尔可夫决策过程。本文提出一种新颖的多智能体强化学习行为控制方法，该方法通过执行联合学习克服上述缺陷。具体而言，针对一组非线性二阶系统，设计一个多智能体强化学习任务监管器以在任务层分配行为优先级。通过将行为优先级切换建模为协作式马尔可夫博弈，多智能体强化学习任务监管器学习最优联合行为优先级，以减少对人类智能和高性能计算硬件的依赖。在控制层，设计了一组二阶强化学习控制器用以学习最优控制策略，实现位置和速度信号的同步跟踪。特别地，设计了一组自适应补偿器以保证输入饱和约束。数值仿真结果验证了所提出的多智能体强化学习行为控制对比有限时间、固有时间和强化学习行为控制具有更低的切换频率和控制代价。

关键词组：强化学习；行为控制；二阶系统；任务监管

Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article

Reference

[1]Ahmad S, Feng Z, Hu GQ, 2014. Multi-robot formation control using distributed null space behavioral approach. IEEE Int Conf on Robotics and Automation, p.3607-3612.

[2]Anschel O, Baram N, Shimkin N, 2017. Averaged-DQN: variance reduction and stabilization for deep reinforcement learning. Proc 34^th Int Conf on Machine Learning, p.176-185.

[3]Antonelli G, Chiaverini S, 2006. Kinematic control of platoons of autonomous vehicles. IEEE Trans Robot, 22(6):1285-1292.

[4]Arkin RC, 1989. Motor schema-based mobile robot navigation. Int J Robot Res, 8(4):92-112.

[5]Balch T, Arkin RC, 1998. Behavior-based formation control for multirobot teams. IEEE Trans Robot Autom, 14(6):926-939.

[6]Brooks RA, 1986. A robust layered control system for a mobile robot. IEEE J Robot Autom, 2(1):14-23.

[7]Brooks RA, 1991. New approaches to robotics. Science, 253(5025):1227-1232.

[8]Cao SJ, Sun L, Jiang JJ, et al., 2023. Reinforcement learning-based fixed-time trajectory tracking control for uncertain robotic manipulators with input saturation. IEEE Trans Neur Netw Learn Syst, 34(8):4584-4595.

[9]Cao YC, Yu WW, Ren W, et al., 2013. An overview of recent progress in the study of distributed multi-agent coordination. IEEE Trans Ind Inform, 9(1):427-438.

[10]Chen J, Gan MG, Huang J, et al., 2016. Formation control of multiple Euler–Lagrange systems via null-space-based behavioral control. Sci China Inform Sci, 59(1):1-11.

[11]Chen YT, Zhang ZY, Huang J, 2020. Dynamic task priority planning for null-space behavioral control of multi-agent systems. IEEE Access, 8:149643-149651.

[12]Dong XW, Zhou Y, Ren Z, et al., 2017. Time-varying formation tracking for second-order multi-agent systems subjected to switching topologies with application to quadrotor formation flying. IEEE Trans Ind Electron, 64(6):5014-5024.

[13]Garattoni L, Birattari M, 2018. Autonomous task sequencing in a robot swarm. Sci Robot, 3(20):eaat0430.

[14]Huang J, Cao M, Zhou N, et al., 2017. Distributed behavioral control for second-order nonlinear multi-agent systems. IFAC-PapersOnLine, 50(1):2445-2450.

[15]Huang J, Zhou N, Cao M, 2019. Adaptive fuzzy behavioral control of second-order autonomous agents with prioritized missions: theory and experiments. IEEE Trans Ind Electron, 66(12):9612-9622.

[16]Huang J, Mo ZB, Zhang ZY, et al., 2022a. Behavioral control task supervisor with memory based on reinforcement learning for human–multi-robot coordination systems. Front Inform Technol Electron Eng, 23(8):1174-1188.

[17]Huang J, Wu WH, Zhang ZY, et al., 2022b. Human decision-making modeling and cooperative controller design for human–agent interaction systems. IEEE Trans Human-Mach Syst, 52(6):1122-1134.

[18]Littman ML, 1994. Markov games as a framework for multi-agent reinforcement learning. Proc 11^th Int Conf on Machine Learning, p.157-163.

[19]Liu DR, Xue S, Zhao B, et al., 2021. Adaptive dynamic programming for control: a survey and recent advances. IEEE Trans Syst Man Cybern Syst, 51(1):142-160.

[20]Liu Y, Li HY, Lu RQ, et al., 2022. An overview of finite/fixed-time control and its application in engineering systems. IEEE/CAA J Autom Sin, 9(12):2106-2120.

[21]Marino A, Caccavale F, Parker LE, et al., 2009. Fuzzy behavioral control for multi-robot border patrol. Proc 17^th Mediterranean Conf on Control and Automation, p.246-251.

[22]Marino A, Parker LE, Antonelli G, et al., 2013. A decentralized architecture for multi-robot systems based on the null-space-behavioral control with application to multi-robot border patrolling. J Intell Robot Syst, 71(3):423-444.

[23]Ott C, Dietrich A, Albu-Schäffer A, 2015. Prioritized multi-task compliance control of redundant manipulators. Automatica, 53:416-423.

[24]Santos MCP, Rosales CD, Sarcinelli-Filho M, et al., 2017. A novel null-space-based UAV trajectory tracking controller with collision avoidance. IEEE/ASME Trans Mech, 22(6):2543-2553.

[25]Schlanbusch R, Kristiansen R, Nicklasson PJ, 2011. Spacecraft formation reconfiguration with collision avoidance. Automatica, 47(7):1443-1449.

[26]Vadakkepat P, Miin OC, Peng X, et al., 2004. Fuzzy behavior-based control of mobile robots. IEEE Trans Fuzzy Syst, 12(4):559-565.

[27]Wang WJ, Li CJ, Guo YN, 2021. Relative position coordinated control for spacecraft formation flying with obstacle/collision avoidance. Nonl Dyn, 104(2):1329-1342.

[28]Wang ZY, Schaul T, Hessel M, et al., 2016. Dueling network architectures for deep reinforcement learning. Proc 33^rd Int Conf on Machine Learning, p.1995-2003.

[29]Wei EM, Luke S, 2016. Lenient learning in independent-learner stochastic cooperative games. J Mach Learn Res, 17(1):2914-2955.

[30]Wen GX, Chen CLP, Liu YJ, et al., 2017. Neural network-based adaptive leader-following consensus control for a class of nonlinear multiagent state-delay systems. IEEE Trans Cybern, 47(8):2151-2160.

[31]Wen GX, Chen CLP, Feng J, et al., 2018. Optimized multi-agent formation control based on an identifier–actor–critic reinforcement learning algorithm. IEEE Trans Fuzzy Syst, 26(5):2719-2731.

[32]Wen GX, Chen CLP, Ge SS, 2021. Simplified optimized backstepping control for a class of nonlinear strict-feedback systems with unknown dynamic functions. IEEE Trans Cybern, 51(9):4567-4580.

[33]Yao DY, Li HY, Lu RQ, et al., 2020. Distributed sliding-mode tracking control of second-order nonlinear multi-agent systems: an event-triggered approach. IEEE Trans Cybern, 50(9):3892-3902.

[34]Yao P, Wei YX, Zhao ZY, 2022. Null-space-based modulated reference trajectory generator for multi-robots formation in obstacle environment. ISA Trans, 123:168-178.

[35]Zhang ZY, Mo ZB, Chen YT, et al., 2022. Reinforcement learning behavioral control for nonlinear autonomous system. IEEE/CAA J Autom Sin, 9(9):1561-1573.

[36]Zheng CB, Pang ZH, Wang JX, et al., 2023. Null-space-based time-varying formation control of uncertain nonlinear second-order multiagent systems with collision avoidance. IEEE Trans Ind Electron, 70(10):10476-10485.

[37]Zhou N, Xia YQ, Wang ML, et al., 2015. Finite-time attitude control of multiple rigid spacecraft using terminal sliding mode. Int J Robust Nonl Contr, 25(12):1862-1876.

[38]Zhou N, Cheng XD, Sun ZQ, et al., 2022. Fixed-time cooperative behavioral control for networked autonomous agents with second-order nonlinear dynamics. IEEE Trans Cybern, 52(9):9504-9518.

Open peer comments: Debate/Discuss/Question/Opinion

<1>

- Go to

非线性二阶系统的多智能体强化学习行为控制

Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article

Reference