CLC number: TP18
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2023-11-16
Zhenyi ZHANG, Jie HUANG, Congjie PAN. Multi-agent reinforcement learning behavioral control for nonlinear second-order systems[J]. Frontiers of Information Technology & Electronic Engineering, 2024, 25(6): 869-886.
Abstract: Reinforcement learning behavioral control (RLBC) is limited to an individual agent and cannot handle swarm missions, because it models behavior priority learning as a single-agent Markov decision process. In this paper, a novel multi-agent reinforcement learning behavioral control (MARLBC) method is proposed to overcome this limitation through joint learning. Specifically, a multi-agent reinforcement learning mission supervisor (MARLMS) is designed for a group of nonlinear second-order systems to assign behavior priorities at the decision layer. By modeling behavior priority switching as a cooperative Markov game, the MARLMS learns an optimal joint behavior priority, reducing dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers is designed to learn optimal control policies that track position and velocity signals simultaneously. In particular, input saturation constraints are strictly enforced by a group of adaptive compensators. Numerical simulation results show that the proposed MARLBC method achieves a lower switching frequency and control cost than finite-time behavioral control, fixed-time behavioral control, and RLBC methods.
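To make the decision-layer formulation concrete, below is a minimal Python sketch of behavior-priority switching cast as a cooperative (shared-reward) Markov game and solved with tabular Q-learning over the joint action space. Everything in it (the discrete mission-state abstraction, the toy switching-cost reward, and names such as env_step) is an illustrative assumption for the sketch, not the authors' MARLMS implementation.

    import numpy as np

    # Illustrative sketch only: joint behavior-priority selection modeled as a
    # cooperative (shared-reward) Markov game, learned with tabular Q-learning
    # over the joint action space. All names and numbers are assumptions.

    rng = np.random.default_rng(0)

    n_agents = 3        # agents in the swarm (assumed)
    n_priorities = 2    # candidate priority orderings per agent (assumed)
    n_states = 5        # discrete abstraction of the mission state (assumed)

    # One shared reward means a single value table over the joint priority
    # assignment is the simplest consistent learner for this sketch.
    n_joint = n_priorities ** n_agents
    Q = np.zeros((n_states, n_joint))
    alpha, gamma, eps = 0.1, 0.95, 0.1   # learning rate, discount, exploration

    def env_step(state, a_joint, a_prev):
        """Placeholder dynamics: a real mission supervisor would score tracking
        error and control cost reported by the control layer."""
        next_state = int(rng.integers(n_states))
        reward = -float(a_joint != a_prev)   # toy shared cost: penalize switching
        return next_state, reward

    state, a_prev = int(rng.integers(n_states)), 0
    for _ in range(10_000):
        # epsilon-greedy choice of the joint behavior priority
        if rng.random() < eps:
            a_joint = int(rng.integers(n_joint))
        else:
            a_joint = int(Q[state].argmax())
        next_state, reward = env_step(state, a_joint, a_prev)
        # standard Q-learning update on the shared-reward game
        Q[state, a_joint] += alpha * (reward + gamma * Q[next_state].max()
                                      - Q[state, a_joint])
        state, a_prev = next_state, a_joint

Because every agent receives the same reward in a cooperative Markov game, the greedy row index of Q can be decoded (e.g., as base-n_priorities digits) into one priority ordering per agent, which is the sense in which a joint behavior priority is learned.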