
CLC number: TP18
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2023-11-16
Zhenyi ZHANG, Jie HUANG, Congjie PAN. Multi-agent reinforcement learning behavioral control for nonlinear second-order systems[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2300394
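Multi-agent reinforcement learning behavioral control for nonlinear second-order systems
1 College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China
2 5G+ Industrial Internet Institute, Fuzhou University, Fuzhou 350108, China
Abstract: Reinforcement learning behavioral control is limited to a single agent without swarm missions, because it models behavior priority learning as a Markov decision process. This paper proposes a novel multi-agent reinforcement learning behavioral control method that overcomes this limitation by performing joint learning. Specifically, for a group of nonlinear second-order systems, a multi-agent reinforcement learning mission supervisor is designed to assign behavior priorities at the mission layer. By modeling behavior priority switching as a cooperative Markov game, the mission supervisor learns the optimal joint behavior priority, which reduces dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers is designed to learn optimal control policies that track position and velocity signals simultaneously. In particular, a group of adaptive compensators is designed to guarantee the input saturation constraints. Numerical simulation results verify that the proposed multi-agent reinforcement learning behavioral control achieves a lower switching frequency and a lower control cost than finite-time, fixed-time, and reinforcement learning behavioral control.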
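The idea of learning a joint behavior priority in a cooperative Markov game can be illustrated with a minimal tabular sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-agent team, the two candidate priority orders, the single "obstacle nearby" state flag, the shared reward, the random obstacle dynamics, and all function names are hypothetical. It uses plain joint-action Q-learning, not the authors' supervisor design.

```python
import itertools
import random

# Hypothetical toy setup (not from the paper): two agents, two priority orders each.
PRIORITIES = ("avoid>move", "move>avoid")  # candidate behavior priority orders
JOINT_ACTIONS = list(itertools.product(range(2), repeat=2))  # one index per agent
STATES = (0, 1)  # shared team state: 0 = no obstacle nearby, 1 = obstacle nearby


def team_reward(state, joint_action):
    """Shared (cooperative) reward: each agent earns +1 for the priority order
    that suits the state and -1 otherwise; the team receives the sum."""
    desired = 0 if state == 1 else 1  # avoid first near an obstacle, else move first
    return sum(1.0 if a == desired else -1.0 for a in joint_action)


def train(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning over joint actions with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in JOINT_ACTIONS}
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(JOINT_ACTIONS)  # explore a random joint priority
        else:
            action = max(JOINT_ACTIONS, key=lambda a: q[(state, a)])  # exploit
        reward = team_reward(state, action)
        next_state = rng.choice(STATES)  # assumed dynamics: obstacles appear at random
        best_next = max(q[(next_state, a)] for a in JOINT_ACTIONS)
        td_error = reward + gamma * best_next - q[(state, action)]
        q[(state, action)] += alpha * td_error
        state = next_state
    return q


def greedy_joint_priority(q, state):
    """Return the learned joint priority order (one entry per agent) for a state."""
    best = max(JOINT_ACTIONS, key=lambda a: q[(state, a)])
    return tuple(PRIORITIES[i] for i in best)


if __name__ == "__main__":
    q_table = train()
    for s in STATES:
        print(s, greedy_joint_priority(q_table, s))
```

Because the reward is shared, the table is indexed by the joint action rather than by each agent's own choice, which is the feature that distinguishes a cooperative Markov game from a set of independent Markov decision processes. The joint action space grows exponentially with the number of agents, so this tabular form is only practical for very small teams.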
Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2026 Journal of Zhejiang University-SCIENCE


