CLC number: TP18

On-line Access: 2024-07-05

Received: 2023-06-01

Revision Accepted: 2024-07-05

Crosschecked: 2023-11-16


Frontiers of Information Technology & Electronic Engineering  2024 Vol.25 No.6 P.869-886


Multi-agent reinforcement learning behavioral control for nonlinear second-order systems

Author(s):  Zhenyi ZHANG, Jie HUANG, Congjie PAN

Affiliation(s):  College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China

Corresponding email(s):   jie.huang@fzu.edu.cn

Key Words:  Reinforcement learning, Behavioral control, Second-order systems, Mission supervisor

Zhenyi ZHANG, Jie HUANG, Congjie PAN. Multi-agent reinforcement learning behavioral control for nonlinear second-order systems[J]. Frontiers of Information Technology & Electronic Engineering, 2024, 25(6): 869-886.

@article{title="Multi-agent reinforcement learning behavioral control for nonlinear second-order systems",
author="Zhenyi ZHANG, Jie HUANG, Congjie PAN",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="25",
number="6",
pages="869-886",
year="2024",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2300394"
}

%0 Journal Article
%T Multi-agent reinforcement learning behavioral control for nonlinear second-order systems
%A Zhenyi ZHANG
%A Jie HUANG
%A Congjie PAN
%J Frontiers of Information Technology & Electronic Engineering
%V 25
%N 6
%P 869-886
%@ 2095-9184
%D 2024
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2300394

TY - JOUR
T1 - Multi-agent reinforcement learning behavioral control for nonlinear second-order systems
A1 - Zhenyi ZHANG
A1 - Jie HUANG
A1 - Congjie PAN
JO - Frontiers of Information Technology & Electronic Engineering
VL - 25
IS - 6
SP - 869
EP - 886
SN - 2095-9184
Y1 - 2024
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2300394
ER -

Abstract:  Reinforcement learning behavioral control (RLBC) is limited to an individual agent without any swarm mission, because it models behavior priority learning as a Markov decision process. In this paper, a novel multi-agent reinforcement learning behavioral control (MARLBC) method is proposed to overcome this limitation through joint learning. Specifically, a multi-agent reinforcement learning mission supervisor (MARLMS) is designed for a group of nonlinear second-order systems to assign behavior priorities at the decision layer. By modeling behavior priority switching as a cooperative Markov game, the MARLMS learns an optimal joint behavior priority, which reduces dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers is designed to learn the optimal control policies that track position and velocity signals simultaneously. In particular, input saturation constraints are strictly enforced by a group of adaptive compensators. Numerical simulation results show that the proposed MARLBC has a lower switching frequency and a lower control cost than finite-time behavioral control, fixed-time behavioral control, and RLBC methods.
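
The abstract describes the decision layer only at a high level. As a rough illustration of what learning a joint behavior priority in a cooperative Markov game can look like, the toy sketch below runs tabular joint-action Q-learning for two agents that share a single team reward. It is not the MARLMS from the paper: the two-state environment, the reward values, the obstacle encounter rate, and all names (`shared_reward`, `train`, `greedy_joint_priority`) are assumptions made purely for illustration.

```python
import itertools
import random

# Hypothetical toy setup (not from the paper): two agents, two behaviors.
# Priority 0 = "tracking over obstacle avoidance",
# priority 1 = "obstacle avoidance over tracking".
PRIORITIES = (0, 1)
JOINT_ACTIONS = list(itertools.product(PRIORITIES, repeat=2))
STATES = (0, 1)  # 0: no obstacle nearby, 1: obstacle nearby


def shared_reward(state, joint_action):
    """One team reward for both agents; this is what makes the game cooperative."""
    wanted = state  # assumed: avoidance-first is desirable only near an obstacle
    matches = sum(1 for a in joint_action if a == wanted)
    return {2: 1.0, 1: -0.5, 0: -1.0}[matches]  # assumed reward values


def train(steps=20000, alpha=0.1, gamma=0.9, seed=0):
    """Off-policy Q-learning over the joint action space with uniform exploration."""
    rng = random.Random(seed)
    q = {(s, ja): 0.0 for s in STATES for ja in JOINT_ACTIONS}
    state = 0
    for _ in range(steps):
        joint_action = rng.choice(JOINT_ACTIONS)
        reward = shared_reward(state, joint_action)
        # Assumed dynamics: an obstacle shows up with probability 0.3 per step.
        next_state = 1 if rng.random() < 0.3 else 0
        best_next = max(q[(next_state, ja)] for ja in JOINT_ACTIONS)
        td_target = reward + gamma * best_next
        q[(state, joint_action)] += alpha * (td_target - q[(state, joint_action)])
        state = next_state
    return q


def greedy_joint_priority(q, state):
    """Return the joint priority assignment with the highest learned value."""
    return max(JOINT_ACTIONS, key=lambda ja: q[(state, ja)])


q_table = train()
print(greedy_joint_priority(q_table, 0))
print(greedy_joint_priority(q_table, 1))
```

Because the Q-table is indexed by the joint action rather than by each agent's own action, the learned priorities are coordinated across agents, which is the property a single-agent Markov decision process formulation cannot express. The joint action space grows exponentially with the number of agents, so a tabular version like this is only practical for very small groups.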




