CLC number: TP181
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2024-08-11
ORCID: https://orcid.org/0000-0003-3330-4978
Yixiang REN, Zhenhui YE, Yining CHEN, Xiaohong JIANG, Guanghua SONG. Soft-HGRNs: soft hierarchical graph recurrent networks for multi-agent partially observable environments. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2200073
Soft-HGRNs: soft hierarchical graph recurrent networks for multi-agent partially observable environments
1 School of Aeronautics and Astronautics, Zhejiang University, Hangzhou 310027, China
2 College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Abstract: Recent progress in multi-agent deep reinforcement learning (MADRL) has made it more practical for real-world tasks, but its relatively poor scalability and the partially observable constraint pose challenges to the performance and deployment of MADRL models. Human society can be regarded as a large-scale partially observable environment in which every individual can communicate with others and remember past experiences. Inspired by human society, we propose a novel network structure, called the hierarchical graph recurrent network (HGRN), for multi-agent cooperation tasks in partially observable environments. Specifically, we construct the multi-agent system as a graph, use a novel graph convolution structure to achieve communication between heterogeneous neighboring agents, and adopt a recurrent unit to enable agents to memorize historical information. To encourage exploration and improve robustness, we further design a maximum-entropy learning method by which agents can learn stochastic policies with a configurable target action entropy. Based on these techniques, we propose a value-based MADRL algorithm named Soft-HGRN and its actor-critic variant named SAC-HGRN. Experiments conducted in three homogeneous scenarios and one heterogeneous environment show not only that our approach achieves clear improvements over four MADRL baselines, but also that the proposed model is interpretable, scalable, and transferable.