CLC number: TP18
On-line Access: 2022-08-22
Received: 2021-06-14
Revision Accepted: 2022-08-29
Crosschecked: 2022-02-15
Jie HUANG, Zhibin MO, Zhenyi ZHANG, Yutao CHEN. Behavioral control task supervisor with memory based on reinforcement learning for human–multi-robot coordination systems[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2100280
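Behavioral control task supervisor with memory based on reinforcement learning for human–multi-robot coordination systems
1 College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China
2 5G+ Industrial Internet Institute, Fuzhou University, Fuzhou 350108, China
3 Key Laboratory of Industrial Automation Control Technology and Information Processing of Fujian Province Universities, Fuzhou University, Fuzhou 350108, China
Abstract: A reinforcement learning task supervisor (RLTS) with memory, built on a behavioral control framework, is proposed for human–multi-robot coordination systems. Because of repeated human intervention, existing human–multi-robot coordination systems suffer from high decision-making time cost and large task tracking errors, which limits the autonomy of multi-robot systems. In addition, task supervisors based on the null-space-based behavioral control framework rely on manually defined priority-switching rules, which makes it difficult to achieve an optimal behavioral priority adjustment strategy when there are multiple robots and multiple tasks. The proposed RLTS with memory integrates a deep Q-network and a long short-term memory (LSTM) neural network knowledge base within the null-space-based behavioral control framework, both to obtain an optimal behavioral priority adjustment strategy when tasks conflict and to reduce the frequency of human intervention. When a robot's confidence is insufficient in an emergency, the RLTS with memory records the history of human interventions and reloads the historical control signal when the same intervention situation is encountered again. Simulation results verify the effectiveness of the method. Finally, an experiment with a group of mobile robots subject to external noise and disturbances verifies the effectiveness of the proposed RLTS with memory in uncertain real-world environments.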
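The memory behavior described in the abstract (record a human intervention, then reload it when the same situation recurs) can be illustrated with a minimal sketch. This is not the authors' implementation: an exact-match dictionary stands in for the LSTM knowledge base, and the class name `InterventionMemory`, the `(linear, angular)` control tuple, the confidence threshold, and the situation labels are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, Hashable, Tuple

# Hypothetical control signal: (linear velocity, angular velocity).
Control = Tuple[float, float]


class InterventionMemory:
    """Toy stand-in for the LSTM knowledge base: an exact-match lookup table."""

    def __init__(self) -> None:
        self._store: Dict[Hashable, Control] = {}
        self.human_queries = 0  # number of times the operator was actually asked

    def resolve(
        self,
        situation: Hashable,
        confidence: float,
        threshold: float,
        autonomous: Callable[[Hashable], Control],
        ask_human: Callable[[Hashable], Control],
    ) -> Control:
        # Confident enough: the robot acts on its own policy.
        if confidence >= threshold:
            return autonomous(situation)
        # Low confidence, situation seen before: reload the stored human command.
        if situation in self._store:
            return self._store[situation]
        # Low confidence, new situation: ask the operator and remember the answer.
        self.human_queries += 1
        command = ask_human(situation)
        self._store[situation] = command
        return command


# Hypothetical policies used only to exercise the sketch.
def autonomous_policy(situation: Hashable) -> Control:
    return (1.0, 0.0)


def human_operator(situation: Hashable) -> Control:
    return (0.0, 0.5)


memory = InterventionMemory()
first = memory.resolve("obstacle_ahead", 0.2, 0.6, autonomous_policy, human_operator)
second = memory.resolve("obstacle_ahead", 0.2, 0.6, autonomous_policy, human_operator)
confident = memory.resolve("open_field", 0.9, 0.6, autonomous_policy, human_operator)
```

In this sketch the second low-confidence call for the same situation is answered from memory, so the operator is consulted only once. A learned knowledge base would presumably generalize to similar rather than identical situations, which an exact-match table cannot do.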
Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou
310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2024 Journal of Zhejiang University-SCIENCE