CLC number: TP18
Crosschecked: 2022-02-15
Citations: Bibtex RefMan EndNote GB/T7714
Jie HUANG, Zhibin MO, Zhenyi ZHANG, Yutao CHEN. Behavioral control task supervisor with memory based on reinforcement learning for human–multi-robot coordination systems[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(8): 1174-1188.
@article{huang2022behavioral,
title="Behavioral control task supervisor with memory based on reinforcement learning for human–multi-robot coordination systems",
author="Jie HUANG and Zhibin MO and Zhenyi ZHANG and Yutao CHEN",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="23",
number="8",
pages="1174-1188",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2100280"
}
%0 Journal Article
%T Behavioral control task supervisor with memory based on reinforcement learning for human–multi-robot coordination systems
%A Jie HUANG
%A Zhibin MO
%A Zhenyi ZHANG
%A Yutao CHEN
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 8
%P 1174-1188
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2100280
TY - JOUR
T1 - Behavioral control task supervisor with memory based on reinforcement learning for human–multi-robot coordination systems
A1 - Jie HUANG
A1 - Zhibin MO
A1 - Zhenyi ZHANG
A1 - Yutao CHEN
JO - Frontiers of Information Technology & Electronic Engineering
VL - 23
IS - 8
SP - 1174
EP - 1188
SN - 2095-9184
Y1 - 2022
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2100280
ER -
Abstract: In this study, a novel reinforcement learning task supervisor (RLTS) with memory in a behavioral control framework is proposed for human–multi-robot coordination systems (HMRCSs). Existing HMRCSs suffer from high decision-making time cost and large task tracking errors caused by repeated human intervention, which restricts the autonomy of multi-robot systems (MRSs). Moreover, existing task supervisors in the null-space-based behavioral control (NSBC) framework require many manually formulated priority-switching rules, which makes it difficult to realize an optimal behavioral priority adjustment strategy with multiple robots and multiple tasks. The proposed RLTS with memory integrates a deep Q-network (DQN) and a long short-term memory (LSTM) knowledge base within the NSBC framework to achieve an optimal behavioral priority adjustment strategy in the presence of task conflicts and to reduce the frequency of human intervention. Specifically, the RLTS memorizes the human intervention history whenever the robot system is not confident enough to handle an emergency, and reloads that history when it encounters a situation that a human has previously resolved. Simulation results demonstrate the effectiveness of the proposed RLTS. Finally, an experiment with a group of mobile robots subject to external noise and disturbances validates its effectiveness in uncertain real-world environments.
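The following sketch is not from the paper; it is a minimal illustration of the mechanism the abstract describes, assuming hypothetical names (nsbc_velocity, RLTaskSupervisor, ask_human) and NumPy stubs in place of the trained DQN and the LSTM knowledge base. Prioritized tasks are fused by null-space projection, a Q-value margin decides whether the supervisor is confident, and a human decision made once is memorized and reloaded when the same situation recurs.

import numpy as np

def nsbc_velocity(jacobians, task_rates, priority):
    # Null-space-based behavioral control (NSBC): fuse prioritized task
    # velocities so that each lower-priority task acts only in the null
    # space of all higher-priority task Jacobians and cannot disturb them.
    n = jacobians[0].shape[1]
    v = np.zeros(n)
    proj = np.eye(n)                        # cumulative null-space projector
    for i in priority:                      # task indices, highest priority first
        J_pinv = np.linalg.pinv(jacobians[i])
        v += proj @ J_pinv @ task_rates[i]
        proj = proj @ (np.eye(n) - J_pinv @ jacobians[i])
    return v

class RLTaskSupervisor:
    # Sketch of the RLTS with memory: a DQN scores candidate priority
    # orderings; when the score margin is small, the supervisor defers to
    # a human once and memorizes the choice for identical future situations.
    def __init__(self, orderings, margin=0.5):
        self.orderings = orderings          # candidate priority permutations
        self.margin = margin                # confidence threshold (assumed value)
        self.memory = {}                    # stand-in for the LSTM knowledge base

    def q_values(self, state):
        # Placeholder for the trained DQN; deterministic per state within a run.
        rng = np.random.default_rng(abs(hash(state)) % 2**32)
        return rng.random(len(self.orderings))

    def select_priority(self, state, ask_human):
        if state in self.memory:            # reload a previously supervised choice
            return self.memory[state]
        q = self.q_values(state)
        best = self.orderings[int(np.argmax(q))]
        top2 = np.sort(q)[-2:]              # two highest Q-values
        if top2[1] - top2[0] < self.margin: # not confident: human intervenes once
            best = ask_human(state)
            self.memory[state] = best       # memorize to reduce future interventions
        return best

# Example: two tasks on a 3-DoF system; the supervisor resolves the conflict.
J = [np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]), np.array([[0.0, 0.0, 1.0]])]
rates = [np.array([0.2, -0.1]), np.array([0.05])]
sup = RLTaskSupervisor(orderings=[(0, 1), (1, 0)])
order = sup.select_priority("conflict_A", ask_human=lambda s: (0, 1))
print(nsbc_velocity(J, rates, order))       # fused velocity command

In the paper, the DQN is trained to select the priority ordering that minimizes task tracking error under conflict, and the LSTM knowledge base generalizes stored interventions beyond the exact-match dictionary lookup used above.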