CLC number: TP183
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2024-08-25
Yang CHEN, Dianxi SHI, Huanhuan YANG, Tongyue LI, Zhen WANG. An anti-collision algorithm for robotic search-and-rescue tasks in unknown dynamic environments[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2300151
An anti-collision algorithm for robotic search-and-rescue tasks in unknown dynamic environments

1 School of Computer Science, Peking University, Beijing 100871, China
2 Tianjin (Binhai) Artificial Intelligence Innovation Center, Tianjin 300457, China
3 Intelligent Game and Decision Laboratory, Beijing 100071, China
4 College of Computer, National University of Defense Technology, Changsha 410073, China

Abstract: This paper studies mobile-robot search-and-rescue tasks with multiple targets of interest in unknown dynamic environments. Such tasks are challenging because the robot must search for multiple targets while avoiding obstacles. To ensure reasonable collision avoidance, we propose a Dyna-Q algorithm based on mixed-strategy Nash equilibrium (MNDQ). First, a multi-objective hierarchical structure is introduced to simplify the problem: it divides the overall task into subtasks, including searching for targets and avoiding obstacles. Second, a risk-monitoring mechanism based on the relative positions of dynamic risks is proposed, enabling the robot to avoid potential collisions and detours. Furthermore, to improve sampling efficiency, MNDQ combines Dyna-Q with mixed-strategy Nash equilibrium: following the mixed strategy, the agent makes decisions probabilistically so as to maximize its expected return, which improves the overall performance of Dyna-Q. Finally, simulation experiments verify the effectiveness of the proposed method. The results show that it performs well and offers a viable approach for future autonomous robot navigation tasks.
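As a rough illustration of the mechanism the abstract describes, the sketch below pairs a standard Dyna-Q loop (direct Q-learning update, model learning, and simulated planning sweeps) with probabilistic action selection, in which actions are drawn from a distribution over Q-values rather than chosen greedily. This is a minimal sketch under stated assumptions, not the paper's MNDQ implementation: the softmax in mixed_strategy is a stand-in for the mixed-strategy Nash equilibrium computation, and the environment interface (env_step), the state/action sizes, and all hyperparameters are illustrative.

import numpy as np

n_states, n_actions = 25, 4          # illustrative sizes, not from the paper
alpha, gamma, planning_steps = 0.1, 0.95, 10

Q = np.zeros((n_states, n_actions))  # action-value estimates
model = {}                           # learned model: (s, a) -> (reward, next state)
rng = np.random.default_rng(0)

def mixed_strategy(q_row, tau=0.5):
    # Softmax over Q-values: a stand-in for the mixed-strategy distribution.
    z = np.exp((q_row - q_row.max()) / tau)
    return z / z.sum()

def dyna_q_step(s, env_step):
    # 1) act: sample an action from the mixed strategy instead of greedy argmax
    a = rng.choice(n_actions, p=mixed_strategy(Q[s]))
    r, s_next = env_step(s, a)
    # 2) direct RL: one-step Q-learning update from the real transition
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    # 3) model learning: remember the observed transition
    model[(s, a)] = (r, s_next)
    # 4) planning: replay simulated transitions from the learned model
    for _ in range(planning_steps):
        (ps, pa), (pr, ps_next) = list(model.items())[rng.integers(len(model))]
        Q[ps, pa] += alpha * (pr + gamma * Q[ps_next].max() - Q[ps, pa])
    return s_next

# Toy usage: a one-dimensional corridor where action 0 moves toward the goal.
def toy_env_step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 0 else max(s - 1, 0)
    return (1.0 if s_next == n_states - 1 else -0.01), s_next

s = 0
for _ in range(200):
    s = dyna_q_step(s, toy_env_step)

The design point the abstract emphasizes is step 1: sampling actions from a mixed strategy keeps exploration stochastic while still weighting actions by expected return, which is what the Nash-equilibrium component of MNDQ formalizes.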