CLC number: TP183

On-line Access: 2024-05-06

Received: 2023-03-03

Revision Accepted: 2024-05-06

Crosschecked: 2023-08-25


Frontiers of Information Technology & Electronic Engineering  2024 Vol.25 No.4 P.569-584


An anti-collision algorithm for robotic search-and-rescue tasks in unknown dynamic environments

Author(s):  Yang CHEN, Dianxi SHI, Huanhuan YANG, Tongyue LI, Zhen WANG

Affiliation(s):  School of Computer Science, Peking University, Beijing 100871, China; more

Corresponding email(s):   chenyang20@stu.pku.edu.cn, dxshi@nudt.edu.cn

Key Words:  Search and rescue, Reinforcement learning, Game theory, Collision avoidance, Decision-making

Yang CHEN, Dianxi SHI, Huanhuan YANG, Tongyue LI, Zhen WANG. An anti-collision algorithm for robotic search-and-rescue tasks in unknown dynamic environments[J]. Frontiers of Information Technology & Electronic Engineering, 2024, 25(4): 569-584.

@article{Chen2024MNDQ,
title="An anti-collision algorithm for robotic search-and-rescue tasks in unknown dynamic environments",
author="Yang CHEN, Dianxi SHI, Huanhuan YANG, Tongyue LI, Zhen WANG",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="25",
number="4",
pages="569-584",
year="2024",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2300151"
}



This paper addresses the search-and-rescue task of a mobile robot that must locate multiple targets of interest in an unknown dynamic environment. The problem is challenging because the robot needs to search for multiple targets while simultaneously avoiding obstacles. To ensure that the robot avoids obstacles properly, we propose a mixed-strategy Nash equilibrium based Dyna-Q (MNDQ) algorithm. First, a multi-objective layered structure is introduced to simplify the representation of multiple objectives and reduce computational complexity; it divides the overall task into subtasks, including searching for targets and avoiding obstacles. Second, a risk-monitoring mechanism based on the relative positions of dynamic risks is proposed, which helps the robot avoid potential collisions and unnecessary detours. Then, to improve sampling efficiency, MNDQ combines Dyna-Q with mixed-strategy Nash equilibrium: the agent makes decisions in the form of probabilities, maximizing the expected reward and improving the overall performance of the Dyna-Q algorithm. Finally, a series of simulations verifies the effectiveness of the proposed method. The results show that MNDQ performs well and is robust, providing a competitive solution for future autonomous robot navigation tasks.
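The abstract builds on the Dyna-Q backbone: a model-free Q-learning update combined with planning steps replayed from a learned model. The sketch below illustrates that backbone only; the 5x5 grid, reward values, hyperparameters, and the epsilon-greedy action selection (a simple probabilistic stand-in, not the paper's mixed-strategy Nash-equilibrium selection) are all illustrative assumptions, not the authors' MNDQ.

```python
import random
from collections import defaultdict

# Tabular Dyna-Q on a hypothetical 5x5 grid with one static obstacle.
# Illustrative only: MNDQ additionally layers subtasks and draws actions
# from a mixed-strategy Nash equilibrium rather than epsilon-greedy.

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up
SIZE, GOAL, OBSTACLE = 5, (4, 4), (2, 2)

def step(state, action):
    """Environment transition: move, clip to the grid, penalize the obstacle."""
    nxt = (min(max(state[0] + action[0], 0), SIZE - 1),
           min(max(state[1] + action[1], 0), SIZE - 1))
    if nxt == OBSTACLE:          # collision: stay put, negative reward
        return state, -1.0, False
    if nxt == GOAL:
        return nxt, 10.0, True
    return nxt, -0.1, False      # small step cost encourages short paths

def dyna_q(episodes=200, planning_steps=10, alpha=0.5, gamma=0.95, eps=0.1):
    Q = defaultdict(float)       # Q[(state, action)] -> value
    model = {}                   # model[(state, action)] -> (next_state, reward)
    for _ in range(episodes):
        s, done = (0, 0), False
        while not done:
            # epsilon-greedy: act randomly with prob eps, else greedily
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r, done = step(s, a)
            # direct reinforcement-learning update from real experience
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
            model[(s, a)] = (s2, r)
            # planning: replay simulated transitions from the learned model
            for _ in range(planning_steps):
                (ps, pa), (ps2, pr) = random.choice(list(model.items()))
                Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in ACTIONS) - Q[(ps, pa)])
            s = s2
    return Q

random.seed(0)
Q = dyna_q()
# Greedy rollout from the start; step() never enters the obstacle cell.
s, path = (0, 0), [(0, 0)]
for _ in range(20):
    a = max(ACTIONS, key=lambda act: Q[(s, act)])
    s, _, done = step(s, a)
    path.append(s)
    if done:
        break
print(path[-1])
```

The planning loop is what gives Dyna-Q its sampling efficiency: each real transition is stored in the model and replayed many times, which is the property the paper's MNDQ variant aims to combine with equilibrium-based probabilistic decisions.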




