CLC number: TP183; TP393.1

On-line Access: 2021-05-17

Received: 2019-12-19

Revision Accepted: 2020-06-27

Crosschecked: 2020-10-20


Frontiers of Information Technology & Electronic Engineering  2021 Vol.22 No.5 P.687-696


Dynamic value iteration networks for the planning of rapidly changing UAV swarms

Author(s):  Wei Li, Bowei Yang, Guanghua Song, Xiaohong Jiang

Affiliation(s):  School of Aeronautics and Astronautics, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):   li2ui2@zju.edu.cn, boweiy@zju.edu.cn, ghsong@zju.edu.cn, jiangxh@zju.edu.cn

Key Words:  Dynamic value iteration networks, Episodic Q-learning, Unmanned aerial vehicle (UAV) ad-hoc network, Non-dominated sorting genetic algorithm II (NSGA-II), Path planning

Wei Li, Bowei Yang, Guanghua Song, Xiaohong Jiang. Dynamic value iteration networks for the planning of rapidly changing UAV swarms[J]. Frontiers of Information Technology & Electronic Engineering, 2021, 22(5): 687-696.

@article{FITEE1900712,
title="Dynamic value iteration networks for the planning of rapidly changing UAV swarms",
author="Wei Li and Bowei Yang and Guanghua Song and Xiaohong Jiang",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="22",
number="5",
pages="687-696",
year="2021",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1900712"
}

%0 Journal Article
%T Dynamic value iteration networks for the planning of rapidly changing UAV swarms
%A Wei Li
%A Bowei Yang
%A Guanghua Song
%A Xiaohong Jiang
%J Frontiers of Information Technology & Electronic Engineering
%V 22
%N 5
%P 687-696
%@ 2095-9184
%D 2021
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1900712

TY - JOUR
T1 - Dynamic value iteration networks for the planning of rapidly changing UAV swarms
A1 - Wei Li
A1 - Bowei Yang
A1 - Guanghua Song
A1 - Xiaohong Jiang
JO - Frontiers of Information Technology & Electronic Engineering
VL - 22
IS - 5
SP - 687
EP - 696
SN - 2095-9184
Y1 - 2021
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1900712
ER -

In an unmanned aerial vehicle ad-hoc network (UANET), the unmanned aerial vehicles (UAVs), which act as network nodes, are sparse and move rapidly, so the UANET topology changes dynamically. This can degrade UANET service performance. To plan such rapidly changing UAV swarms, we propose a dynamic value iteration network (DVIN) model. The model is trained with the episodic Q-learning method on the connection information of UANETs to generate a state value spread function, which enables UAVs/nodes to adapt to novel physical locations. We evaluate the DVIN model and compare it with the non-dominated sorting genetic algorithm II (NSGA-II) and the exhaustive method. Simulation results demonstrate that the proposed model significantly reduces the decision-making time for UAV/node path planning while achieving a high average success rate.
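
As a rough illustration of how state values can spread over a connectivity graph, the sketch below runs classical tabular value iteration on a small hand-made topology and then follows the values greedily to a goal node. It is not the DVIN model or the authors' training procedure. The graph, the reward values, the discount factor, and the function names `value_iteration` and `greedy_path` are all assumptions chosen for illustration.

```python
def value_iteration(adjacency, goal, gamma=0.9, step_reward=-1.0,
                    goal_reward=10.0, tol=1e-6, max_iters=1000):
    """Classical value iteration on a connectivity graph (illustrative only).

    adjacency: dict mapping each node to a list of neighbouring nodes,
    a hypothetical stand-in for UANET link information.
    """
    values = {node: 0.0 for node in adjacency}
    for _ in range(max_iters):
        delta = 0.0
        for node, neighbours in adjacency.items():
            if node == goal or not neighbours:
                continue  # the goal is terminal; isolated nodes keep value 0
            # Bellman backup: best one-hop reward plus discounted next value.
            best = max(
                (goal_reward if nxt == goal else step_reward) + gamma * values[nxt]
                for nxt in neighbours
            )
            delta = max(delta, abs(best - values[node]))
            values[node] = best
        if delta < tol:
            break
    return values


def greedy_path(adjacency, values, start, goal, gamma=0.9,
                step_reward=-1.0, goal_reward=10.0):
    """Follow the highest-scoring neighbour from start until the goal is reached."""
    path = [start]
    node = start
    # Bound the walk by the number of nodes so a bad value map cannot loop forever.
    for _ in range(len(adjacency)):
        if node == goal or not adjacency[node]:
            break
        node = max(
            adjacency[node],
            key=lambda nxt: (goal_reward if nxt == goal else step_reward)
            + gamma * values[nxt],
        )
        path.append(node)
    return path


# Assumed toy topology: five nodes, node 4 is the goal.
topology = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
state_values = value_iteration(topology, goal=4)
route = greedy_path(topology, state_values, start=0, goal=4)
print(state_values)
print(route)
```

In this toy setting the values decay with hop distance from the goal, so a node can pick its next hop from local values alone. When links change, the values must be recomputed for the new topology, and that repeated cost is what a learned planner aims to reduce.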







