CLC number: TP181; U495
On-line Access: 2021-05-17
Received: 2019-11-20
Revision Accepted: 2020-12-29
Crosschecked: 2021-02-03
Yunpeng Wang, Kunxian Zheng, Daxin Tian, Xuting Duan, Jianshan Zhou. Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving[J]. Frontiers of Information Technology & Electronic Engineering, 2021, 22(5): 673-686.
@article{title="Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving",
author="Yunpeng Wang, Kunxian Zheng, Daxin Tian, Xuting Duan, Jianshan Zhou",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="22",
number="5",
pages="673-686",
year="2021",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1900637"
}
%0 Journal Article
%T Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving
%A Yunpeng Wang
%A Kunxian Zheng
%A Daxin Tian
%A Xuting Duan
%A Jianshan Zhou
%J Frontiers of Information Technology & Electronic Engineering
%V 22
%N 5
%P 673-686
%@ 2095-9184
%D 2021
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1900637
TY - JOUR
T1 - Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving
A1 - Yunpeng Wang
A1 - Kunxian Zheng
A1 - Daxin Tian
A1 - Xuting Duan
A1 - Jianshan Zhou
JO - Frontiers of Information Technology & Electronic Engineering
VL - 22
IS - 5
SP - 673
EP - 686
SN - 2095-9184
Y1 - 2021
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1900637
ER -
Abstract: Rule-based autonomous driving systems may suffer from increased complexity with large-scale inter-coupled rules, so many researchers are exploring learning-based approaches. Reinforcement learning (RL) has been applied in designing autonomous driving systems because of its outstanding performance on a wide variety of sequential control problems. However, poor initial performance is a major challenge to the practical implementation of an RL-based autonomous driving system. RL requires extensive training data before the model achieves reasonable performance, which makes an RL-based model impractical in a real-world setting, particularly when data are expensive. We propose an asynchronous supervised learning (ASL) method for the RL-based end-to-end autonomous driving model to address the problem of poor initial performance before training this RL-based model in real-world settings. Specifically, prior knowledge is introduced in the ASL pre-training stage by asynchronously executing multiple supervised learning processes in parallel on multiple driving demonstration data sets. After pre-training, the model is deployed on a real vehicle and further trained by RL so that it adapts to the real environment and continues to improve beyond the performance reached in pre-training. The presented pre-training method is evaluated on the race car simulator TORCS (The Open Racing Car Simulator) to verify that it reliably improves the initial performance and convergence speed of an end-to-end autonomous driving model in the RL training stage. In addition, a real-vehicle verification system is built to verify the feasibility of the proposed pre-training method in a real-vehicle deployment. Simulation results show that using some demonstrations during a supervised pre-training stage yields significant improvements in initial performance and convergence speed in the RL training stage.
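The ASL idea described in the abstract, several supervised learners running in parallel on separate demonstration data sets while updating one shared model, can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional linear policy, the synthetic "expert" rule (action = 2.0 * state + 0.5), the lock-protected update, and all names such as SharedLinearPolicy and pretrain_async are assumptions made purely for illustration.

```python
import random
import threading


class SharedLinearPolicy:
    """Toy shared model (an assumption): action = w * state + b."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0
        self.lock = threading.Lock()

    def predict(self, state):
        # Read without the lock, so a worker may see slightly stale parameters.
        return self.w * state + self.b

    def apply_gradient(self, grad_w, grad_b, lr):
        # Writes are serialized so that concurrent updates are not lost.
        with self.lock:
            self.w -= lr * grad_w
            self.b -= lr * grad_b


def make_demonstrations(seed, low, high, n=200):
    """Hypothetical demonstration set: (state, expert_action) pairs."""
    rng = random.Random(seed)
    states = [rng.uniform(low, high) for _ in range(n)]
    return [(s, 2.0 * s + 0.5) for s in states]


def supervised_worker(policy, demos, seed, steps=3000, lr=0.05):
    """One supervised learning process: SGD on squared error over its own data set."""
    rng = random.Random(seed)
    for _ in range(steps):
        state, expert_action = rng.choice(demos)
        error = policy.predict(state) - expert_action
        # Gradient of error**2 with respect to w and b.
        policy.apply_gradient(2.0 * error * state, 2.0 * error, lr)


def pretrain_async(datasets):
    """Run one worker thread per demonstration data set against a shared policy."""
    policy = SharedLinearPolicy()
    workers = [
        threading.Thread(target=supervised_worker, args=(policy, demos, i))
        for i, demos in enumerate(datasets)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return policy


# Three hypothetical data sets covering different ranges of the state space.
datasets = [
    make_demonstrations(0, -1.0, 0.0),
    make_demonstrations(1, 0.0, 1.0),
    make_demonstrations(2, -1.0, 1.0),
]
policy = pretrain_async(datasets)
```

In this sketch the pre-trained parameters in policy would then serve as the starting point for the RL stage; how the shared network is structured and how updates are synchronized in the actual method should be taken from the full paper.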