Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving

Abstract: Rule-based autonomous driving systems may suffer from increased complexity with large-scale inter-coupled rules, so many researchers are exploring learning-based approaches. Reinforcement learning (RL) has been applied in designing autonomous driving systems because of its outstanding performance on a wide variety of sequential control problems. However, poor initial performance is a major challenge to the practical implementation of an RL-based autonomous driving system. RL training requires extensive training data before the model achieves reasonable performance, making an RL-based model inapplicable in a real-world setting, particularly when data are expensive. We propose an asynchronous supervised learning (ASL) method for the RL-based end-to-end autonomous driving model to address the problem of poor initial performance before training this RL-based model in real-world settings. Specifically, prior knowledge is introduced in the ASL pre-training stage by asynchronously executing multiple supervised learning processes in parallel on multiple driving demonstration data sets. After pre-training, the model is deployed on a real vehicle and further trained by RL to adapt to the real environment and continuously push past its performance limit. The presented pre-training method is evaluated on the race car simulator TORCS (The Open Racing Car Simulator) to verify that it reliably improves the initial performance and convergence speed of an end-to-end autonomous driving model in the RL training stage. In addition, a real-vehicle verification system is built to verify the feasibility of the proposed pre-training method in a real-vehicle deployment. Simulation results show that using some demonstrations during a supervised pre-training stage significantly improves initial performance and convergence speed in the RL training stage.
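The ASL pre-training idea described in the abstract (several supervised learning processes running asynchronously in parallel, each on its own demonstration data set, all updating one model) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the linear policy, the synthetic "expert" weights `TRUE_W`, the use of Python threads with a lock in place of separate processes, and all function names are assumptions made purely for illustration.

```python
import random
import threading


def make_demo_set(true_w, n, seed):
    """Build synthetic (state, action) pairs standing in for one demonstration set."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        state = [rng.uniform(-1.0, 1.0) for _ in true_w]
        action = sum(w * s for w, s in zip(true_w, state))  # hypothetical expert action
        data.append((state, action))
    return data


def mse(weights, data):
    """Mean squared error of the linear policy on one demonstration set."""
    total = 0.0
    for state, action in data:
        pred = sum(w * s for w, s in zip(weights, state))
        total += (pred - action) ** 2
    return total / len(data)


def worker(shared_w, lock, data, epochs, lr):
    """One supervised learner: SGD on its own data, writing into the shared weights."""
    for _ in range(epochs):
        for state, action in data:
            with lock:  # assumption: a lock keeps each read-modify-write step consistent
                pred = sum(w * s for w, s in zip(shared_w, state))
                err = pred - action
                for i, s in enumerate(state):
                    shared_w[i] -= lr * err * s


def asl_pretrain(demo_sets, dim, epochs=20, lr=0.05):
    """Run one worker per demonstration set; workers interleave in no fixed order."""
    shared_w = [0.0] * dim
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(shared_w, lock, data, epochs, lr))
        for data in demo_sets
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_w


TRUE_W = [0.5, -0.3, 0.8]  # hypothetical expert policy, not taken from the paper
DEMO_SETS = [make_demo_set(TRUE_W, 200, seed) for seed in range(4)]
pretrained_w = asl_pretrain(DEMO_SETS, dim=len(TRUE_W))
```

In this sketch, `pretrained_w` would play the role of the initial weights handed to the subsequent RL training stage, so that RL starts from a policy that already imitates the demonstrations rather than from zero.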

Key words: Self-driving, Autonomous vehicles, Reinforcement learning, Supervised learning

Chinese Summary: An asynchronous supervised learning pre-training method for reinforcement learning based autonomous driving models

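Yunpeng Wang, Kunxian Zheng, Daxin Tian, Xuting Duan, Jianshan Zhou
School of Transportation Science and Engineering, Beihang University; Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing 100191, China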

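Abstract: Autonomous driving systems designed around hand-crafted rules can grow increasingly complex as large numbers of inter-coupled rules accumulate, so many researchers are exploring learning-based solutions. Reinforcement learning (RL) has been applied to autonomous driving system design because of its outstanding performance on a variety of sequential control problems. However, the main challenge to deploying an RL-based autonomous driving system in practice is its poor initial performance. RL training requires a large amount of training data before the model reaches reasonable performance, which makes RL-based models unsuitable for real-world environments, especially when data are expensive. This paper proposes an asynchronous supervised learning (ASL) method for RL-based end-to-end autonomous driving models to address the problem of poor initial performance when training such models in real environments. Specifically, prior knowledge is introduced in the ASL pre-training stage by executing multiple supervised learning processes in parallel and asynchronously on multiple driving demonstration data sets. After pre-training, the model is deployed on a real vehicle for further RL training, to adapt to the real environment and continuously push past its performance limit. The proposed pre-training method is evaluated on the race car simulator TORCS (The Open Racing Car Simulator) to verify that it is sufficiently reliable in improving the initial performance and convergence speed of an end-to-end autonomous driving model in the RL training stage. In addition, a real-vehicle verification system is built to verify the feasibility of the proposed pre-training method in real-vehicle deployment. Simulation results show that using some demonstrations in a supervised pre-training stage significantly improves initial performance and convergence speed in the RL training stage.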

Key words: Self-driving; Autonomous vehicles; Reinforcement learning; Supervised learning


DOI: 10.1631/FITEE.1900637

CLC number: TP181; U495


On-line Access: 2021-05-17

Received: 2019-11-20

Revision Accepted: 2020-12-29

Crosschecked: 2021-02-03

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE