Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Motion planning of a quadrotor robot game using a simulation-based projected policy iteration method

Abstract: Making rational decisions in sequential decision problems set in complex environments has challenged researchers in many fields for decades. Such problems involve state transition dynamics, stochastic uncertainties, long-term utilities, and other factors that together raise high barriers, including the curse of dimensionality. Recently developed state-of-the-art reinforcement learning algorithms show strong potential to break these barriers efficiently and to handle complex, practical decision problems with decent performance. We propose a formulation of a velocity-varying, one-on-one quadrotor robot game in three-dimensional space, together with an approximate dynamic programming approach that uses a projected policy iteration method to learn the utilities of game states and improve motion policies. In addition, a simulation-based iterative scheme is employed to overcome the curse of dimensionality. Simulation results demonstrate that the proposed decision strategy generates effective and efficient motion policies that can contend with the opponent quadrotor and gain an advantageous position during the game. Flight experiments conducted in the Networked Autonomous Vehicles (NAV) Lab at Concordia University further validate the performance of the proposed decision strategy in a real-time environment.
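
The core loop named in the abstract (evaluate a policy through a projected, simulation-built equation, then improve it greedily) can be illustrated on a toy problem. What follows is a minimal sketch under loud assumptions: the 10-state chain, its reward, the discount factor 0.9, the two linear features and every function name (`simulate`, `phi`, `evaluate`, `improve`, `projected_policy_iteration`) are hypothetical stand-ins, not the quadrotor game, state features or utility function used by the authors. Policy evaluation solves a textbook LSTD-style form of the projected Bellman equation Φr = ΠT(Φr) from one simulated transition per state (so the projection uses uniform state weights), and policy improvement is a one-step lookahead through the simulator.

```python
"""Toy sketch of simulation-based projected policy iteration.
The chain MDP, features and constants are hypothetical illustrations,
not the quadrotor game described in the paper."""

GAMMA = 0.9            # discount factor (assumed)
N_STATES = 10          # hypothetical 1-D chain standing in for game states
GOAL = N_STATES - 1    # entering or holding GOAL earns reward 1
ACTIONS = (-1, +1)     # move left / move right


def simulate(s, a):
    """Hypothetical simulator: deterministic one-step transition and reward."""
    s_next = min(max(s + a, 0), GOAL)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward


def phi(s):
    """Hypothetical linear features; V(s) is approximated by phi(s) . r."""
    return (1.0, s / GOAL)


def evaluate(policy):
    """Projected policy evaluation (LSTD): solve C r = d built from samples,
    with C = sum phi(s) (phi(s) - GAMMA * phi(s'))^T and d = sum phi(s) * g."""
    C = [[0.0, 0.0], [0.0, 0.0]]
    d = [0.0, 0.0]
    for s in range(N_STATES):  # one simulated transition per state (uniform weights)
        s_next, g = simulate(s, policy[s])
        f, f_next = phi(s), phi(s_next)
        for i in range(2):
            d[i] += f[i] * g
            for j in range(2):
                C[i][j] += f[i] * (f[j] - GAMMA * f_next[j])
    # Cramer's rule for the 2x2 system C r = d
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    return ((d[0] * C[1][1] - C[0][1] * d[1]) / det,
            (C[0][0] * d[1] - C[1][0] * d[0]) / det)


def improve(r):
    """Greedy one-step lookahead through the simulator using V ~ phi . r.
    Ties go to the first action in ACTIONS, since max() keeps the first maximum."""
    def value(s):
        return sum(fi * ri for fi, ri in zip(phi(s), r))

    def q(s, a):
        s_next, g = simulate(s, a)
        return g + GAMMA * value(s_next)

    return [max(ACTIONS, key=lambda a: q(s, a)) for s in range(N_STATES)]


def projected_policy_iteration(max_iters=20):
    """Alternate projected evaluation and greedy improvement until stable."""
    policy = [-1] * N_STATES           # start from the poor "always left" policy
    for _ in range(max_iters):
        r = evaluate(policy)
        new_policy = improve(r)
        if new_policy == policy:       # policy unchanged: stop
            break
        policy = new_policy
    return policy, r


if __name__ == "__main__":
    final_policy, weights = projected_policy_iteration()
    print(final_policy, weights)
```

In this toy case the loop starts from the always-left policy and settles on the always-right policy after three evaluations. The design choice being illustrated is the projection: the parameter vector r has two entries regardless of how many states the simulator generates, which is how a projected, simulation-based method sidesteps the curse of dimensionality, at the cost of only approximating the true utilities.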

Key words: Reinforcement learning, Approximate dynamic programming, Decision making, Motion planning, Unmanned aerial vehicle

Chinese summary: Research on motion planning of quadrotor UAVs in games: a simulation-based projected policy iteration method

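Abstract (Chinese version, translated): For decades, making effective and rational decisions for sequential decision problems in complex environments has remained a difficult problem for researchers in many fields. Such problems involve state transition dynamics, uncertainty introduced by random factors, long-horizon decision optimization, and many other difficulties; because of these difficulties, including the curse of dimensionality, effective solution methods still require further research. As advanced algorithms continue to be developed in the field of reinforcement learning, they offer promising solutions for sequential decision problems in complex environments and can achieve high decision performance in practical applications. This paper proposes a velocity-varying one-on-one quadrotor UAV game platform in three-dimensional space, together with an approximate dynamic programming method based on projected policy iteration, to learn the utility function of the quadrotor game and generate improved quadrotor motion policies. In addition, a simulation-based approach is adopted to break free of the curse of dimensionality. Simulation results show that the proposed decision method efficiently generates effective motion policies in the adversarial game and can gain and maintain an advantageous position against the opposing quadrotor UAV. Flight experiments conducted in the Networked Autonomous Vehicles (NAV) Lab at Concordia University further validate the decision performance of the proposed method in a real-time environment.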

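Key words (Chinese version, translated): Reinforcement learning; Approximate dynamic programming; Decision making; Motion planning; Unmanned aerial vehicle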

DOI: 10.1631/FITEE.1800571

CLC number: TP242

Received: 2018-09-15
Revision accepted: 2018-11-27
Crosschecked: 2019-04-28
Online access: 2019-05-14

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE