Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2020 Vol.21 No.5 P.777-795
Proximal policy optimization with an integral compensator for quadrotor control
Abstract: We use the proximal policy optimization (PPO) reinforcement learning algorithm to optimize a stochastic control policy for speed control of a "model-free" quadrotor. The quadrotor model is controlled by four learned neural networks, which map the system states directly to control commands in an end-to-end manner. Introducing an integral compensator into the actor-critic framework greatly improves the speed tracking accuracy and robustness. In addition, a two-phase learning scheme comprising offline and online learning is developed for practical use. A model with strong generalization ability is learned in the offline phase; the flight policy of the model is then continuously optimized in the online learning phase. Finally, the performance of the proposed algorithm is compared with that of the traditional PID algorithm.
Key words: Reinforcement learning, Proximal policy optimization, Quadrotor control, Neural network
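The abstract names two ingredients: the PPO objective and an integral compensator in the actor-critic loop. The sketch below illustrates both under stated assumptions and is not the authors' implementation. `ppo_clip_loss` is the standard clipped surrogate loss of PPO. `IntegralCompensator` is a hypothetical helper that accumulates the velocity tracking error and appends it to the observation; the state layout, time step `dt`, clamp `limit`, and the choice to concatenate the integral to the network input are assumptions made for illustration only.

```python
import numpy as np


def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate loss (to be minimized), batch-averaged.

    ratio     : pi_new(a|s) / pi_old(a|s) for each sample
    advantage : advantage estimate for each sample
    eps       : clipping range (0.2 is a common default, assumed here)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the elementwise minimum; negate it to obtain a loss.
    return -np.mean(np.minimum(unclipped, clipped))


class IntegralCompensator:
    """Hypothetical helper: integrates the velocity tracking error.

    The clamp limit and the augmented-observation layout are assumptions,
    not details taken from the paper.
    """

    def __init__(self, dim=3, dt=0.01, limit=1.0):
        self.dt = dt
        self.limit = limit
        self.integral = np.zeros(dim)

    def reset(self):
        # Call at the start of every episode.
        self.integral[:] = 0.0

    def augment(self, state, v_ref, v):
        """Return [state, error, integral of error] as one observation vector."""
        error = np.asarray(v_ref, dtype=float) - np.asarray(v, dtype=float)
        # Rectangular integration, clamped to limit integrator wind-up.
        self.integral = np.clip(
            self.integral + error * self.dt, -self.limit, self.limit
        )
        return np.concatenate(
            [np.asarray(state, dtype=float), error, self.integral]
        )
```

In such a setup the augmented observation would be fed to both the actor and the critic, so that a persistent steady-state velocity error grows in the integral term and becomes visible to the policy.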
School of Automation, Southeast University, Nanjing 210096, China
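Chinese abstract (translated): The advanced proximal policy optimization reinforcement learning algorithm is used to optimize a stochastic control policy, achieving stable speed control of a model-free quadrotor. The quadrotor model is controlled by four trainable sub-networks, which map the model states to control commands in an end-to-end manner and send them to the vehicle for execution. Introducing an integral compensator into the actor-critic framework greatly improves the accuracy and robustness of speed tracking. In addition, a two-phase learning scheme comprising offline and online learning is developed for practical flight. In the online learning phase, the flight policy of the model is continuously optimized. Finally, the experimental results of the proposed algorithm are compared with those of the traditional PID algorithm.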
DOI: 10.1631/FITEE.1900641
CLC number: TP183; TP273
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2020-04-27