Journal of Zhejiang University SCIENCE A

ISSN 1673-565X (Print), 1862-1775 (Online), Monthly

A learning-based control pipeline for generic motor skills for quadruped robots

Abstract: Performing diverse motor skills with a universal controller has been a longstanding challenge for legged robots. While motion imitation-based reinforcement learning (RL) has shown remarkable performance in reproducing designed motor skills, the trained controller is only suitable for one specific type of motion. Motion synthesis has been well developed to generate a variety of different motions for character animation, but those motions only contain kinematic information and cannot be used for control. In this study, we introduce a control pipeline combining motion synthesis and motion imitation-based RL for generic motor skills. We design an animation state machine to synthesize motion from various sources and feed the generated kinematic reference trajectory to the RL controller as part of the input. With the proposed method, we show that a single policy is able to learn various motor skills simultaneously. Further, we notice the ability of the policy to uncover the correlations lurking behind the reference motions to improve control performance. We analyze this ability based on the predictability of the reference trajectory and use the quantified measurements to optimize the design of the controller. To demonstrate the effectiveness of our method, we deploy the trained policy on hardware and, with a single control policy, the quadruped robot can perform various learned skills, including automatic gait transitions, high kick, and forward jump.

Key words: Quadruped robot; Reinforcement learning (RL); Motion synthesis; Control
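The data flow described in the abstract, in which an animation state machine emits a kinematic reference trajectory that becomes part of the policy input, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the clip contents, `NUM_JOINTS`, the `horizon` parameter, the rule that one-shot skills fall back to the trot, and all class and function names are hypothetical, and transitions here switch clips abruptly rather than blending them.

```python
import numpy as np

NUM_JOINTS = 12  # assumption: 3 actuated joints per leg


def make_clip(num_frames, freq):
    """Synthetic placeholder for a motion clip (frames x joints)."""
    t = np.linspace(0.0, 1.0, num_frames, endpoint=False)[:, None]
    phase = np.arange(NUM_JOINTS)[None, :] * 0.5
    return 0.3 * np.sin(2.0 * np.pi * freq * t + phase)


# Hypothetical motion database; real clips would come from motion capture,
# sketches, or trajectory optimization.
CLIPS = {"trot": make_clip(40, 1.0), "jump": make_clip(30, 2.0)}
CYCLIC = {"trot"}  # assumption: gaits loop, one-shot skills return to trot


class AnimationStateMachine:
    """Turns user commands into a short window of future reference frames."""

    def __init__(self, horizon=4, start="trot"):
        self.horizon = horizon
        self.state = start
        self.frame = 0

    def command(self, skill):
        # Switch clips immediately; a real system would likely blend.
        if skill != self.state:
            self.state, self.frame = skill, 0

    def _advance(self, state, frame):
        frame += 1
        if frame >= len(CLIPS[state]):
            if state not in CYCLIC:
                state = "trot"
            frame = 0
        return state, frame

    def step(self):
        """Return the next `horizon` reference frames, then advance one frame."""
        state, frame = self.state, self.frame
        window = []
        for _ in range(self.horizon):
            window.append(CLIPS[state][frame])
            state, frame = self._advance(state, frame)
        self.state, self.frame = self._advance(self.state, self.frame)
        return np.stack(window)


def build_observation(proprioception, reference_window):
    """Concatenate robot state with the flattened reference window."""
    return np.concatenate([proprioception, reference_window.ravel()])


asm = AnimationStateMachine(horizon=4)
asm.command("jump")
obs = build_observation(np.zeros(30), asm.step())  # 30 is an assumed state size
print(obs.shape)  # (78,)
```

In this sketch the policy never sees which skill is active; it only sees the upcoming reference frames, which is one way a single policy could serve several skills.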

Chinese Summary (translated): A learning-based control method for generic skills of quadruped robots

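Authors: Yecheng SHAO (邵烨程)1,2, Yongbin JIN (金永斌)1,2, Zhilong HUANG (黄志龙)4, Hongtao WANG (王宏涛)1,2,3, Wei YANG (杨卫)1,2
Affiliations: 1 Center for X-Mechanics, Zhejiang University, Hangzhou 310027, China; 2 ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 311200, China; 3 State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310058, China; 4 Institute of Applied Mechanics, Zhejiang University, Hangzhou 310027, China
Objective: To control a quadruped robot so that it performs multiple motions in a continuous and controllable way.
Innovations: 1. Motion synthesis is combined with motion imitation-based reinforcement learning, so that a single controller tracks different kinematic trajectories and performs different motions on a physical robot, including gait transitions, high kicks, and jumps. 2. The concept of reference trajectory predictability is proposed. The reinforcement learning controller is able to uncover the intrinsic correlations within the reference trajectory, which reveals the mechanism by which the length of the reference trajectory fed to the controller affects controller performance in motion imitation.
Methods: 1. A motion trajectory database is built through motion capture, sketching, and trajectory optimization. 2. A controller is trained in simulation to imitate the motions in the database using motion imitation-based reinforcement learning. 3. An animation state machine is designed around the controller; it generates controllable motion trajectories in real time from user commands and feeds them to the controller as input, enabling control of the physical robot. 4. The concept of reference trajectory predictability is proposed, and the effect of reference trajectory length on controller performance is analyzed.
Conclusions: 1. The proposed control method can control multiple skills on a physical robot. 2. The reference trajectory length affects controller performance through predictability; for motions with low predictability, controller performance can be improved by extending the reference trajectory length.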

Key words: Quadruped robot; Reinforcement learning; Motion synthesis; Control
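The summary's point that reference trajectory length matters mainly for motions with low predictability can be illustrated with a simple proxy. The page does not give the paper's actual measure, so the following is an assumed stand-in: predictability is approximated by how well a future reference frame can be guessed from the two most recent frames using constant-velocity extrapolation. The function name and the test trajectories are hypothetical.

```python
import numpy as np


def extrapolation_error(trajectory, lookahead):
    """Mean error of predicting frame t+lookahead from frames t-1 and t.

    Constant-velocity extrapolation is an assumed stand-in predictor,
    not the measure used in the paper. Requires lookahead >= 1.
    """
    traj = np.asarray(trajectory, dtype=float)
    prev = traj[: -lookahead - 1]
    curr = traj[1:-lookahead]
    target = traj[1 + lookahead:]
    predicted = curr + lookahead * (curr - prev)
    return float(np.mean(np.linalg.norm(predicted - target, axis=1)))


# Synthetic one-joint trajectories: a smooth motion and a jerky variant.
t = np.linspace(0.0, 1.0, 200)[:, None]
smooth = np.sin(2.0 * np.pi * t)
rng = np.random.default_rng(0)
jerky = smooth + 0.2 * rng.standard_normal(smooth.shape)

for name, traj in [("smooth", smooth), ("jerky", jerky)]:
    errors = [extrapolation_error(traj, k) for k in (1, 2, 4)]
    print(name, errors)
```

Under this proxy, a trajectory with low extrapolation error is one whose future frames add little new information, while a high-error trajectory is one where feeding the controller a longer reference window would plausibly help.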


DOI: 10.1631/jzus.A2300128

On-line Access: 2024-06-27
Received: 2023-03-19
Revision Accepted: 2023-06-12
Crosschecked: 2024-06-27

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE