On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2024-06-27
Yecheng SHAO, Yongbin JIN, Zhilong HUANG, Hongtao WANG, Wei YANG. A learning-based control pipeline for generic motor skills for quadruped robots[J]. Journal of Zhejiang University Science A, 2024, 25(6): 443-454.
Abstract: Performing diverse motor skills with a universal controller has been a longstanding challenge for legged robots. While motion imitation-based reinforcement learning (RL) has shown remarkable performance in reproducing designed motor skills, the trained controller is suitable for only one specific type of motion. Motion synthesis is well developed for generating a variety of motions for character animation, but those motions contain only kinematic information and cannot be used directly for control. In this study, we introduce a control pipeline that combines motion synthesis with motion imitation-based RL to achieve generic motor skills. We design an animation state machine to synthesize motion from various sources and feed the generated kinematic reference trajectory to the RL controller as part of its input. With the proposed method, we show that a single policy can learn various motor skills simultaneously. Further, we observe that the policy can uncover correlations hidden in the reference motions and exploit them to improve control performance. We analyze this ability in terms of the predictability of the reference trajectory and use the quantified measurements to optimize the design of the controller. To demonstrate the effectiveness of our method, we deploy the trained policy on hardware: with a single control policy, the quadruped robot performs various learned skills, including automatic gait transitions, high kicks, and forward jumps.
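The pipeline described in the abstract can be sketched minimally as follows: an animation state machine selects and plays back kinematic reference clips, and the resulting reference pose is concatenated with the robot's proprioceptive state to form the policy observation, so that one network can track whatever motion is synthesized. This is an illustrative sketch only; the class and function names (`AnimationStateMachine`, `policy_input`), the clip representation, and the observation layout are assumptions, not the authors' actual implementation.

```python
import numpy as np

class AnimationStateMachine:
    """Toy state machine that switches among kinematic motion clips
    (e.g. trot, kick, jump) and plays the active clip tick by tick."""

    def __init__(self, clips):
        self.clips = clips                  # name -> array of reference poses
        self.active = next(iter(clips))     # start in the first clip
        self.t = 0

    def set_skill(self, name):
        # Switching skills restarts the reference trajectory.
        self.active, self.t = name, 0

    def step(self):
        clip = self.clips[self.active]
        pose = clip[self.t % len(clip)]     # loop the clip
        self.t += 1
        return pose                         # kinematic reference pose for this tick

def policy_input(proprioception, reference_pose):
    # The reference trajectory is part of the policy observation, which is
    # what lets a single policy learn multiple skills simultaneously.
    return np.concatenate([proprioception, reference_pose])

# Toy usage: two 3-DoF "clips" standing in for real motion data.
asm = AnimationStateMachine({
    "trot": np.zeros((10, 3)),
    "jump": np.ones((4, 3)),
})
asm.set_skill("jump")
obs = policy_input(np.zeros(6), asm.step())
print(obs.shape)  # (9,)
```

In the paper's actual pipeline the reference trajectory comes from richer sources (motion capture, animation) and the policy is trained with motion imitation-based RL; the sketch only shows the data flow from synthesized kinematics into the controller's input.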