On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2024-06-27
Yecheng SHAO, Yongbin JIN, Zhilong HUANG, Hongtao WANG, Wei YANG, 2024. A learning-based control pipeline for generic motor skills for quadruped robots. Journal of Zhejiang University Science A, in press. https://doi.org/10.1631/jzus.A2300128
A learning-based control pipeline for generic motor skills for quadruped robots

Affiliations: 1. Center for X-Mechanics, Zhejiang University, Hangzhou 310027, China; 2. Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 311200, China; 3. State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310058, China; 4. Institute of Applied Mechanics, Zhejiang University, Hangzhou 310027, China

Objective: To enable a quadruped robot to perform multiple motions continuously and controllably.

Innovation: 1. Motion generation is combined with motion-imitation-based reinforcement learning, so that a single controller can track different kinematic trajectories; on a physical robot, this realizes distinct behaviors such as gait transitions, high-stepping, and jumping. 2. The concept of reference-trajectory predictability is proposed: the reinforcement-learning controller can exploit the intrinsic correlations within a reference trajectory, which reveals the mechanism by which the length of the reference trajectory fed to the controller affects its performance in motion imitation.

Method: 1. Build a motion-trajectory database through motion capture, sketching, and trajectory optimization. 2. Train a controller in simulation, via motion-imitation-based reinforcement learning, to imitate the motions in the database. 3. Design a motion state machine on top of the controller that generates controllable motion trajectories in real time from user commands; these trajectories serve as controller inputs to drive the physical robot. 4. Propose the concept of reference-trajectory predictability and analyze how the reference-trajectory length affects controller performance.

Conclusions: 1. The proposed control pipeline can realize multiple skills on a physical robot. 2. The effect of reference-trajectory length on controller performance is mediated by predictability; for motions with low predictability, controller performance can be improved by lengthening the reference trajectory.

Keywords:
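The run-time portion of the pipeline described above (a motion state machine that selects a reference trajectory per user command and feeds the controller a window of future reference frames) can be sketched as follows. This is a minimal, illustrative toy: the class and function names, the sinusoidal "database", and the frame/window sizes are all assumptions, not the paper's actual implementation.

```python
import numpy as np

FRAME_DIM = 12   # e.g. 12 joint targets for a quadruped (illustrative)
WINDOW = 4       # number of future reference frames fed to the policy;
                 # this is the "reference-trajectory length" the paper
                 # relates to predictability

def make_gait(period, amplitude, frames=100):
    """Toy periodic reference trajectory of shape (frames, FRAME_DIM)."""
    t = np.linspace(0.0, 2.0 * np.pi * frames / period, frames, endpoint=False)
    return amplitude * np.sin(t)[:, None] * np.ones((1, FRAME_DIM))

# Stand-in for the motion database built from mocap/sketching/optimization.
DATABASE = {
    "trot": make_gait(period=25, amplitude=0.3),
    "jump": make_gait(period=50, amplitude=0.8),
}

class MotionStateMachine:
    """Switches reference trajectories on user commands while tracking phase."""
    def __init__(self, database):
        self.database = database
        self.skill = "trot"
        self.phase = 0

    def step(self, command=None):
        if command is not None and command != self.skill:
            self.skill = command   # user command triggers a skill switch
            self.phase = 0
        traj = self.database[self.skill]
        # Collect the next WINDOW reference frames, wrapping around the cycle;
        # this window becomes part of the trained controller's observation.
        idx = (self.phase + np.arange(WINDOW)) % len(traj)
        self.phase = (self.phase + 1) % len(traj)
        return traj[idx]           # shape (WINDOW, FRAME_DIM)

sm = MotionStateMachine(DATABASE)
window = sm.step()                 # reference window for the policy input
print(window.shape)                # (4, 12)
sm.step(command="jump")            # switch skills in real time
print(sm.skill)                    # jump
```

Enlarging `WINDOW` hands the controller more of the upcoming reference, which, per the paper's conclusion, mainly helps for motions whose reference trajectories are hard to predict from a short window.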