JZUS - Journal of Zhejiang University SCIENCE

Journal of Zhejiang University SCIENCE A 2022 Vol.23 No.6 P.458-478

Towards autonomous and optimal excavation of shield machine: a deep reinforcement learning-based approach

Author(s): Ya-kun ZHANG, Guo-fang GONG, Hua-yong YANG, Yu-xi CHEN, Geng-lin CHEN
Affiliation(s): State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310027, China
Corresponding email(s): gfgong@zju.edu.cn
Key Words: Shield machine, Slurry shield, Intelligent tunnel boring machine (TBM), Deep reinforcement learning, Optimal control, Dynamic optimization, Deep learning


Ya-kun ZHANG, Guo-fang GONG, Hua-yong YANG, Yu-xi CHEN, Geng-lin CHEN. Towards autonomous and optimal excavation of shield machine: a deep reinforcement learning-based approach[J]. Journal of Zhejiang University Science A, 2022, 23(6): 458-478.

@article{title="Towards autonomous and optimal excavation of shield machine: a deep reinforcement learning-based approach",
author="Ya-kun ZHANG, Guo-fang GONG, Hua-yong YANG, Yu-xi CHEN, Geng-lin CHEN",
journal="Journal of Zhejiang University Science A",
volume="23",
number="6",
pages="458-478",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.A2100325"
}

%0 Journal Article
%T Towards autonomous and optimal excavation of shield machine: a deep reinforcement learning-based approach
%A Ya-kun ZHANG
%A Guo-fang GONG
%A Hua-yong YANG
%A Yu-xi CHEN
%A Geng-lin CHEN
%J Journal of Zhejiang University SCIENCE A
%V 23
%N 6
%P 458-478
%@ 1673-565X
%D 2022
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.A2100325

TY - JOUR
T1 - Towards autonomous and optimal excavation of shield machine: a deep reinforcement learning-based approach
A1 - Ya-kun ZHANG
A1 - Guo-fang GONG
A1 - Hua-yong YANG
A1 - Yu-xi CHEN
A1 - Geng-lin CHEN
JO - Journal of Zhejiang University Science A
VL - 23
IS - 6
SP - 458
EP - 478
SN - 1673-565X
Y1 - 2022
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.A2100325
ER -


Abstract: Autonomous excavation is a major trend in the development of a new generation of intelligent tunnel boring machines (TBMs). However, existing technologies are limited to supervised machine learning and static optimization, which can neither outperform human operation nor cope with ever-changing geological conditions and long-term performance measures. The aim of this study is to solve the dynamic optimization problem of shield excavation performance and to achieve autonomous optimal excavation. To this end, a novel autonomous optimal excavation approach that integrates deep reinforcement learning and optimal control is proposed for shield machines. Based on a first-principles analysis of the machine-ground interaction dynamics of the excavation process, a deep neural network model is developed using construction field data consisting of 1.1 million samples. The multi-system coupling mechanism is revealed by establishing an overall system model. Based on the overall system analysis, the autonomous optimal excavation problem is decomposed into a multi-objective dynamic optimization problem and an optimal control problem. Subsequently, a dimensionless multi-objective comprehensive excavation performance measure is proposed. A deep reinforcement learning method is used to solve for the optimal action sequence, and optimal closed-loop feedback controllers are designed to execute it accurately. The performance of the proposed approach is compared with that of human operation using the construction field data. The simulation results show that the proposed approach not only has the potential to replace human operation but can also significantly improve the comprehensive excavation performance.

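Towards autonomous and optimal excavation of shield machine: a deep reinforcement learning-based approach

Authors: Ya-kun ZHANG¹, Guo-fang GONG¹, Hua-yong YANG¹, Yu-xi CHEN¹, Geng-lin CHEN²
Affiliations: ¹State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310027, China; ²School of Electrical and Power Engineering, China University of Mining and Technology, Xuzhou 221116, China
Objective: Autonomous excavation is the development trend for the new generation of intelligent tunnel boring machines (TBMs). However, existing technologies are limited to supervised machine learning and static optimization; their performance cannot surpass human operation, and they struggle to handle ever-changing geological conditions and long-term excavation performance measures. This study aims to solve the dynamic optimization problem of shield excavation performance and to achieve autonomous optimal excavation.
Innovations: 1. For the machine-environment interaction dynamics of the excavation process, a high-accuracy hybrid modeling method that combines first-principles analysis with a deep neural network is proposed, which improves model interpretability and simplifies feature selection; 2. A dimensionless multi-objective comprehensive excavation performance measure suited to intelligent shield operating systems is proposed; 3. An autonomous optimal excavation approach for shield machines that combines deep learning with optimal control is proposed, achieving intelligent decision-making on excavation parameters and multi-objective dynamic optimization of long-term comprehensive excavation performance.
Methods: 1. Through theoretical derivation, the multi-system coupling relationships of the excavation process are revealed, yielding the two degrees of freedom for designing the autonomous optimal excavation system (Fig. 8); 2. Through hybrid modeling driven jointly by mechanism and data, a high-accuracy training environment is built for the deep reinforcement learning agent; 3. Through simulation with construction field data, the performance of the autonomous optimal excavation system is compared with that of human operation, verifying the feasibility and effectiveness of the proposed approach (Figs. 11-13).
Conclusions: 1. When human operators decide on excavation parameters, the relative weight ratio between the specific excavation rate and the specific excavation energy consumption is close to 6:4. 2. Different geological conditions call for different decision strategies: in normal geology, an autonomous optimal excavation system with a higher k₁ value should be used, whereas in difficult geology where the specific excavation rate drops markedly, a system with a higher k₂ value should be used. 3. Although training a deep reinforcement learning agent is very time-consuming, it still has great advantages over training skilled shield operators.

Key words: Shield machine; Slurry shield; TBM; Intelligent shield/TBM; Deep reinforcement learning; Optimal control; Dynamic optimization; Deep learning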
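
The weighting between excavation rate and energy consumption described in the conclusions can be illustrated with a small sketch. It is not the performance measure defined in the paper, whose exact form is not given on this page. It assumes that k₁ weights a normalized specific excavation rate (higher is better) and k₂ weights a normalized specific energy consumption (lower is better), and that each quantity is made dimensionless by dividing by a reference value. The names `Weights`, `performance`, `mean_performance`, `rate_ref`, and `energy_ref` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    k1: float  # assumed: weight on the specific excavation rate term
    k2: float  # assumed: weight on the specific energy consumption term


def performance(rate, energy, rate_ref, energy_ref, w):
    """Illustrative dimensionless score for one sample (higher is better).

    Both quantities are normalized by reference values so that the score
    has no units; the linear form is an assumption, not the paper's formula.
    """
    if rate_ref <= 0 or energy_ref <= 0:
        raise ValueError("reference values must be positive")
    return w.k1 * (rate / rate_ref) - w.k2 * (energy / energy_ref)


def mean_performance(samples, rate_ref, energy_ref, w):
    """Average the per-sample score over a sequence of (rate, energy) pairs."""
    scores = [performance(r, e, rate_ref, energy_ref, w) for r, e in samples]
    if not scores:
        raise ValueError("samples must not be empty")
    return sum(scores) / len(scores)


# A 6:4 weighting, mirroring the ratio reported for human operators.
human_like = Weights(k1=0.6, k2=0.4)
# A hypothetical alternative that puts more weight on energy consumption.
energy_focused = Weights(k1=0.4, k2=0.6)

# Made-up (rate, energy) samples, already expressed relative to the references.
samples = [(1.0, 1.0), (0.5, 1.2)]
print(mean_performance(samples, 1.0, 1.0, human_like))
print(mean_performance(samples, 1.0, 1.0, energy_focused))
```

Changing the ratio of `k1` to `k2` shifts which samples score well, which is the sense in which different geological conditions could favor different weightings.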

