On-line Access: 2022-06-24

Received: 2021-07-15

Revision Accepted: 2021-12-07

Crosschecked: 2022-06-24

Journal of Zhejiang University SCIENCE A 2022 Vol.23 No.6 P.458-478


Towards autonomous and optimal excavation of shield machine: a deep reinforcement learning-based approach

Author(s):  Ya-kun ZHANG, Guo-fang GONG, Hua-yong YANG, Yu-xi CHEN, Geng-lin CHEN

Affiliation(s):  State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):   gfgong@zju.edu.cn

Key Words:  Shield machine, Slurry shield, Intelligent tunnel boring machine (TBM), Deep reinforcement learning, Optimal control, Dynamic optimization, Deep learning

Ya-kun ZHANG, Guo-fang GONG, Hua-yong YANG, Yu-xi CHEN, Geng-lin CHEN. Towards autonomous and optimal excavation of shield machine: a deep reinforcement learning-based approach[J]. Journal of Zhejiang University Science A, 2022, 23(6): 458-478.

%0 Journal Article
%T Towards autonomous and optimal excavation of shield machine: a deep reinforcement learning-based approach
%A Ya-kun ZHANG
%A Guo-fang GONG
%A Hua-yong YANG
%A Yu-xi CHEN
%A Geng-lin CHEN
%J Journal of Zhejiang University SCIENCE A
%V 23
%N 6
%P 458-478
%@ 1673-565X
%D 2022
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.A2100325

Autonomous excavation is a major trend in the development of a new generation of intelligent tunnel boring machines (TBMs). However, existing technologies are limited to supervised machine learning and static optimization, which can neither outperform human operation nor cope with ever-changing geological conditions and long-term performance measures. This study aims to solve the dynamic optimization of shield excavation performance and to achieve autonomous optimal excavation. A novel autonomous optimal excavation approach that integrates deep reinforcement learning and optimal control is proposed for shield machines. Based on a first-principles analysis of the machine-ground interaction dynamics of the excavation process, a deep neural network model is developed using construction field data consisting of 1.1 million samples. An overall system model is established to reveal the multi-system coupling mechanism. Based on this overall system analysis, the autonomous optimal excavation problem is decomposed into a multi-objective dynamic optimization problem and an optimal control problem. A dimensionless multi-objective comprehensive excavation performance measure is then proposed. A deep reinforcement learning method is used to solve for the optimal action sequence, and optimal closed-loop feedback controllers are designed to execute it accurately. The proposed approach is compared with human operation using the construction field data. The simulation results show that the proposed approach not only has the potential to replace human operation but can also significantly improve the comprehensive excavation performance.
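The abstract does not give the form of the dimensionless multi-objective performance measure, so the following is only a minimal sketch of one common way such a measure could be built and used as a reinforcement learning reward. Everything in it is an assumption made for illustration: the objective names (`advance_rate`, `energy`), the reference scales, the weights, the weighted-sum form, and the discount factor are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class Objective:
    """One excavation objective (hypothetical, for illustration only)."""
    name: str
    reference: float  # assumed scale that makes the quantity dimensionless
    weight: float     # assumed relative importance
    maximize: bool    # True if a larger value is better


def step_reward(values: Mapping[str, float],
                objectives: Sequence[Objective]) -> float:
    """Weighted sum of dimensionless objective ratios for one time step.

    Objectives to be minimized enter with a negative sign, so a larger
    reward always means better comprehensive performance.
    """
    total_weight = sum(o.weight for o in objectives)
    reward = 0.0
    for o in objectives:
        ratio = values[o.name] / o.reference  # units cancel out
        reward += o.weight * (ratio if o.maximize else -ratio)
    return reward / total_weight


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Long-term measure: discounted sum of step rewards along a trajectory."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical objectives and one hypothetical measurement.
objectives = [
    Objective("advance_rate", reference=40.0, weight=1.0, maximize=True),
    Objective("energy", reference=100.0, weight=1.0, maximize=False),
]
sample = {"advance_rate": 40.0, "energy": 50.0}
reward = step_reward(sample, objectives)
```

Dividing each quantity by a reference scale lets objectives with different units be combined in one scalar, and summing discounted rewards over a trajectory is what distinguishes a long-term (dynamic) measure from a per-step (static) one. A reinforcement learning agent would then search for the action sequence that maximizes this return.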







