Journal of Zhejiang University

Journal of Zhejiang University SCIENCE A 2026 Vol.27 No.3 P.246-261

Three-degree-of-freedom motion posture stabilization control of platform based on DTW-LSTM-MATD3 under high and low frequency disturbances of ships

Author(s): Qin ZHANG, Jingyi ZHOU, Bangping GU, Xiong HU
Affiliation(s): 1. School of Logistics Engineering, Shanghai Maritime University, Shanghai 201306, China
Corresponding email(s): huxiong@shmtu.edu.cn
Key Words: Compensation control, Multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm, Dynamic time warping (DTW) algorithm, Long short-term memory (LSTM) network


Qin ZHANG, Jingyi ZHOU, Bangping GU, Xiong HU. Three-degree-of-freedom motion posture stabilization control of platform based on DTW-LSTM-MATD3 under high and low frequency disturbances of ships[J]. Journal of Zhejiang University Science A, 2026, 27(3): 246-261.

@article{zhang2026three,
title="Three-degree-of-freedom motion posture stabilization control of platform based on DTW-LSTM-MATD3 under high and low frequency disturbances of ships",
author="Qin ZHANG and Jingyi ZHOU and Bangping GU and Xiong HU",
journal="Journal of Zhejiang University Science A",
volume="27",
number="3",
pages="246-261",
year="2026",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.A2500146"
}

%0 Journal Article
%T Three-degree-of-freedom motion posture stabilization control of platform based on DTW-LSTM-MATD3 under high and low frequency disturbances of ships
%A Qin ZHANG
%A Jingyi ZHOU
%A Bangping GU
%A Xiong HU
%J Journal of Zhejiang University SCIENCE A
%V 27
%N 3
%P 246-261
%@ 1673-565X
%D 2026
%I Zhejiang University Press & Springer
%R 10.1631/jzus.A2500146

TY  - JOUR
T1  - Three-degree-of-freedom motion posture stabilization control of platform based on DTW-LSTM-MATD3 under high and low frequency disturbances of ships
A1  - Qin ZHANG
A1  - Jingyi ZHOU
A1  - Bangping GU
A1  - Xiong HU
JO  - Journal of Zhejiang University Science A
VL  - 27
IS  - 3
SP  - 246
EP  - 261
SN  - 1673-565X
Y1  - 2026
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.A2500146
ER  - 


Abstract: In the complex and variable deep-sea environment, ship motion compensation control is essential to the safety and efficiency of equipment installation and transportation at offshore wind farms. However, uncertainties severely affect the ship motion posture compensation control system and significantly reduce its compensation accuracy. In this paper, we propose a three-degree-of-freedom (3-DoF) ship motion posture stabilization control method based on the DTW-LSTM-MATD3 algorithm. We use the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm to control a platform driven by six electric cylinders so that it remains stable. Because random noise affects the measured ship motion posture, we use a dynamic time warping (DTW) algorithm to distinguish high-frequency noise from low-frequency tracking signals. We further embed a long short-term memory (LSTM) network into the MATD3 network so that the Critic network's training more closely approaches the true Q-value, and we use a combined reward function to enhance the agents' exploration capability in complex dynamic environments. Finally, the method was verified under sea state 6 and abrupt sea conditions with high-frequency noise, as well as under real abrupt sea conditions, and a generalization test was also carried out. Simulation results show that the proposed DTW-LSTM-MATD3 method achieves strong compensation control performance.
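The abstract states that the stabilized platform is driven by six electric cylinders. Whatever action a learned controller outputs, a commanded platform pose (roll, pitch, heave) must eventually become six cylinder lengths. The sketch below shows the standard Stewart-platform inverse-kinematics step for that conversion. The anchor radii, anchor angles, and neutral height are hypothetical values chosen for illustration, not the paper's geometry, and yaw, surge, and sway are held at zero.

```python
import math

# Hypothetical geometry (NOT from the paper): radii and neutral height in metres.
BASE_RADIUS = 1.0
TOP_RADIUS = 0.6
NEUTRAL_HEIGHT = 1.2
BASE_ANGLES_DEG = [0, 60, 120, 180, 240, 300]
TOP_ANGLES_DEG = [30, 90, 150, 210, 270, 330]


def ring(radius, angles_deg):
    """Anchor points on a horizontal circle (z = 0 in the local frame)."""
    return [(radius * math.cos(math.radians(a)),
             radius * math.sin(math.radians(a)),
             0.0) for a in angles_deg]


BASE = ring(BASE_RADIUS, BASE_ANGLES_DEG)  # cylinder joints on the ship deck
TOP = ring(TOP_RADIUS, TOP_ANGLES_DEG)     # cylinder joints on the moving platform


def rotation(roll, pitch):
    """R = Ry(pitch) @ Rx(roll); yaw is fixed at zero in the 3-DoF case."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    return [[cp, sp * sr, sp * cr],
            [0.0, cr, -sr],
            [-sp, cp * sr, cp * cr]]


def leg_lengths(roll, pitch, heave):
    """Length of each of the six cylinders for a platform pose
    (roll and pitch in radians, heave in metres relative to neutral)."""
    rot = rotation(roll, pitch)
    centre = (0.0, 0.0, NEUTRAL_HEIGHT + heave)
    lengths = []
    for base_pt, top_pt in zip(BASE, TOP):
        # Platform joint expressed in the deck frame: centre + R @ top_pt
        joint = [centre[i] + sum(rot[i][k] * top_pt[k] for k in range(3))
                 for i in range(3)]
        lengths.append(math.dist(joint, base_pt))
    return lengths


if __name__ == "__main__":
    # Small-angle approximation: to cancel a measured ship roll of +2 deg and
    # heave of +0.05 m, command the opposite platform pose.
    for length in leg_lengths(math.radians(-2.0), 0.0, -0.05):
        print(f"{length:.4f} m")
```

Commanding the negated ship motion is only a first-order approximation: combined roll and pitch do not commute, and deck rotation also displaces the platform centre, so a real compensator would compose the full ship and platform transforms.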

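Three-degree-of-freedom motion posture stabilization control of a platform under high- and low-frequency ship disturbances based on DTW-LSTM-MATD3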

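Authors: Qin ZHANG, Jingyi ZHOU, Bangping GU, Xiong HU
Affiliation: School of Logistics Engineering, Shanghai Maritime University, Shanghai 201306, China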
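Objective: In the complex and variable deep-sea environment, the ship motion posture compensation control system is severely affected by uncertainties, which significantly reduce compensation accuracy. This study aims to improve the accuracy of ship motion compensation control and thereby ensure the safety and efficiency of equipment installation and transportation at offshore wind farms.
Innovations: 1. A dynamic time warping (DTW) algorithm is used to distinguish high-frequency noise from low-frequency tracking signals; 2. A long short-term memory (LSTM) network is embedded in the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm so that the Critic network's training more closely approaches the true Q-value; 3. A combined reward function is adopted to improve the agents' exploration capability.
Methods: 1. A reinforcement learning environment for the ship compensation system is constructed for 3-DoF ship motion under complex sea conditions; 2. The DTW algorithm is used to distinguish noise signals, and the model is trained with the MATD3 algorithm, combined with the LSTM network and the combined reward function, to improve on the compensation efficiency of conventional MATD3; 3. Simulations show that the compensation system achieves higher compensation efficiency under sea state 6, abrupt sea conditions, and real noisy conditions, verifying the effectiveness of the proposed method.
Conclusions: 1. The DTW algorithm can determine the boundary point between high- and low-frequency signals, enabling the compensation system to reject high-frequency noise while tracking low-frequency signals; 2. Using an LSTM network and a combined reward function in the MATD3 algorithm improves the agents' training performance and decision-making capability; 3. The proposed method improves the generalization and efficiency of 3-DoF ship motion compensation.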
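The summary credits DTW with locating the boundary between high-frequency noise and low-frequency tracking signals, but it does not give the procedure. The sketch below shows only the standard DTW distance (dynamic programming over a cumulative-cost matrix with an absolute-difference local cost) that such a comparison would rely on. How the paper turns these distances into a frequency boundary is not described here, so no boundary rule is implied, and the test signals are hypothetical.

```python
import math


def dtw_distance(x, y):
    """Classic dynamic time warping distance between two 1-D sequences.

    cost[i][j] is the minimum cumulative |x - y| over all warping paths
    aligning x[:i] with y[:j]; each step advances i, j, or both.
    """
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # x advances
                                     cost[i][j - 1],      # y advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


if __name__ == "__main__":
    # Hypothetical signals: a slow heave-like sine and a copy lagged by 3 samples.
    slow = [math.sin(0.2 * k) for k in range(60)]
    lagged = [math.sin(0.2 * (k - 3)) for k in range(60)]
    pointwise = sum(abs(a - b) for a, b in zip(slow, lagged))
    print(f"pointwise: {pointwise:.3f}, DTW: {dtw_distance(slow, lagged):.3f}")
```

Because the diagonal alignment is one admissible warping path, the DTW distance between equal-length sequences never exceeds their pointwise sum of absolute differences. This is what makes DTW tolerant of phase lag between a measured motion and a reference.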

Key words: Compensation control; MATD3 algorithm; DTW algorithm; LSTM network
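The abstract attributes two refinements to the learning loop: a combined reward function, and an LSTM-augmented Critic whose training should approach the true Q-value. In TD3-family methods, which MATD3 extends to multiple agents with centralised critics, the Critic regresses toward a target built from clipped double-Q learning and target-policy smoothing, and actors update on a delay. The sketch below computes that target for a joint action. The reward terms and weights (normalised roll, pitch, and heave tracking error, an action-smoothness penalty, an in-tolerance bonus) are hypothetical stand-ins, since the paper's exact reward is not given here. The target critics are plain callables standing in for the networks; no LSTM is implemented.

```python
import random
from typing import Callable, Sequence

# A target critic maps (next_state, joint_action) -> Q estimate. In the described
# method this would be a neural network with an LSTM; here it is any callable.
Critic = Callable[[Sequence[float], Sequence[float]], float]


def combined_reward(errors, action, prev_action, tolerances,
                    w_track=(1.0, 1.0, 1.0), w_smooth=0.1, bonus=1.0):
    """Hypothetical combined reward: normalised tracking error for
    (roll, pitch, heave), a penalty on abrupt action changes, and a bonus
    when every error is within its tolerance."""
    tracking = -sum(w * abs(e) / tol
                    for w, e, tol in zip(w_track, errors, tolerances))
    smoothness = -w_smooth * sum((a - p) ** 2
                                 for a, p in zip(action, prev_action))
    within = all(abs(e) <= tol for e, tol in zip(errors, tolerances))
    return tracking + smoothness + (bonus if within else 0.0)


def clipped_double_q_target(reward, done, next_state, target_actions,
                            q1_target: Critic, q2_target: Critic,
                            gamma=0.99, sigma=0.2, noise_clip=0.5,
                            a_low=-1.0, a_high=1.0, rng=random):
    """TD3-style critic target for one agent, given all agents' target-policy
    actions at next_state (the joint action seen by a centralised critic)."""
    # Target policy smoothing: clipped Gaussian noise, then clip to action bounds.
    smoothed = [min(a_high, max(a_low,
                    a + min(noise_clip, max(-noise_clip, rng.gauss(0.0, sigma)))))
                for a in target_actions]
    # Clipped double-Q: take the smaller of the two target critic estimates.
    q_next = min(q1_target(next_state, smoothed), q2_target(next_state, smoothed))
    return reward + gamma * (1.0 - float(done)) * q_next


def update_actor_now(step, policy_delay=2):
    """Delayed policy updates: actors and target networks are updated once
    every `policy_delay` critic updates."""
    return step % policy_delay == 0
```

Taking the minimum of two critics counteracts the overestimation bias of a single Q estimate, which is the same concern the abstract addresses when it aims for Critic training that stays close to the true Q-value.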


