
CLC number: TP393

On-line Access: 2026-01-09

Received: 2025-02-16

Revision Accepted: 2025-10-17

Crosschecked: 2026-01-11

ORCID:

Shicheng ZHOU, https://orcid.org/0000-0001-9686-3836
Jingju LIU, https://orcid.org/0009-0005-9506-6903
Yuliang LU, https://orcid.org/0000-0002-8502-9907
Yue ZHANG, https://orcid.org/0009-0007-3570-2132

Frontiers of Information Technology & Electronic Engineering  2025 Vol.26 No.12 P.2511-2528

https://doi.org/10.1631/FITEE.2500100


Mind the Gap: towards generalizable autonomous penetration testing via domain randomization and meta-reinforcement learning


Author(s):  Shicheng ZHOU, Jingju LIU, Yuliang LU, Jiahai YANG, Yue ZHANG, Jie CHEN

Affiliation(s):  College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China; …

Corresponding email(s):   zhoushicheng@nudt.edu.cn, liujingju17@nudt.edu.cn, luyuliang@nudt.edu.cn, zhangyue@nudt.edu.cn

Key Words:  Cybersecurity, Penetration testing, Reinforcement learning, Domain randomization, Meta-reinforcement learning, Large language model


Shicheng ZHOU, Jingju LIU, Yuliang LU, Jiahai YANG, Yue ZHANG, Jie CHEN. Mind the Gap: towards generalizable autonomous penetration testing via domain randomization and meta-reinforcement learning[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(12): 2511-2528.

@article{FITEE.2500100,
title="Mind the Gap: towards generalizable autonomous penetration testing via domain randomization and meta-reinforcement learning",
author="Shicheng ZHOU, Jingju LIU, Yuliang LU, Jiahai YANG, Yue ZHANG, Jie CHEN",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="26",
number="12",
pages="2511-2528",
year="2025",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2500100"
}

%0 Journal Article
%T Mind the Gap: towards generalizable autonomous penetration testing via domain randomization and meta-reinforcement learning
%A Shicheng ZHOU
%A Jingju LIU
%A Yuliang LU
%A Jiahai YANG
%A Yue ZHANG
%A Jie CHEN
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 12
%P 2511-2528
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2500100

TY - JOUR
T1 - Mind the Gap: towards generalizable autonomous penetration testing via domain randomization and meta-reinforcement learning
A1 - Shicheng ZHOU
A1 - Jingju LIU
A1 - Yuliang LU
A1 - Jiahai YANG
A1 - Yue ZHANG
A1 - Jie CHEN
JO - Frontiers of Information Technology & Electronic Engineering
VL - 26
IS - 12
SP - 2511
EP - 2528
SN - 2095-9184
Y1 - 2025
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2500100
ER -


Abstract: 
With the increasing number of vulnerabilities exposed on the Internet, autonomous penetration testing (pentesting) has emerged as a promising research area, and reinforcement learning (RL) is a natural fit for studying it. However, two key challenges limit the applicability of RL-based autonomous pentesting in real-world scenarios: (1) the training environment dilemma: training agents in simulated environments is sample-efficient, but making those environments realistic is difficult; (2) poor generalization: agents’ policies often perform poorly when transferred to unseen scenarios, and even slight changes can open a significant generalization gap. To address both challenges, we propose GAP, a generalizable autonomous pentesting framework that aims to train policies efficiently in realistic environments and to produce agents capable of drawing inferences about unseen cases from a single instance. GAP introduces a real-to-sim-to-real pipeline that enables end-to-end policy learning in unknown real environments while constructing realistic simulations, and it improves generalization by combining domain randomization with meta-RL. We are among the first to apply domain randomization to autonomous pentesting, and we propose a large language model (LLM)-powered domain randomization method for synthetic environment generation. We further apply meta-RL over these synthetic environments to improve agents’ ability to generalize to unseen environments. Together, the two methods effectively bridge the generalization gap and improve policy adaptation performance. Experiments on various vulnerable virtual machines show that GAP enables policy learning in diverse realistic environments, achieves zero-shot policy transfer in similar environments, and adapts policies rapidly in dissimilar environments.
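
To make the two ingredients named above concrete, the following minimal Python sketch pairs a domain-randomized environment sampler with a first-order, Reptile-style meta-update. It is an illustration under stated assumptions, not the authors’ implementation: sample_env_config stands in for GAP’s LLM-powered environment generator, task_loss_grad replaces an actual policy-gradient estimate with a toy quadratic objective, and the abstract does not specify which meta-RL algorithm GAP uses.

import random

import numpy as np

# Minimal sketch of domain randomization + first-order meta-learning.
# All names and value ranges below are illustrative assumptions, not GAP's API.

def sample_env_config(rng):
    """Domain randomization: draw one synthetic pentesting environment.
    GAP generates such configurations with an LLM; here we sample from
    hand-written ranges purely for illustration."""
    n = rng.randint(3, 12)
    return {
        "num_hosts": n,
        "os": [rng.choice(["linux", "windows"]) for _ in range(n)],
        "services": [rng.sample(["ssh", "http", "smb", "rdp"], rng.randint(1, 3))
                     for _ in range(n)],
    }

def task_loss_grad(theta, config):
    """Stand-in for a policy-gradient estimate on one environment:
    a toy quadratic loss whose optimum depends on the sampled config."""
    target = np.full_like(theta, config["num_hosts"] / 12.0)
    return 2.0 * (theta - target)

def meta_train(meta_steps=200, inner_steps=5, inner_lr=0.05, meta_lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = np.zeros(8)                      # meta policy parameters
    for _ in range(meta_steps):
        config = sample_env_config(rng)      # one randomized synthetic task
        phi = theta.copy()
        for _ in range(inner_steps):         # inner-loop adaptation on the task
            phi -= inner_lr * task_loss_grad(phi, config)
        theta += meta_lr * (phi - theta)     # Reptile-style meta-update
    return theta

if __name__ == "__main__":
    print(meta_train())

In the full framework, the inner loop would instead run an RL algorithm (e.g., proximal policy optimization) inside each synthesized environment, and the sampler would be driven by LLM prompts describing hosts, services, and vulnerabilities.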


