Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Mind the Gap: towards generalizable autonomous penetration testing via domain randomization and meta-reinforcement learning

Abstract: With the increasing number of vulnerabilities exposed on the Internet, autonomous penetration testing (pentesting) has emerged as a promising research area, and reinforcement learning (RL) is a natural fit for studying it. However, two key challenges limit the applicability of RL-based autonomous pentesting in real-world scenarios: the training environment dilemma, i.e., training agents in simulated environments is sample-efficient, but ensuring the realism of those environments remains challenging; and poor generalization ability, i.e., agents' policies often perform poorly when transferred to unseen scenarios, where even slight changes can cause a significant generalization gap. To address both challenges, we propose a generalizable autonomous pentesting framework termed GAP, which aims to achieve efficient policy training in realistic environments and to train generalizable agents capable of drawing inferences about other cases from one instance. GAP introduces a real-to-sim-to-real pipeline that enables end-to-end policy learning in unknown real environments while constructing realistic simulations, and it improves agents' generalization ability by leveraging domain randomization and meta-RL. We are among the first to apply domain randomization to autonomous pentesting, and we propose a large language model-powered domain randomization method for synthetic environment generation. We further apply meta-RL over these synthetic environments to improve agents' generalization ability in unseen environments. Combining the two methods effectively bridges the generalization gap and improves agents' policy adaptation performance. Experiments are conducted on various vulnerable virtual machines, with results showing that GAP enables policy learning in various realistic environments, achieves zero-shot policy transfer in similar environments, and achieves rapid policy adaptation in dissimilar environments.

Key words: Cybersecurity; Penetration testing; Reinforcement learning; Domain randomization; Meta-reinforcement learning; Large language model
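
A rough intuition for how the two techniques in the abstract fit together can be given in code. The sketch below is a minimal, illustrative Python example, not the authors' implementation: it uses a toy one-step pentesting-style task, a hard-coded sample_config sampler as a stand-in for the LLM-powered domain randomization described in the abstract, and a Reptile-style first-order meta-update as a stand-in for whichever meta-RL algorithm GAP actually uses (the abstract does not name one). All names and parameters here are hypothetical.

import random
import numpy as np

ACTIONS = ["exploit_smb", "exploit_http", "exploit_ssh", "brute_force"]

def sample_config(rng):
    # Stand-in for LLM-powered domain randomization: draw a synthetic host.
    return {
        "os": rng.choice(["linux", "windows"]),
        "service": rng.choice(["smb", "http", "ssh"]),
    }

class ToyPentestEnv:
    # One-step bandit-style task: reward 1 iff the chosen exploit matches
    # the (hidden) vulnerable service of the randomized host.
    def __init__(self, config):
        self.target = "exploit_" + config["service"]
    def step(self, action):
        return 1.0 if ACTIONS[action] == self.target else 0.0

def adapt(q, env, steps, rng, lr=0.5, eps=0.2):
    # Inner loop: epsilon-greedy Q-learning on a single task.
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(len(ACTIONS))
        else:
            a = int(np.argmax(q))
        r = env.step(a)
        q[a] += lr * (r - q[a])  # one-step (bandit) Q update
    return q

rng = random.Random(0)
theta = np.zeros(len(ACTIONS))  # meta-initialization

# Outer loop (Reptile): nudge theta toward task-adapted parameters,
# with each task drawn from the randomized environment distribution.
for _ in range(200):
    env = ToyPentestEnv(sample_config(rng))
    phi = adapt(theta.copy(), env, steps=20, rng=rng)
    theta += 0.1 * (phi - theta)  # first-order meta-update

# Fast adaptation on an unseen task from the same randomized family.
test_env = ToyPentestEnv(sample_config(rng))
q = adapt(theta.copy(), test_env, steps=5, rng=rng)
print("chosen exploit:", ACTIONS[int(np.argmax(q))], "| actual target:", test_env.target)

The shape of the loop is what matters: domain randomization supplies a distribution of tasks, the meta-update accumulates an initialization that adapts quickly to any task drawn from that distribution, and at test time a handful of Q-learning steps stands in for the rapid policy adaptation the abstract reports.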

Chinese Summary  Mind the Gap: towards generalizable autonomous penetration testing via domain randomization and meta-reinforcement learning

Shicheng ZHOU1,2, Jingju LIU1,2,3, Yuliang LU1,2, Jiahai YANG3, Yue ZHANG4, Jie CHEN1
1College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China
2Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei 230037, China
3Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China
4College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
Abstract: With the number of vulnerabilities exposed on the Internet continuously increasing, autonomous penetration testing has become a highly promising research area, and the characteristics of reinforcement learning make it well suited to research in this field. However, RL-based autonomous pentesting typically faces two key challenges when applied in real-world scenarios. The first is the training environment dilemma: training agents in simulated environments ensures high sampling and learning efficiency but makes it difficult to guarantee environmental realism. The second is insufficient generalization ability: agents' policies often perform poorly when transferred to unknown scenarios, and even slight environmental changes may cause a significant generalization gap. To address these two challenges, this paper proposes a generalizable autonomous pentesting framework, GAP, whose core goals are to let agents train efficiently in real environments and to improve agents' policy generalization so that they can draw inferences about other cases from one instance. GAP introduces a real-to-sim-to-real pipeline that enables end-to-end policy learning in unknown real environments while constructing realistic simulated environments, and it improves agents' generalization ability by combining domain randomization with meta-reinforcement learning. This paper is among the first to apply domain randomization to autonomous pentesting, proposing a large language model-based domain randomization method for generating synthetic environments. Building on the generated synthetic environments, meta-reinforcement learning is used to improve agents' generalization ability in unknown scenarios. The combination of these two methods effectively bridges the generalization gap and significantly improves agents' policy adaptation capability. Experiments were conducted on vulnerable target machines built with virtualization technology; the results show that the GAP framework enables agents to perform policy learning in a variety of real environments, achieve zero-shot policy transfer in similar environments, and achieve rapid policy adaptation in dissimilar environments.

Key words: Cybersecurity; Penetration testing; Reinforcement learning; Domain randomization; Meta-reinforcement learning; Large language model



DOI: 10.1631/FITEE.2500100
CLC number: TP393
Downloaded: 293 (full text), 169 (summary)
Clicked: 305
Cited: 0
On-line Access: 2026-01-09
Received: 2025-02-16
Revision Accepted: 2025-10-17
Crosschecked: 2026-01-11
