JZUS - Journal of Zhejiang University SCIENCE

ENGINEERING Information Technology & Electronic Engineering

Accepted manuscript available online (unedited version)

Active inference of protocol state machines from incomplete message domains

Author(s): Maohua GUO, Yuefei ZHU, Jinlong FEI
Affiliation(s): Key Laboratory of Cyberspace Security, Ministry of Education, Zhengzhou 450001, China
Corresponding email(s): czxing.2019@outlook.com, yfzhu17@sina.com, feijinlong_2021@163.com
Key Words: Protocol reverse engineering (PRE); Protocol state machine; Active inference; Incomplete message domains; Input space


Maohua GUO, Yuefei ZHU, Jinlong FEI. Active inference of protocol state machines from incomplete message domains[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2400487

@article{FITEE.2400487,
title="Active inference of protocol state machines from incomplete message domains",
author="Maohua GUO and Yuefei ZHU and Jinlong FEI",
journal="Frontiers of Information Technology & Electronic Engineering",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2400487"
}

%0 Journal Article
%T Active inference of protocol state machines from incomplete message domains
%A Maohua GUO
%A Yuefei ZHU
%A Jinlong FEI
%J Frontiers of Information Technology & Electronic Engineering
%P 2529-2549
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2400487

TY  - JOUR
T1  - Active inference of protocol state machines from incomplete message domains
A1  - Maohua GUO
A1  - Yuefei ZHU
A1  - Jinlong FEI
JO  - Frontiers of Information Technology & Electronic Engineering
SP  - 2529
EP  - 2549
SN  - 2095-9184
Y1  - in press
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2400487
ER  -


Abstract: Inferring protocol state machines from observable information is a significant challenge in protocol reverse engineering (PRE), especially when passively collected traffic suffers from message loss and therefore yields an incomplete protocol state space. This paper introduces a method for actively inferring protocol state machines within the minimally adequate teacher (MAT) framework. By combining session completion with deterministic mutation, the method broadens the range of protocol messages and thereby constructs a more comprehensive input space for the protocol state machine from an incomplete message domain. The efficiency of active inference is further improved through several optimizations of the L_M⁺ algorithm: traffic deduplication, construction of an expanded prefix tree acceptor (EPTA), response-based query optimization, and random counterexample generation. Experiments on the real-time streaming protocol (RTSP) and the simple mail transfer protocol (SMTP), using multiple versions of the Live555 and Exim implementations, show that the method yields more comprehensive protocol state machines with better execution efficiency. Compared with the L_M⁺ algorithm implemented by AALpy, the proposed method, Act_Infer, reduces execution time by approximately 40.7% on average and reduces the number of connections and interactions by approximately 28.6% and 46.6%, respectively.
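As a rough illustration of two of the optimizations named in the abstract, traffic deduplication and a prefix tree built from captured sessions, the sketch below caches observed request/response sequences so that a query whose prefix was already seen in traffic need not be replayed against the server. It is a plain prefix tree under simplifying assumptions (deterministic responses, sessions given as lists of (request, response) pairs), not the paper's EPTA, whose details are not given on this page. The function names `deduplicate`, `build_prefix_tree`, and `lookup` are hypothetical.

```python
def deduplicate(sessions):
    """Drop sessions whose (request, response) sequence was already seen."""
    seen = {}
    for session in sessions:
        seen.setdefault(tuple(session), None)  # dicts keep insertion order
    return [list(s) for s in seen]


def build_prefix_tree(sessions):
    """Build a prefix tree; each node maps a request to (response, child node).

    Assumption: the server is deterministic, so the first response observed
    for a given request prefix is kept.
    """
    root = {}
    for session in sessions:
        node = root
        for request, response in session:
            _, node = node.setdefault(request, (response, {}))
    return root


def lookup(tree, requests):
    """Return the cached responses for a request sequence, or None on a miss."""
    node, responses = tree, []
    for request in requests:
        if request not in node:
            return None  # not covered by captured traffic; query the server
        response, node = node[request]
        responses.append(response)
    return responses


# Illustrative RTSP-like sessions (status codes are made up for the example).
sessions = [
    [("OPTIONS", "200"), ("DESCRIBE", "200")],
    [("OPTIONS", "200"), ("DESCRIBE", "200")],
    [("OPTIONS", "200"), ("SETUP", "200"), ("PLAY", "200")],
]
unique = deduplicate(sessions)
tree = build_prefix_tree(unique)
```

With these sessions, `lookup(tree, ["OPTIONS", "SETUP"])` is answered from the cache, while `lookup(tree, ["PLAY"])` returns `None` and would have to be sent to the implementation under test.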

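Active inference of protocol state machines based on incomplete message domains

Maohua GUO, Yuefei ZHU, Jinlong FEI
Key Laboratory of Cyberspace Security, Ministry of Education, Zhengzhou 450001, China
Abstract: Inferring protocol state machines from observable information is a major challenge in protocol reverse engineering (PRE), particularly when passively collected traffic is missing messages and the protocol state space is therefore incomplete. Based on the minimally adequate teacher (MAT) framework, this paper proposes a new method for actively inferring protocol state machines. By combining session completion with deterministic mutation, the method extends the set of protocol message types and thereby builds a more comprehensive input space for the protocol state machine from an incomplete message domain. In addition, the efficiency of active inference is improved through optimizations of the algorithm, including traffic deduplication, construction of an expanded prefix tree acceptor (EPTA), response-based query optimization, and random counterexample generation based on state transitions. Experiments on the real-time streaming protocol (RTSP) and the simple mail transfer protocol (SMTP), using multiple versions of the Live555 and Exim implementations, show that the method infers more complete protocol state machines with higher execution efficiency. Compared with the algorithm implemented by AALpy, Act_Infer reduces execution time by approximately 40.7% on average and reduces the number of connections and interactions by approximately 28.6% and 46.6%, respectively.

Key words: Protocol reverse engineering (PRE); Protocol state machine; Active inference; Incomplete message domains; Input space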


