CLC number: TP393.08
On-line Access: 2021-09-10
Received: 2020-06-13
Revision Accepted: 2021-01-31
Crosschecked: 2021-08-24
Chen Gao, Xuan Zhang, Mengting Han, Hui Liu. A review on cyber security named entity recognition[J]. Frontiers of Information Technology & Electronic Engineering, 2021, 22(9): 1153-1168.
@article{FITEE2000286,
title="A review on cyber security named entity recognition",
author="Chen Gao and Xuan Zhang and Mengting Han and Hui Liu",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="22",
number="9",
pages="1153-1168",
year="2021",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2000286"
}
%0 Journal Article
%T A review on cyber security named entity recognition
%A Chen Gao
%A Xuan Zhang
%A Mengting Han
%A Hui Liu
%J Frontiers of Information Technology & Electronic Engineering
%V 22
%N 9
%P 1153-1168
%@ 2095-9184
%D 2021
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2000286
TY - JOUR
T1 - A review on cyber security named entity recognition
A1 - Chen Gao
A1 - Xuan Zhang
A1 - Mengting Han
A1 - Hui Liu
JO - Frontiers of Information Technology & Electronic Engineering
VL - 22
IS - 9
SP - 1153
EP - 1168
SN - 2095-9184
Y1 - 2021
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2000286
ER -
Abstract: With the rapid development of Internet technology and the advent of the big data era, a growing volume of cyber security text is available on the Internet. These texts cover not only security concepts, incidents, tools, guidelines, and policies, but also risk management approaches, best practices, assurances, and technologies. Identifying and classifying cyber security entities across this large-scale, heterogeneous, unstructured information can help address cyber security issues. However, because texts in this domain are complex and diverse, traditional named entity recognition (NER) methods have difficulty identifying security entities. This paper describes the approaches and techniques used for NER in this domain, including rule-based, dictionary-based, and machine-learning-based approaches, and discusses the problems facing NER research in this domain, such as conjunction and disjunction, non-standardized naming conventions, abbreviations, and massive nesting. Three future directions for NER in cyber security are proposed: (1) application of unsupervised or semi-supervised techniques; (2) development of a more comprehensive cyber security ontology; (3) development of a more comprehensive deep learning model.