Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Automatic malware classification and new malware detection using machine learning

Abstract: The explosive growth of malware variants poses a major threat to information security. Traditional signature-based anti-virus systems fail to classify unknown malware into the corresponding families and to detect new kinds of malware. Therefore, we propose a machine learning based malware analysis system composed of three modules: data processing, decision making, and new malware detection. The data processing module extracts malware features from gray-scale images, opcode n-grams, and import functions. The decision-making module uses these features to classify the malware and to identify suspicious malware. Finally, the detection module uses the shared nearest neighbor (SNN) clustering algorithm to discover new malware families. Our approach is evaluated on more than 20 000 malware instances collected by Kingsoft, ESET NOD32, and Anubis. The results show that our system can effectively classify unknown malware with a best accuracy of 98.9%, and successfully detect 86.7% of the new malware.

Key words: Malware classification; Machine learning; n-gram; Gray-scale image; Feature extraction; Malware detection
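
The opcode n-gram features named in the abstract can be illustrated with a minimal sketch. It assumes the opcodes have already been disassembled into a list of mnemonics; the function name `opcode_ngrams`, the choice of n, and the sample sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Tuple


def opcode_ngrams(opcodes: List[str], n: int = 3) -> Counter:
    """Count contiguous n-grams in an opcode sequence.

    Hypothetical helper for illustration only; the paper's exact
    feature definition and value of n are not given in the abstract.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Slide a window of length n over the sequence; a sequence shorter
    # than n yields an empty Counter.
    windows: List[Tuple[str, ...]] = [
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    ]
    return Counter(windows)


# Made-up opcode sequence, not real malware data.
sample = ["push", "mov", "push", "mov", "call"]
bigrams = opcode_ngrams(sample, n=2)
```

In a classifier, such counts would typically be mapped onto a fixed vocabulary of n-grams to form a feature vector; how the paper selects that vocabulary is not stated in this abstract.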

Chinese Summary: Automated malware classification and new malware detection based on machine learning

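Summary: The explosive growth of malware poses a major threat to information security. Traditional signature-based anti-virus systems cannot classify unknown malware into the corresponding malware families or detect new malware. We therefore propose a machine learning based malware analysis system composed of three subsystems: data processing, decision making, and new malware detection. The data processing subsystem includes three feature extraction methods: texture features of gray-scale images, opcode features, and API features. The decision-making subsystem is used to classify malware and to confirm suspicious malware. Finally, the detection subsystem uses the shared nearest neighbor (SNN) clustering algorithm to discover new malware. We evaluated the proposed method on a set of more than 20 000 malware samples collected by Kingsoft, ESET NOD32, and Anubis. The results show that our system can effectively classify unknown malware with an accuracy of up to 98.9%, and that the detection rate for new malware is 86.7%.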

Key words: Malware classification; Machine learning; n-gram; Gray-scale image; Feature extraction; Malware detection
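
The shared nearest neighbor (SNN) idea used by the detection module can be sketched as follows. This is a generic illustration of SNN similarity (two points are similar in proportion to the neighbors they have in common, counted only when each point appears in the other's neighbor list), not the paper's implementation; the Euclidean distance, the value of k, the helper names, and the toy points are all assumptions.

```python
import math
from typing import List, Sequence, Set


def knn_sets(points: Sequence[Sequence[float]], k: int) -> List[Set[int]]:
    """Return, for each point, the index set of its k nearest neighbors.

    Uses Euclidean distance, which is an assumption made for this sketch.
    """
    neighbors: List[Set[int]] = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors: List[Set[int]], i: int, j: int) -> int:
    """Number of shared neighbors, counted only for mutual neighbors."""
    if i not in neighbors[j] or j not in neighbors[i]:
        return 0
    return len(neighbors[i] & neighbors[j])


# Toy feature vectors forming two well-separated groups (made-up data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
nbrs = knn_sets(points, k=2)
same_group = snn_similarity(nbrs, 0, 1)
cross_group = snn_similarity(nbrs, 0, 3)
```

Samples whose SNN similarity to every known family stays low would be candidates for a new family; the thresholds and clustering steps the authors actually use are not given in this summary.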


DOI: 10.1631/FITEE.1601325

CLC number: TP309.5

On-line Access: 2017-10-25

Received: 2016-06-12

Revision Accepted: 2016-09-14

Crosschecked: 2017-09-15

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE