Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

A feature selection approach based on a similarity measure for software defect prediction

Abstract: Software defect prediction aims to find potential defects based on historical data and software features. Software features reflect the characteristics of software modules. However, some of these features are more relevant to the class (defective or non-defective), while others may be redundant or irrelevant. To fully measure the correlation between different features and the class, we present a feature selection approach based on a similarity measure (SM) for software defect prediction. First, the feature weights are updated according to the similarity of samples in different classes. Second, a feature ranking list is generated by sorting the feature weights in descending order, and all feature subsets are selected from the feature ranking list in sequence. Finally, all feature subsets are evaluated on a k-nearest neighbor (KNN) model, with classification performance measured by the area under the curve (AUC) metric. The experiments are conducted on 11 National Aeronautics and Space Administration (NASA) datasets, and the results show that our approach performs better than, or comparably to, the other feature selection approaches it is compared with in terms of classification performance.

Key words: Software defect prediction, Feature selection, Similarity measure, Feature weights, Feature ranking list
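
The pipeline described in the abstract (weight the features, rank them, evaluate subsets with KNN and AUC) can be illustrated with a minimal Python sketch. The abstract does not give the SM weight-update formula, so the sketch substitutes a Relief-style update (nearest same-class and nearest different-class sample) purely as a stand-in; it is not the authors' method. It also assumes that the feature subsets are the nested top-m prefixes of the ranking list, that KNN is scored by leave-one-out, and that every class has at least two samples. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def relief_style_weights(X, y):
    """Stand-in weight update (Relief-style); NOT the paper's SM formula."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    n, d = Xs.shape
    w = np.zeros(d)
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf  # exclude the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        # Reward features that separate classes, penalize those that split a class.
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / n

def knn_scores(X, y, k=5):
    """Leave-one-out KNN score: fraction of defective samples among k neighbors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]
    return y[idx].mean(axis=1)

def auc(y, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    y = np.asarray(y)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def evaluate_nested_subsets(X, y, k=5):
    """Rank features by descending weight, then score each top-m prefix."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    ranking = np.argsort(-relief_style_weights(X, y))
    results = []
    for m in range(1, len(ranking) + 1):
        subset = ranking[:m]
        results.append((m, auc(y, knn_scores(X[:, subset], y, k))))
    return ranking, results

# Synthetic demo data (hypothetical): feature 0 tracks the class, 1 and 2 are noise.
rng = np.random.default_rng(0)
y = np.array([0] * 30 + [1] * 30)
X = np.column_stack([y + 0.1 * rng.standard_normal(60), rng.random(60), rng.random(60)])
ranking, results = evaluate_nested_subsets(X, y, k=5)
for m, score in results:
    print(f"top-{m} features {ranking[:m].tolist()}: AUC = {score:.3f}")
```

In practice one would pick the subset with the highest AUC; swapping in the paper's actual SM update would only require replacing `relief_style_weights`.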

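Chinese Summary (translated): A similarity-measure-based feature selection method for software defect prediction

Summary: Software defect prediction aims to find potential defects using historical data and software features that reflect the characteristics of software modules. However, some features may be highly relevant to the class (defective or non-defective), while others may be redundant or irrelevant. To account for the differing relevance of features to the class in software defect prediction, this paper proposes a feature selection method based on a similarity measure (SM). First, the feature weights are updated according to the similarity between samples of different classes. Next, a feature ranking list is generated by sorting the feature weights in descending order, and all feature subsets are selected from the ranking list in sequence. Finally, the classification performance of all feature subsets is verified on a k-nearest neighbor (KNN) model and measured with the area under the curve (AUC) metric. Experiments on 11 National Aeronautics and Space Administration (NASA) datasets show that, compared with four other feature selection methods, the proposed method achieves comparable or even better classification performance.

Key words: Software defect prediction; Feature selection; Similarity measure; Feature weights; Feature ranking list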



DOI: 10.1631/FITEE.1601322

CLC number: TP311


On-line Access: 2018-01-11

Received: 2016-06-11

Revision Accepted: 2016-09-14

Crosschecked: 2017-11-26

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE