Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

A software defect prediction method with metric compensation based on feature selection and transfer learning

Abstract: Cross-project software defect prediction solves the problem of insufficient training data in traditional defect prediction, and overcomes the challenge of applying models learned from multiple source projects to a target project. At the same time, two new problems emerge: (1) too many irrelevant and redundant features in model training reduce training efficiency and, in turn, the prediction accuracy of the model; (2) the distribution of metric values varies greatly from project to project because of the development environment and other factors, which lowers prediction accuracy when the model is used for cross-project prediction. In this paper, we propose a software defect prediction method with metric compensation based on feature selection and transfer learning. The method uses Pearson feature selection to address data redundancy, and a metric-compensation-based transfer learning technique to address the large differences in data distribution between the source and target projects. The experimental results show that the model constructed with this method achieves better results on the area under the receiver operating characteristic curve (AUC) and the F1-measure.

Key words: Defect prediction; Feature selection; Transfer learning; Metric compensation
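
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact selection criterion or compensation formula, so the sketch assumes that features are ranked by the absolute Pearson correlation with the defect label, and that metric compensation rescales each target-project metric by the ratio of source mean to target mean. The function names, the toy data, and the choice of k are all hypothetical.

```python
import numpy as np

def pearson_select(X, y, k):
    """Return indices of the k features with the largest |Pearson r| to the label.

    Assumption: ranking against the label; the paper may instead (or also)
    remove features that are highly correlated with each other.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; treat their correlation as 0.
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.sort(np.argsort(-np.abs(r))[:k])

def compensate(X_target, X_source):
    """Rescale each target metric so that its mean matches the source mean.

    Assumption: a mean-ratio rescaling, used here as one simple form of
    metric compensation.
    """
    X_target = np.asarray(X_target, dtype=float)
    src_mean = np.asarray(X_source, dtype=float).mean(axis=0)
    tgt_mean = X_target.mean(axis=0)
    # Leave a metric unchanged when its target mean is zero.
    ratio = np.divide(src_mean, tgt_mean,
                      out=np.ones_like(src_mean), where=tgt_mean != 0)
    return X_target * ratio

def f1_score(y_true, y_pred):
    """F1-measure for binary labels, where 1 marks a defective module."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    total = 2 * tp + fp + fn
    return float(2 * tp / total) if total > 0 else 0.0

# Hypothetical toy data: rows are modules, columns are software metrics.
X_src = np.array([[1.0, 10.0, 5.0],
                  [2.0, 20.0, 5.0],
                  [3.0, 30.0, 5.0],
                  [4.0, 40.0, 5.0]])
y_src = np.array([0, 0, 1, 1])
X_tgt = np.array([[2.0, 100.0, 5.0],
                  [6.0, 300.0, 5.0]])

keep = pearson_select(X_src, y_src, k=2)   # drops the constant third metric
X_tgt_comp = compensate(X_tgt, X_src)      # target means now equal source means
print(keep, X_tgt_comp[:, keep])
```

In this sketch a classifier would be trained on X_src[:, keep] and applied to X_tgt_comp[:, keep]; the choice of classifier and the AUC computation are left out because the abstract does not specify them.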

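Chinese Summary (translated): A software defect prediction method with metric compensation based on feature selection and transfer learning

Jinfu Chen (陈锦富)1,2, Xiaoli Wang (王小丽)1,2, Saihua Cai (蔡赛华)1,2, Jiaping Xu (徐家平)1, Jingyi Chen (陈静怡)1, Haibo Chen (陈海波)1
1School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
2Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Jiangsu University, Zhenjiang 212013, China
Abstract: Cross-project software defect prediction solves the problem of insufficient training data in traditional defect prediction and overcomes the challenge of applying models learned from multiple source projects to a target project. At the same time, two new problems arise: (1) too many irrelevant and redundant features in model training affect training efficiency and reduce the model's prediction accuracy; (2) because of the development environment and other factors, the distribution of metric values differs from project to project, so prediction accuracy is low when the model is used for cross-project prediction. This paper introduces the Pearson feature selection method to address data redundancy, and uses a metric compensation technique based on transfer learning to address the large differences in data distribution between the source and target projects. A software defect prediction method with metric compensation based on feature selection and transfer learning is proposed. Experimental results show that the model built with this method achieves good results on the AUC (area under the receiver operating characteristic curve) value and the F1-measure.

Key words: Defect prediction; Feature selection; Transfer learning; Metric compensation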


DOI: 10.1631/FITEE.2100468
CLC number: TP311.5

On-line Access: 2022-05-19
Received: 2021-09-30
Revision Accepted: 2022-05-19
Crosschecked: 2022-02-05

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE