JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

2022 Vol.23 No.5 P.715-731

A software defect prediction method with metric compensation based on feature selection and transfer learning

Jinfu CHEN, Xiaoli WANG, Saihua CAI, Jiaping XU, Jingyi CHEN, Haibo CHEN

School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China; Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Jiangsu University, Zhenjiang 212013, China

caisaih@ujs.edu.cn

Abstract: Cross-project software defect prediction addresses the shortage of training data in traditional defect prediction and overcomes the challenge of applying models learned from multiple source projects to a target project. However, two new problems emerge: (1) too many irrelevant and redundant features in model training reduce training efficiency and thus decrease the prediction accuracy of the model; (2) the distribution of metric values varies greatly from project to project because of the development environment and other factors, which lowers prediction accuracy when the model is applied across projects. In this paper, we propose a software defect prediction method with metric compensation based on feature selection and transfer learning. The method introduces Pearson feature selection to address data redundancy, and uses a metric compensation based transfer learning technique to address the large differences in data distribution between the source project and the target project. Experimental results show that the model constructed with this method achieves better results in terms of the area under the receiver operating characteristic curve (AUC) and the F1-measure.

Key words: Defect prediction; Feature selection; Transfer learning; Metric compensation
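
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither thresholds nor a compensation formula, so the function names (`pearson`, `select_features`, `compensate`), the threshold values, and the mean-ratio form of compensation (each source metric scaled by the ratio of the target mean to the source mean) are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_features(rows, labels, keep_threshold=0.1, redundancy_threshold=0.9):
    """Return indices of metrics that correlate with the defect label and are
    not near-duplicates of an already selected metric.

    Both thresholds are assumed values, not taken from the paper.
    """
    columns = list(zip(*rows))
    relevance = [abs(pearson(col, labels)) for col in columns]
    # Visit the most label-relevant metrics first.
    order = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)
    selected = []
    for j in order:
        if relevance[j] < keep_threshold:
            continue  # irrelevant metric
        if any(abs(pearson(columns[j], columns[k])) > redundancy_threshold
               for k in selected):
            continue  # redundant with a more relevant metric
        selected.append(j)
    return sorted(selected)


def compensate(source_rows, target_rows):
    """Rescale each source metric by mean(target) / mean(source).

    Assumed form of metric compensation; it moves the source column means
    onto the target column means without using target labels.
    """
    ratios = []
    for src_col, tgt_col in zip(zip(*source_rows), zip(*target_rows)):
        mean_src = sum(src_col) / len(src_col)
        mean_tgt = sum(tgt_col) / len(tgt_col)
        ratios.append(mean_tgt / mean_src if mean_src != 0 else 1.0)
    return [[v * r for v, r in zip(row, ratios)] for row in source_rows]
```

In this sketch, `select_features` would be run on the labelled source project, and `compensate` would then be applied to the retained source metrics before a classifier is trained and evaluated on the target project.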

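Chinese Summary: A software defect prediction method with metric compensation based on feature selection and transfer learning

Jinfu CHEN^1,2, Xiaoli WANG^1,2, Saihua CAI^1,2, Jiaping XU^1, Jingyi CHEN^1, Haibo CHEN^1
^1 School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
^2 Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Jiangsu University, Zhenjiang 212013, China
Abstract: Cross-project software defect prediction solves the problem of insufficient training data in traditional defect prediction and overcomes the challenge of applying models learned from multiple different source projects to a target project. At the same time, two new problems arise: (1) too many irrelevant and redundant features in model training affect training efficiency and reduce the prediction accuracy of the model; (2) because of the development environment and other factors, the distribution of metric values differs from project to project, so prediction accuracy is low when the model is used for cross-project prediction. This paper introduces the Pearson feature selection method to address data redundancy, and adopts a transfer learning based metric compensation technique to address the large differences in data distribution between the source project and the target project. A software defect prediction method with metric compensation based on feature selection and transfer learning is proposed. Experimental results show that the model built with this method achieves good results on the AUC (area under the receiver operating characteristic curve) value and the F1-measure.

Key words: Defect prediction; Feature selection; Transfer learning; Metric compensation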

DOI: 10.1631/FITEE.2100468

CLC number: TP311.5

On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2022-02-05

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE
