CLC number: TP311.5
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2022-02-05
Citations: Bibtex RefMan EndNote GB/T7714
Jinfu CHEN, Xiaoli WANG, Saihua CAI, Jiaping XU, Jingyi CHEN, Haibo CHEN. A software defect prediction method with metric compensation based on feature selection and transfer learning[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(5): 715-731.
@article{title="A software defect prediction method with metric compensation based on feature selection and transfer learning",
author="Jinfu CHEN and Xiaoli WANG and Saihua CAI and Jiaping XU and Jingyi CHEN and Haibo CHEN",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="23",
number="5",
pages="715-731",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2100468"
}
%0 Journal Article
%T A software defect prediction method with metric compensation based on feature selection and transfer learning
%A Jinfu CHEN
%A Xiaoli WANG
%A Saihua CAI
%A Jiaping XU
%A Jingyi CHEN
%A Haibo CHEN
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 5
%P 715-731
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2100468
TY - JOUR
T1 - A software defect prediction method with metric compensation based on feature selection and transfer learning
A1 - Jinfu CHEN
A1 - Xiaoli WANG
A1 - Saihua CAI
A1 - Jiaping XU
A1 - Jingyi CHEN
A1 - Haibo CHEN
JO - Frontiers of Information Technology & Electronic Engineering
VL - 23
IS - 5
SP - 715
EP - 731
SN - 2095-9184
Y1 - 2022
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2100468
ER -
Abstract: Cross-project software defect prediction solves the problem of insufficient training data in traditional defect prediction, and overcomes the challenge of applying models learned from multiple source projects to a target project. At the same time, two new problems emerge: (1) too many irrelevant and redundant features in the training data reduce training efficiency and, in turn, the prediction accuracy of the model; (2) the distribution of metric values varies greatly from project to project because of the development environment and other factors, which lowers prediction accuracy when a model is applied across projects. In this paper, we propose a software defect prediction method with metric compensation based on feature selection and transfer learning. The method uses Pearson feature selection to address data redundancy, and a metric-compensation-based transfer learning technique to address the large differences in data distribution between the source and target projects. The experimental results show that the model constructed with this method achieves better results on the area under the receiver operating characteristic curve (AUC) and the F1-measure.
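The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the thresholds `relevance_min` and `redundancy_max`, and the mean-ratio form of compensation are assumptions chosen for illustration. The first function filters features by Pearson correlation (relevance to the defect label, redundancy among features); the second rescales each target-project metric so that its mean matches the source-project mean.

```python
import numpy as np


def pearson_select(X, y, relevance_min=0.1, redundancy_max=0.9):
    """Return indices of features kept by a Pearson-based filter.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Absolute correlation of each feature with the defect label.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )
    # Constant features give NaN; treat them as irrelevant.
    relevance = np.nan_to_num(relevance, nan=0.0)
    kept = []
    # Visit features from most to least relevant.
    for j in np.argsort(-relevance, kind="stable"):
        if relevance[j] < relevance_min:
            break
        # Skip a feature that nearly duplicates one already kept.
        if all(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= redundancy_max
            for k in kept
        ):
            kept.append(int(j))
    return kept


def compensate_metrics(X_source, X_target):
    """Rescale each target metric so its mean matches the source mean.

    Mean-ratio scaling is an assumed, simple form of metric compensation.
    """
    X_source = np.asarray(X_source, dtype=float)
    X_target = np.asarray(X_target, dtype=float)
    src_mean = X_source.mean(axis=0)
    tgt_mean = X_target.mean(axis=0)
    # Leave a metric unchanged when its target mean is zero.
    safe = np.where(tgt_mean == 0, 1.0, tgt_mean)
    ratio = np.where(tgt_mean == 0, 1.0, src_mean / safe)
    return X_target * ratio
```

In a cross-project setting, one would typically select features on the labeled source project, apply the same column indices to the target project, compensate the target metrics, and then train any standard classifier on the source data.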