CLC number: TP311.5
On-line Access: 2022-05-19
Received: 2021-09-30
Revision Accepted: 2022-05-19
Crosschecked: 2022-02-05
Cited: 0
Clicked: 2075
Citations: Bibtex RefMan EndNote GB/T7714
Jinfu CHEN, Xiaoli WANG, Saihua CAI, Jiaping XU, Jingyi CHEN, Haibo CHEN. A software defect prediction method with metric compensation based on feature selection and transfer learning[J]. Frontiers of Information Technology & Electronic Engineering,in press.https://doi.org/10.1631/FITEE.2100468 @article{title="A software defect prediction method with metric compensation based on feature selection and transfer learning", %0 Journal Article TY - JOUR
一种基于特征选择与迁移学习的度量补偿软件缺陷预测方法1江苏大学计算机科学与通信工程学院,中国镇江市,212013 2江苏大学工业网络空间安全技术江苏省重点实验室,中国镇江市,212013 摘要:跨项目软件缺陷预测解决了传统缺陷预测中训练数据不足的问题,克服了将多个不同源项目中学习的模型应用于目标项目的挑战。与此同时,出现两个新问题:(1)模型训练过程中过多无关和冗余特征影响训练效率,降低了模型预测精度;(2)由于开发环境等因素,度量值的分布因项目而异,当模型用于跨项目预测时,预测精度较低。本文引入皮尔逊特征选择方法解决数据冗余问题,采用基于迁移学习的度量补偿技术解决源项目和目标项目之间数据分布差异较大的问题。提出一种基于特征选择和迁移学习的度量补偿软件缺陷预测方法。实验结果表明,用该方法构建的模型在AUC(接收器工作特性曲线下面积)值和F1度量指标上取得较好结果。 关键词组: Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article
Reference[1]Amasaki S, Kawata K, Yokogawa T, 2015. Improving cross-project defect prediction methods with data simplification. Proc 41st Euromicro Conf on Software Engineering and Advanced Applications, p.96-103. [2]Briand LC, Melo WL, Wüst J, 2002. Assessing the applicability of fault-proneness models across object-oriented software projects. IEEE Trans Softw Eng, 28(7):706-720. [3]Cai JC, Xu K, Zhu YH, et al., 2020. Prediction and analysis of net ecosystem carbon exchange based on gradient boosting regression and random forest. Appl Energy, 262:114566. [4]Chen JY, Yang YT, Hu KK, et al., 2019. Multiview transfer learning for software defect prediction. IEEE Access, 7:8901-8916. [5]Chen JY, Hu KK, Yu Y, et al., 2020. Software visualization and deep transfer learning for effective software defect prediction. Proc ACM/IEEE 42nd Int Conf on Software Engineering, p.578-589. [6]Chen X, Zhao YQ, Wang QP, et al., 2018. MULTI: multi-objective effort-aware just-in-time software defect prediction. Inform Softw Technol, 93:1-13. [7]Fukushima T, Kamei Y, McIntosh S, et al., 2014. An empirical study of just-in-time defect prediction using cross-project models. Proc 11th Working Conf on Mining Software Repositories, p.172-181. [8]Grimm LG, Nesselroade KP Jr, 2018. Statistical Applications for the Behavioral and Social Sciences (2nd Ed.). John Wiley & Sons, Hoboken, USA. [9]Guo YC, Shepperd M, Li N, 2018. Bridging effort-aware prediction and strong classification: a just-in-time software defect prediction study. Proc 40th Int Conf on Software Engineering: Companion Proceeedings, p.325-326. [10]Habibi PA, Amrizal V, Bahaweres RB, 2018. Cross-project defect prediction for web application using naive Bayes (case study: petstore web application). Proc Int Workshop on Big Data and Information Security, p.13-18. [11]Hall T, Beecham S, Bowes D, et al., 2012. A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng, 38(6):1276-1304. [12]He P, Li B, Liu X, et al., 2015. An empirical study on software defect prediction with a simplified metric set. Inform Softw Technol, 59:170-190. [13]Herbold S, Trautsch A, Grabowski J, 2018. A comparative study to benchmark cross-project defect prediction approaches. Proc 40th Int Conf on Software Engineering, p.1063. [14]Iqbal T, Cao Y, Kong QQ, et al., 2020. Learning with out-of-distribution data for audio classification. Proc IEEE Int Conf on Acoustics, Speech and Signal Processing, p.636-640. [15]Kamei Y, Fukushima T, McIntosh S, et al., 2016. Studying just-in-time defect prediction using cross-project models. Empir Softw Eng, 21(5):2072-2106. [16]Li K, Xiang ZL, Chen T, et al., 2020a. BILO-CPDP: bi-level programming for automated model discovery in cross-project defect prediction. Proc 35th IEEE/ACM Int Conf on Automated Software Engineering, p.573-584. [17]Li K, Xiang ZL, Chen T, et al., 2020b. Understanding the automated parameter optimization on transfer learning for cross-project defect prediction: an empirical study. Proc ACM/IEEE 42nd Int Conf on Software Engineering, p.566-577. [18]Liu C, Yang D, Xia X, et al., 2019. A two-phase transfer learning model for cross-project defect prediction. Inform Softw Technol, 107:125-136. [19]Lv WD, 2019. Method and application of data defect analysis based on linear discriminant regression of far subspace. Cluster Comput, 22(2):4277-4282. [20]Madeyski L, Jureczko M, 2015. Which process metrics can significantly improve defect prediction models? An empirical study. Softw Qual J, 23(3):393-422. [21]Malhotra R, 2015. A systematic review of machine learning techniques for software fault prediction. Appl Soft Comput, 27:504-518. [22]Marian Z, Mircea IG, Czibula IG, et al., 2016. A novel approach for software defect prediction using fuzzy decision trees. Proc 18th Int Symp on Symbolic and Numeric Algorithms for Scientific Computing, p.240-247. [23]McBride R, Wang K, Ren ZY, et al., 2019. Cost-sensitive learning to rank. Proc 33rd AAAI Conf on Artificial Intelligence, p.4570-4577. [24]Nam J, Pan SJ, Kim S, 2013. Transfer defect learning. Proc 35th Int Conf on Software Engineering, p.382-391. [25]Peng ML, Zhang Q, Xing XY, et al., 2019. Trainable undersampling for class-imbalance learning. Proc 33rd AAAI Conf on Artificial Intelligence, p.4707-4714. [26]Purnami SW, Trapsilasiwi RK, 2017. SMOTE-least square support vector machine for classification of multiclass imbalanced data. Proc 9th Int Conf on Machine Learning and Computing, p.107-111. [27]Rahman F, Devanbu P, 2013. How, and why, process metrics are better. Proc 35th Int Conf on Software Engineering, p.432-441. [28]Ryu D, Choi O, Baik J, 2014. Improving prediction robustness of VAB-SVM for cross-project defect prediction. Proc IEEE 17th Int Conf on Computational Science and Engineering, p.994-999. [29]Ryu D, Choi O, Baik J, 2016. Value-cognitive boosting with a support vector machine for cross-project defect prediction. Empir Softw Eng, 21(1):43-71. [30]Ryu D, Jang JI, Baik J, 2017. A transfer cost-sensitive boosting approach for cross-project defect prediction. Softw Qual J, 25(1):235-272. [31]Saidi R, Bouaguel W, Essoussi N, 2019. Hybrid feature selection method based on the genetic algorithm and Pearson correlation coefficient. In: Hassanien AE (Ed.), Machine Learning Paradigms: Theory and Application. Springer, Cham, p.3-24. [32]Shippey T, Bowes D, Hall T, 2019. Automatically identifying code features for software defect prediction: using AST N-grams. Inform Softw Technol, 106:142-160. [33]Shuai B, Li HF, Li MJ, et al., 2013. Software defect prediction using dynamic support vector machine. Proc 9th Int Conf on Computational Intelligence and Security, p.260-263. [34]Siers MJ, Islam Z, 2015. Software defect prediction using a cost sensitive decision forest and voting, and a potential solution to the class imbalance problem. Inform Syst, 51:62-71. [35]Tabassum S, Minku LL, Feng DY, et al., 2020. An investigation of cross-project learning in online just-in-time software defect prediction. Proc ACM/IEEE 42nd Int Conf on Software Engineering, p.554-565. [36]Thejas GS, Garg R, Iyengar SS, et al., 2021. Metric and accuracy ranked feature inclusion: hybrids of filter and wrapper feature selection approaches. IEEE Access, 9:128687-128701. [37]Tsuda N, Washizaki H, Honda K, et al., 2019. WSQF: comprehensive software quality evaluation framework and benchmark based on SQuaRE. Proc IEEE/ACM 41st Int Conf on Software Engineering: Software Engineering in Practice, p.312-321. [38]Wahono RS, 2015. A systematic literature review of software defect prediction: research trends, datasets, methods and frameworks. J Softw Eng, 1(1):1-16. [39]Wan ZY, Xia X, Hassan AE, et al., 2020. Perceptions, expectations, and challenges in defect prediction. IEEE Trans Softw Eng, 46(11):1241-1266. [40]Wang HJ, Khoshgoftaar TM, Napolitano A, 2010. A comparative study of ensemble feature selection techniques for software defect prediction. Proc 9th Int Conf on Machine Learning and Applications, p.135-140. [41]Watanabe S, Kaiya H, Kaijiri K, 2008. Adapting a fault prediction model to allow inter languagereuse. Proc 4th Int Workshop on Predictor Models in Software Engineering, p.19-24. [42]Wu F, Jing XY, Dong XW, et al., 2017. Cross-project and within-project semi-supervised software defect prediction problems study using a unified solution. Proc IEEE/ACM 39th Int Conf on Software Engineering Companion, p.195-197. [43]Yang XL, Lo D, Xia X, et al., 2017. TLEL: a two-layer ensemble learning approach for just-in-time defect prediction. Inform Softw Technol, 87:206-220. [44]Yu JL, Benesty J, Huang GP, et al., 2015. Optimal single-channel noise reduction filtering matrices from the Pearson correlation coefficient perspective. Proc IEEE Int Conf on Acoustics, Speech and Signal Processing, p.201-205. Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou
310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn Copyright © 2000 - 2024 Journal of Zhejiang University-SCIENCE |
Open peer comments: Debate/Discuss/Question/Opinion
<1>