CLC number: TN914; TN915; TP311
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2009-09-27
Peng HUANG, Jie ZHU. Multi-instance learning for software quality estimation in object-oriented systems: a case study[J]. Journal of Zhejiang University Science C, 2010, 11(2): 130-138.
@article{Huang2010,
title="Multi-instance learning for software quality estimation in object-oriented systems: a case study",
author="Peng HUANG and Jie ZHU",
journal="Journal of Zhejiang University Science C",
volume="11",
number="2",
pages="130-138",
year="2010",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C0910084"
}
%0 Journal Article
%T Multi-instance learning for software quality estimation in object-oriented systems: a case study
%A Peng HUANG
%A Jie ZHU
%J Journal of Zhejiang University SCIENCE C
%V 11
%N 2
%P 130-138
%@ 1869-1951
%D 2010
%I Zhejiang University Press & Springer
%R 10.1631/jzus.C0910084
TY  - JOUR
T1  - Multi-instance learning for software quality estimation in object-oriented systems: a case study
A1  - Peng HUANG
A1  - Jie ZHU
JO  - Journal of Zhejiang University Science C
VL  - 11
IS  - 2
SP  - 130
EP  - 138
SN  - 1869-1951
Y1  - 2010
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.C0910084
ER  - 
Abstract: We investigate the problem of object-oriented (OO) software quality estimation from a multi-instance (MI) perspective. Specifically, each set of classes related by inheritance, called a ‘class hierarchy’, is regarded as a bag, and each class in the set is regarded as an instance. The learning task is to estimate the labels of unseen bags, i.e., the fault-proneness of untested class hierarchies. A fault-prone class hierarchy contains at least one fault-prone (negative) class, whereas a non-fault-prone (positive) one contains no negative class. Based on the modification records (MRs) of previous project releases and OO software metrics, the fault-proneness of an untested class hierarchy can be predicted. Several MI learning algorithms were evaluated on five datasets collected from an industrial software project. Among the MI learning algorithms investigated, the kernel method using a dedicated MI-kernel outperformed the others in terms of both accuracy and correctness when predicting the fault-proneness of class hierarchies. In addition, compared with a supervised support vector machine (SVM) algorithm, the MI-kernel method remained competitive at much lower cost.
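The bag/instance formulation above can be made concrete with a short sketch. This is illustrative only and is not the authors' implementation. It assumes single inheritance and represents each class by a hypothetical vector of OO metrics. Hierarchies are labeled with the rule stated in the abstract, with `True` meaning fault-prone (the abstract's "negative"). For the kernel, it uses a generic MI set kernel in the style of Gärtner et al. (2002) with an RBF instance kernel, so the paper's "dedicated MI-kernel" may differ. All function names and the parameters `gamma` and `p` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

# Assumption: each class is described by a fixed-length vector of OO metrics.
Instance = Sequence[float]
Bag = List[Instance]


def build_bags(parent: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group classes into bags, one bag per inheritance tree.

    `parent` maps each class name to its superclass (None for a root).
    Assumes single inheritance and an acyclic parent map.
    """
    def root(cls: str) -> str:
        # Walk up the superclass chain until a class with no parent is reached.
        while parent.get(cls) is not None:
            cls = parent[cls]
        return cls

    bags: Dict[str, List[str]] = {}
    for cls in parent:
        bags.setdefault(root(cls), []).append(cls)
    return bags


def bag_label(class_is_fault_prone: Sequence[bool]) -> bool:
    """MI rule from the abstract: a hierarchy is fault-prone (True)
    iff at least one of its classes is fault-prone."""
    return any(class_is_fault_prone)


def rbf(x: Instance, y: Instance, gamma: float) -> float:
    """Gaussian (RBF) kernel between two metric vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mi_kernel(bx: Bag, by: Bag, gamma: float = 0.5, p: int = 2) -> float:
    """Set kernel in the style of Gartner et al. (2002): sum over all
    instance pairs of the instance kernel raised to the power p."""
    return sum(rbf(x, y, gamma) ** p for x in bx for y in by)


def normalized_mi_kernel(bx: Bag, by: Bag, gamma: float = 0.5, p: int = 2) -> float:
    """Feature-space normalization: k(B, B) == 1, so large hierarchies
    do not dominate merely because they contain more classes."""
    kxy = mi_kernel(bx, by, gamma, p)
    return kxy / math.sqrt(mi_kernel(bx, bx, gamma, p) * mi_kernel(by, by, gamma, p))


def gram_matrix(bags: List[Bag], gamma: float = 0.5, p: int = 2) -> List[List[float]]:
    """Pairwise kernel matrix between bags."""
    return [[normalized_mi_kernel(a, b, gamma, p) for b in bags] for a in bags]


if __name__ == "__main__":
    # Hypothetical toy system: one hierarchy rooted at Shape, one singleton class.
    parent = {"Shape": None, "Circle": "Shape", "Square": "Shape", "Logger": None}
    metrics = {"Shape": [1.0, 2.0], "Circle": [3.0, 1.0],
               "Square": [1.0, 2.0], "Logger": [0.0, 0.0]}
    faulty = {"Shape": False, "Circle": True, "Square": False, "Logger": False}

    bags = build_bags(parent)
    for root_name, members in bags.items():
        print(root_name, members, "fault-prone:", bag_label([faulty[c] for c in members]))

    print(gram_matrix([[metrics[c] for c in members] for members in bags.values()]))
```

A Gram matrix built this way could be passed to an SVM that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. The normalization is a design choice: the raw set kernel grows with the product of the bag sizes, whereas the normalized value is bounded by 1.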