Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2017 Vol.18 No.8 P.1082-1107

Improved binary similarity measures for software modularization

Author(s): Rashid Naseem, Mustafa Bin Mat Deris, Onaiza Maqbool, Jing-peng Li, Sara Shahzad, Habib Shah
Affiliation(s): 1. Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja 86400, Malaysia
Corresponding email(s): rnsqau@gmail.com
Key Words: Binary similarity measure, Binary features, Combination of measures, Software modularization


Rashid Naseem, Mustafa Bin Mat Deris, Onaiza Maqbool, Jing-peng Li, Sara Shahzad, Habib Shah. Improved binary similarity measures for software modularization[J]. Frontiers of Information Technology & Electronic Engineering, 2017, 18(8): 1082-1107.

@article{FITEE.1500373,
title="Improved binary similarity measures for software modularization",
author="Rashid Naseem and Mustafa Bin Mat Deris and Onaiza Maqbool and Jing-peng Li and Sara Shahzad and Habib Shah",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="18",
number="8",
pages="1082-1107",
year="2017",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1500373"
}

%0 Journal Article
%T Improved binary similarity measures for software modularization
%A Rashid Naseem
%A Mustafa Bin Mat Deris
%A Onaiza Maqbool
%A Jing-peng Li
%A Sara Shahzad
%A Habib Shah
%J Frontiers of Information Technology & Electronic Engineering
%V 18
%N 8
%P 1082-1107
%@ 2095-9184
%D 2017
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1500373

TY  - JOUR
T1  - Improved binary similarity measures for software modularization
A1  - Rashid Naseem
A1  - Mustafa Bin Mat Deris
A1  - Onaiza Maqbool
A1  - Jing-peng Li
A1  - Sara Shahzad
A1  - Habib Shah
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 18
IS  - 8
SP  - 1082
EP  - 1107
SN  - 2095-9184
Y1  - 2017
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1500373
ER  -


Abstract: Various binary similarity measures have been employed in clustering approaches to form homogeneous groups of similar entities in the data. Most of these similarity measures are based only on the presence or absence of features. Binary similarity measures have also been explored with different clustering approaches (e.g., agglomerative hierarchical clustering) for software modularization, to make software systems understandable and manageable. Each similarity measure has its own strengths and weaknesses, which improve and degrade the clustering results, respectively. We highlight the strengths of some well-known existing binary similarity measures for software modularization. Furthermore, based on these existing similarity measures, we introduce several new, improved binary similarity measures. Proofs of correctness, with illustrations, and a series of experiments are presented to evaluate the effectiveness of the new binary similarity measures.
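To make "presence or absence of features" concrete: binary measures are usually written in terms of four counts for a pair of entities, a (features present in both), b (present only in the first), c (present only in the second) and d (absent from both). The Python sketch below computes these counts and three measures often compared in this line of work: Jaccard (JC), JaccardNM (JNM) and Russell & Rao (RR). The JC and RR formulas are the standard ones; the JNM form a/(2(a+b+c)+d) is an assumption taken from the authors' earlier work and should be checked against the paper. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def binary_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length 0/1 feature vectors.

    a: features present in both entities
    b: present in x only
    c: present in y only
    d: absent from both
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """JC = a / (a + b + c); 0.0 when neither entity has any feature."""
    a, b, c, _ = binary_counts(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0


def jaccard_nm(x: Sequence[int], y: Sequence[int]) -> float:
    """JNM, assumed here to be a / (2(a + b + c) + d)."""
    a, b, c, d = binary_counts(x, y)
    denom = 2 * (a + b + c) + d
    return a / denom if denom else 0.0


def russell_rao(x: Sequence[int], y: Sequence[int]) -> float:
    """RR = a / (a + b + c + d), i.e. shared features over all features."""
    a, b, c, d = binary_counts(x, y)
    n = a + b + c + d
    return a / n if n else 0.0


if __name__ == "__main__":
    one_shared = ([1, 0, 0, 0], [1, 0, 0, 0])
    three_shared = ([1, 1, 1, 0], [1, 1, 1, 0])
    for name, f in (("JC", jaccard), ("JNM", jaccard_nm), ("RR", russell_rao)):
        print(name, f(*one_shared), f(*three_shared))
```

Both toy pairs have identical feature sets, so JC gives 1.0 to each and cannot tell them apart, whereas RR (0.25 vs. 0.75) and the assumed JNM (0.2 vs. about 0.43) rank the pair sharing three features above the pair sharing one.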

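Improved binary similarity measures for software modularization

Objective: Various binary similarity measures are used in clustering approaches to identify homogeneous groups of similar entities in data. Most of these similarity measures are based only on the presence or absence of features. In software modularization, binary similarity measures can also be used together with different clustering approaches to improve the understandability and manageability of software systems. Each similarity measure has its own strengths and weaknesses, which respectively improve or degrade the clustering results.
Novelty: This paper highlights the strengths of some well-known existing binary similarity measures for software modularization. In addition, based on these existing similarity measures, several improved similarity measures are proposed.
Method: First, the strengths of some well-known existing binary similarity measures for software modularization are introduced. Next, several new, improved similarity measures are proposed. Concrete examples show that the new measures integrate the strengths of the existing binary similarity measures JC, JNM, and RR. Finally, experiments comparing the new measures with the existing ones verify the effectiveness of the proposed measures.
Conclusion: Experimental results show that, compared with existing similarity measures, the proposed binary similarity measures produce more credible results. The new measures reduce the number of arbitrary decisions and increase the number of clusters during the clustering process. Although the new measures rely only on a binary feature-vector representation of the data, they can be used to test software systems written in any programming language.

Keywords: Binary similarity measure; Binary features; Combination of measures; Software modularization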
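The summary's claim that the new measures reduce the number of arbitrary decisions refers to ties during agglomerative clustering: when several cluster pairs share the maximum similarity, the algorithm has to pick one of them arbitrarily. The Python sketch below counts such steps. It assumes complete linkage (cluster similarity is the minimum member-pair similarity) and breaks ties by taking the first pair found; neither choice is necessarily the paper's setup. It reuses the standard JC formula and the assumed JNM form a/(2(a+b+c)+d), and `count_arbitrary_decisions` is a hypothetical name.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[int]


def counts(x: Vector, y: Vector) -> Tuple[int, int, int, int]:
    """(a, b, c, d) for equal-length 0/1 vectors: both, x only, y only, neither."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if j and not i)
    return a, b, c, len(x) - a - b - c


def jaccard(x: Vector, y: Vector) -> float:
    a, b, c, _ = counts(x, y)
    return a / (a + b + c) if a + b + c else 0.0


def jaccard_nm(x: Vector, y: Vector) -> float:
    # Assumed JNM form: a / (2(a + b + c) + d).
    a, b, c, d = counts(x, y)
    denom = 2 * (a + b + c) + d
    return a / denom if denom else 0.0


def count_arbitrary_decisions(entities: List[Vector],
                              sim: Callable[[Vector, Vector], float],
                              tol: float = 1e-12) -> int:
    """Merge bottom-up until one cluster remains and count the steps at
    which more than one cluster pair ties for the highest similarity."""
    clusters: List[List[int]] = [[i] for i in range(len(entities))]
    arbitrary = 0
    while len(clusters) > 1:
        scored = []
        for p, q in combinations(range(len(clusters)), 2):
            # Complete linkage: the weakest member pair defines cluster similarity.
            s = min(sim(entities[i], entities[j])
                    for i in clusters[p] for j in clusters[q])
            scored.append((s, p, q))
        best = max(s for s, _, _ in scored)
        tied = [(p, q) for s, p, q in scored if abs(s - best) <= tol]
        if len(tied) > 1:
            arbitrary += 1
        # Resolve the tie by taking the first pair (this is the arbitrary choice).
        p, q = tied[0]
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # safe: combinations yields p < q, so index p is unchanged
    return arbitrary


if __name__ == "__main__":
    toy = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 0]]
    print("JC arbitrary decisions:", count_arbitrary_decisions(toy, jaccard))
    print("JNM arbitrary decisions:", count_arbitrary_decisions(toy, jaccard_nm))
```

On this toy data JC scores both identical pairs 1.0 at the first step, forcing one arbitrary decision, while the assumed JNM separates them (3/7 vs. 0.2) and makes none. A toy example like this only illustrates the mechanism; it says nothing about the paper's experimental results.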


