CLC number: TP311
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2022-02-28
Junpeng LUO, Jingxuan ZHANG, Zhiqiu HUANG, Yong XU, Chenxing SUN. Toward an accurate method renaming approach via structural and lexical analyses[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(5): 732-748.
@article{luo2022toward,
title="Toward an accurate method renaming approach via structural and lexical analyses",
author="Junpeng LUO and Jingxuan ZHANG and Zhiqiu HUANG and Yong XU and Chenxing SUN",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="23",
number="5",
pages="732-748",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2100470"
}
%0 Journal Article
%T Toward an accurate method renaming approach via structural and lexical analyses
%A Junpeng LUO
%A Jingxuan ZHANG
%A Zhiqiu HUANG
%A Yong XU
%A Chenxing SUN
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 5
%P 732-748
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2100470
TY  - JOUR
T1  - Toward an accurate method renaming approach via structural and lexical analyses
A1  - Junpeng LUO
A1  - Jingxuan ZHANG
A1  - Zhiqiu HUANG
A1  - Yong XU
A1  - Chenxing SUN
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 23
IS  - 5
SP  - 732
EP  - 748
SN  - 2095-9184
Y1  - 2022
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2100470
ER  - 
Abstract: Methods in programs must be accurately named to facilitate source code analysis and comprehension. As software evolves, method names may become inconsistent with the method bodies that implement them, leading to inaccurate or buggy method names. Debugging method names therefore remains an important topic in the literature. Although researchers have proposed several approaches that suggest accurate method names once method bodies have been modified, two main drawbacks remain: the structure of method names is not analyzed, and programming context information is not captured effectively. To resolve these drawbacks and suggest more accurate method names, we propose a novel automated approach based on structural analysis of method names and lexical analysis of programming context information. Our approach first uses deep feature representations to embed method names and method bodies as vectors. It then obtains useful verb tokens from a large method corpus through structural analysis, and noun tokens from method bodies through lexical analysis. Finally, it dynamically combines these tokens to form and recommend high-quality, project-specific method names. Experimental results on 2111 Java test methods show that the proposed approach achieves a hit ratio at 5 (Hit@5) of 33.62% and outperforms the state-of-the-art approach by 14.12% in suggesting accurate method names. We also demonstrate the effectiveness of the structural and lexical analyses in our approach.
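The abstract describes a pipeline: represent method bodies as vectors, take verb tokens from the names of similar methods, take noun tokens from the method body, combine them into candidate names, and score the ranking with Hit@5. The Python sketch below is a toy illustration of that idea under loudly stated assumptions, not the authors' implementation. A bag-of-sub-token cosine similarity is swapped in for the deep feature representations; the leading sub-token of a method name is assumed to be its verb; body identifiers stand in for POS-tagged noun tokens; and candidates are ranked by a simple product of counts. All function names, the keyword list, and the corpus in the usage note are hypothetical.

```python
import re
from collections import Counter
from math import sqrt

# Assumption: a tiny illustrative stop list, far from the full Java keyword set.
JAVA_KEYWORDS = {"public", "private", "static", "final", "void", "return", "new",
                 "if", "else", "for", "while", "int", "boolean", "this", "null",
                 "true", "false"}


def split_identifier(name):
    """Split a camelCase/PascalCase/snake_case identifier into lowercase sub-tokens."""
    return [p.lower() for p in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)]


def head(identifier):
    """First sub-token of an identifier, or '' if it has none."""
    parts = split_identifier(identifier)
    return parts[0] if parts else ""


def identifiers(body):
    """Lexical pass: identifiers appearing in a method body, minus keywords."""
    return [i for i in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", body)
            if i not in JAVA_KEYWORDS]


def body_vector(body):
    """Bag of sub-tokens; a crude stand-in (assumption) for a learned body embedding."""
    return Counter(t for i in identifiers(body) for t in split_identifier(i))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suggest_names(body, corpus, k=5, n_neighbors=3):
    """Rank candidates formed as <verb from similar methods> + <noun phrase from body>.

    corpus: list of (method_name, method_body) pairs, e.g. from the same project.
    """
    vec = body_vector(body)
    neighbors = sorted(corpus, key=lambda m: cosine(vec, body_vector(m[1])),
                       reverse=True)[:n_neighbors]
    # Structural heuristic (assumption): a name's leading sub-token is its verb.
    verbs = Counter(head(name) for name, _ in neighbors if head(name))
    # Lexical heuristic (assumption): body identifiers not starting with a verb act as nouns.
    nouns = Counter(i for i in identifiers(body) if head(i) and head(i) not in verbs)
    # Combine every verb with every noun phrase; score by the product of their counts.
    scored = [(v + n[0].upper() + n[1:], vc * nc)
              for v, vc in verbs.items() for n, nc in nouns.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


def hit_at_k(suggestion_lists, true_names, k=5):
    """Fraction of methods whose true name appears among their top-k suggestions."""
    if not true_names:
        return 0.0
    hits = sum(1 for sugg, truth in zip(suggestion_lists, true_names) if truth in sugg[:k])
    return hits / len(true_names)
```

As a usage example, with a toy corpus of `getUserName`, `getUserId`, `setUserName`, and `closeStream`, a body of `return this.userName;` yields `["getUserName", "setUserName"]`, because two of its three nearest neighbors start with `get`. Hit@k then simply counts how often the developer's actual name appears in each top-k list.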