CLC number: TP311

On-line Access: 2022-05-19

Received: 2021-09-30

Revision Accepted: 2022-05-19

Crosschecked: 2022-02-28

ORCID: Jingxuan ZHANG, https://orcid.org/0000-0002-8437-6640

Frontiers of Information Technology & Electronic Engineering  2022 Vol.23 No.5 P.732-748

http://doi.org/10.1631/FITEE.2100470


Toward an accurate method renaming approach via structural and lexical analyses


Author(s):  Junpeng LUO, Jingxuan ZHANG, Zhiqiu HUANG, Yong XU, Chenxing SUN

Affiliation(s):  College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Corresponding email(s):   luojunpeng@nuaa.edu.cn, jxzhang@nuaa.edu.cn, zqhuang@nuaa.edu.cn, rogerxu@tencent.com, marssun@tencent.com

Key Words:  Method renaming, Code refactor, Deep learning, Convolutional neural networks


Junpeng LUO, Jingxuan ZHANG, Zhiqiu HUANG, Yong XU, Chenxing SUN. Toward an accurate method renaming approach via structural and lexical analyses[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(5): 732-748.

@article{FITEE.2100470,
title="Toward an accurate method renaming approach via structural and lexical analyses",
author="Junpeng LUO and Jingxuan ZHANG and Zhiqiu HUANG and Yong XU and Chenxing SUN",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="23",
number="5",
pages="732-748",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2100470"
}

%0 Journal Article
%T Toward an accurate method renaming approach via structural and lexical analyses
%A Junpeng LUO
%A Jingxuan ZHANG
%A Zhiqiu HUANG
%A Yong XU
%A Chenxing SUN
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 5
%P 732-748
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2100470

TY - JOUR
T1 - Toward an accurate method renaming approach via structural and lexical analyses
A1 - Junpeng LUO
A1 - Jingxuan ZHANG
A1 - Zhiqiu HUANG
A1 - Yong XU
A1 - Chenxing SUN
JO - Frontiers of Information Technology & Electronic Engineering
VL - 23
IS - 5
SP - 732
EP - 748
SN - 2095-9184
Y1 - 2022
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2100470
ER -


Abstract: 
Methods in programs must be accurately named to facilitate source code analysis and comprehension. With the evolution of software, method names may be inconsistent with their implemented method bodies, leading to inaccurate or buggy method names. Debugging method names remains an important topic in the literature. Although researchers have proposed several approaches to suggest accurate method names once the method bodies have been modified, two main drawbacks remain to be solved: there is no analysis of method name structure, and the programming context information is not captured efficiently. To resolve these drawbacks and suggest more accurate method names, we propose a novel automated approach based on the analysis of the method name structure and lexical analysis with the programming context information. Our approach first leverages deep feature representation to embed method names and method bodies in vectors. Then, it obtains useful verb-tokens from a large method corpus through structural analysis and noun-tokens from method bodies through lexical analysis. Finally, our approach dynamically combines these tokens to form and recommend high-quality and project-specific method names. Experimental results over 2111 Java testing methods show that the proposed approach can achieve a Hit Ratio, or Hit@5, of 33.62% and outperform the state-of-the-art approach by 14.12% in suggesting accurate method names. We also demonstrate the effectiveness of structural and lexical analyses in our approach.
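The pipeline described in the abstract (split names into tokens, pair verb-tokens with noun-tokens, then score the ranked suggestions with Hit@5) can be illustrated with a small sketch. This is not the authors' implementation: the function names `split_identifier`, `combine_tokens`, and `hit_at_k` are hypothetical, the deep-learning embedding step is omitted entirely, and the sketch assumes that a "hit" means the actual name appears verbatim among the top k suggestions, which the paper may define differently.

```python
import re
from itertools import product


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase tokens."""
    # Acronym runs, capitalized or lowercase words, and digit runs.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def combine_tokens(verbs, nouns, limit=5):
    """Build camelCase candidates (verb + Noun), keeping the input ranking.

    Assumption: verbs and nouns arrive already ranked, best first.
    """
    candidates = [
        verb.lower() + noun[:1].upper() + noun[1:]
        for verb, noun in product(verbs, nouns)
    ]
    return candidates[:limit]


def hit_at_k(ranked_suggestions, actual_names, k=5):
    """Fraction of methods whose actual name is among the top-k suggestions.

    Assumption: only an exact string match counts as a hit.
    """
    if not actual_names:
        raise ValueError("actual_names must not be empty")
    hits = sum(
        1
        for ranked, actual in zip(ranked_suggestions, actual_names)
        if actual in ranked[:k]
    )
    return hits / len(actual_names)


if __name__ == "__main__":
    print(split_identifier("parseHTTPResponse"))
    print(combine_tokens(["get", "find"], ["user", "name"]))
    print(hit_at_k([["getUser", "findUser"], ["setName"]],
                   ["findUser", "getName"]))
```

Under this reading, a Hit@5 of 33.62% over 2111 test methods corresponds to roughly 710 methods whose actual name appears among the five suggested names.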

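An accurate method renaming approach based on structural and lexical analyses

Junpeng LUO1, Jingxuan ZHANG1,2, Zhiqiu HUANG1, Yong XU3, Chenxing SUN3
1College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
2Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210023, China
3Tencent Technology (Shenzhen) Co., Ltd., Shenzhen 518054, China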