Full Text:   <1845>

Summary:  <249>

CLC number: TP311

On-line Access: 2022-08-22

Received: 2021-08-07

Revision Accepted: 2022-03-24

Crosschecked: 2022-08-29

Cited: 0

Clicked: 1579

Citations:  Bibtex RefMan EndNote GB/T7714


Zhiqiu Huang


Yichao SHAO


-   Go to

Article info.
Open peer comments

Frontiers of Information Technology & Electronic Engineering  2022 Vol.23 No.8 P.1205-1216


Fast code recommendation via approximate sub-tree matching

Author(s):  Yichao SHAO, Zhiqiu HUANG, Weiwei LI, Yaoshen YU

Affiliation(s):  School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China; more

Corresponding email(s):   shaoyichao@nuaa.edu.cn, zqhuang@nuaa.edu.cn

Key Words:  Code reuse, Code recommendation, Tree similarity, Structure information

Yichao SHAO, Zhiqiu HUANG, Weiwei LI, Yaoshen YU. Fast code recommendation via approximate sub-tree matching[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(8): 1205-1216.

@article{title="Fast code recommendation via approximate sub-tree matching",
author="Yichao SHAO, Zhiqiu HUANG, Weiwei LI, Yaoshen YU",
journal="Frontiers of Information Technology & Electronic Engineering",
publisher="Zhejiang University Press & Springer",

%0 Journal Article
%T Fast code recommendation via approximate sub-tree matching
%A Yichao SHAO
%A Zhiqiu HUANG
%A Weiwei LI
%A Yaoshen YU
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 8
%P 1205-1216
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2100379

T1 - Fast code recommendation via approximate sub-tree matching
A1 - Yichao SHAO
A1 - Zhiqiu HUANG
A1 - Weiwei LI
A1 - Yaoshen YU
J0 - Frontiers of Information Technology & Electronic Engineering
VL - 23
IS - 8
SP - 1205
EP - 1216
%@ 2095-9184
Y1 - 2022
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/FITEE.2100379

Software developers often write code that has similar functionality to existing code segments. A code recommendation tool that helps developers reuse these code fragments can significantly improve their efficiency. Several methods have been proposed in recent years. Some use sequence matching algorithms to find the related recommendations. Most of these methods are time-consuming and can leverage only low-level textual information from code. Others extract features from code and obtain similarity using numerical feature vectors. However, the similarity of feature vectors is often not equivalent to the original code’s similarity. Structural information is lost during the process of transforming abstract syntax trees into vectors. We propose an approximate sub-tree matching based method to solve this problem. Unlike existing tree-based approaches that match feature vectors, it retains the tree structure of the query code in the matching process to find code fragments that best match the current query. It uses a fast approximation sub-tree matching algorithm by transforming the sub-tree matching problem into the match between the tree and the list. In this way, the structural information can be used for code recommendation tasks that have high time requirements. We have constructed several real-world code databases covering different languages and granularities to evaluate the effectiveness of our method. The results show that our method outperforms two compared methods, SENSORY and Aroma, in terms of the recall value on all the datasets, and can be applied to large datasets.




Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article


[1]Ai L, Huang ZQ, Li WW, et al., 2019. SENSORY: leveraging code statement sequence information for code snippets recommendation. Proc IEEE 43rd Annual Computer Software and Applications Conf, p.27-36.

[2]Antunes B, Furtado B, Gomes P, 2014. Context-based search, recommendation and browsing in software development. In: Brézillon P, Gonzalez AJ (Eds.), Context in Computing: a Cross-Disciplinary Approach for Modeling the Real World. Springer, New York, USA, p.45-62.

[3]Baxter ID, Yahin A, Moura L, et al., 1998. Clone detection using abstract syntax trees. Proc Int Conf on Software Maintenance, p.368-377.

[4]Chen S, Zhang KZ, 2014. An improved algorithm for tree edit distance with applications for RNA secondary structure comparison. J Comb Optim, 27(4):778-797.

[5]Ďuračík M, Kršák E, Hrkút P, 2020. Searching source code fragments using incremental clustering. Concurr Comput Pract Exp, 32(13):e5416.

[6]Holmes R, Murphy GC, 2005. Using structural context to recommend source code examples. Proc 27th Int Conf on Software Engineering.

[7]Jiang H, Nie LM, Sun ZY, et al., 2019. ROSF: leveraging information retrieval and supervised learning for recommending code snippets. IEEE Trans Serv Comput, 12(1):34-46.

[8]Jiang LX, Misherghi G, Su ZD, et al., 2007. DECKARD: scalable and accurate tree-based detection of code clones. Proc 29th Int Conf on Software Engineering, p.96-105.

[9]Kamalpriya CM, Singh P, 2018. Enhancing program dependency graph based clone detection using approximate subgraph matching. Proc IEEE 11th Int Workshop on Software Clones, p.1-7.

[10]Luan SF, Yang D, Barnaby C, et al., 2018. Aroma: code recommendation via structural code search. Proc ACM Program Lang, 3:152.

[11]Mou LL, Li G, Zhang L, et al., 2016. Convolutional neural networks over tree structures for programming language processing. Proc 31st AAAI Conf on Artificial Intelligence, p.1287-1293.

[12]Rahman MM, Roy CK, 2014. On the use of context in recommending exception handling code examples. Proc IEEE 14th Int Working Conf on Source Code Analysis and Manipulation, p.285-294.

[13]Rahman MM, Roy CK, Lo D, 2016. RACK: automatic API recommendation using crowdsourced knowledge. Proc IEEE 23rd Int Conf on Software Analysis, Evolution, and Reengineering, p.349-359.

[14]Sahavechaphan N, Claypool K, 2006. XSnippet: mining for sample code. Proc 21st Annual ACM SIGPLAN Conf on Object-Oriented Programming Systems, Languages, and Applications, p.413-430.

[15]Saini V, Farmahinifarahani F, Lu YD, et al., 2018. Oreo: detection of clones in the twilight zone. Proc 26th ACM Joint Meeting on European Software Engineering Conf and Symp on the Foundations of Software Engineering, p.354-365.

[16]Shasha D, Wang JTL, Zhang KZ, et al., 1994. Exact and approximate algorithms for unordered tree matching. IEEE Trans Syst Man Cybern, 24(4):668-678.

[17]Smith TF, Waterman MS, 1981. Identification of common molecular subsequences. J Mol Biol, 147(1):195-197.

[18]Svajlenko J, Roy CK, 2021. The mutation and injection framework: evaluating clone detection tools with mutation analysis. IEEE Trans Soft Eng, 47(5):1060-1087.

[19]Yang YM, Ren ZL, Chen X, et al., 2018. Structural function based code clone detection using a new hybrid technique. Proc IEEE 42nd Annual Computer Software and Applications Conf, p.286-291.

[20]Ye YW, Fischer G, 2002. Supporting reuse by delivering task-relevant and personalized information. Proc 24th Int Conf on Software Engineering.

[21]Zhang KZ, Shasha D, 1989. Simple fast algorithms for the editing distance between trees and related problems. SIAM J Comput, 18:1245-1262.

Open peer comments: Debate/Discuss/Question/Opinion


Please provide your name, email address and a comment

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2024 Journal of Zhejiang University-SCIENCE