Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Fast code recommendation via approximate sub-tree matching

Abstract: Software developers often write code whose functionality is similar to that of existing code segments. A code recommendation tool that helps developers reuse these fragments can significantly improve their efficiency. Several methods have been proposed in recent years. Some use sequence matching algorithms to find related recommendations; most of these are time-consuming and can leverage only low-level textual information from code. Others extract features from code and compute similarity over numerical feature vectors. However, similarity between feature vectors is often not equivalent to similarity between the original code, because structural information is lost when abstract syntax trees are transformed into vectors. We propose a method based on approximate sub-tree matching to solve this problem. Unlike existing tree-based approaches that match feature vectors, it retains the tree structure of the query code during matching to find the code fragments that best match the current query. It uses a fast approximate sub-tree matching algorithm that transforms the sub-tree matching problem into matching between a tree and a list. In this way, structural information can be used in code recommendation tasks with strict time requirements. We constructed several real-world code databases covering different languages and granularities to evaluate the effectiveness of our method. The results show that our method outperforms two compared methods, SENSORY and Aroma, in terms of recall on all datasets, and can be applied to large datasets.

Key words: Code reuse; Code recommendation; Tree similarity; Structure information
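
The abstract describes turning sub-tree matching into matching between a tree and a list. The following is a minimal sketch of one way such a scheme could work, not the authors' algorithm: every sub-tree of a candidate is flattened into a list of canonical string keys, and the query tree is walked top-down, counting a whole sub-tree as matched as soon as its key appears in that list. The names `Node`, `annotate`, `flatten`, `match_score`, and `recommend`, the key format, and the coverage-based score are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy AST node: a label plus ordered children (hypothetical type)."""
    label: str
    children: list = field(default_factory=list)


def annotate(node, table):
    """Record (canonical key, size) for every sub-tree rooted under `node`."""
    parts = [annotate(child, table) for child in node.children]
    key = f"{node.label}({','.join(k for k, _ in parts)})"
    size = 1 + sum(s for _, s in parts)
    table[id(node)] = (key, size)
    return key, size


def flatten(root):
    """Flatten a candidate tree into the list of keys of all its sub-trees."""
    table = {}
    annotate(root, table)
    return [key for key, _ in table.values()]


def match_score(query, candidate_keys):
    """Fraction of query nodes covered by sub-trees found in the candidate list."""
    table = {}
    _, total = annotate(query, table)
    available = Counter(candidate_keys)

    def covered(node):
        key, size = table[id(node)]
        if available[key] > 0:
            # The whole query sub-tree occurs in the candidate: stop descending.
            available[key] -= 1
            return size
        # No exact match here: try to match the children separately.
        return sum(covered(child) for child in node.children)

    return covered(query) / total


def recommend(query, database, top_k=1):
    """Rank candidates (name -> flattened key list) by descending match score."""
    ranked = sorted(database.items(),
                    key=lambda item: match_score(query, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]


def call():
    return Node("Call", [Node("Name"), Node("Arg")])


# Query: an `if` whose body is a call (labels are illustrative only).
query = Node("If", [Node("Cond"), call()])
cand_a = Node("Block", [Node("If", [Node("Cond"), call()]), Node("Return")])
cand_b = Node("Block", [call(), Node("Return")])

database = {"a": flatten(cand_a), "b": flatten(cand_b)}
score_a = match_score(query, database["a"])  # query occurs whole in cand_a
score_b = match_score(query, database["b"])  # only the call sub-tree matches
best = recommend(query, database)
```

Because each candidate is flattened once ahead of time, a query costs one traversal of the query tree plus constant-time lookups per node, which is the kind of property a time-sensitive recommender would need. A production version would likely replace the string keys with fixed-size hashes and tolerate small label differences; both are left out of this sketch.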

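Chinese Summary: A fast code recommendation method based on approximate sub-tree matching

Yichao SHAO (邵宜超)1,2,3, Zhiqiu HUANG (黄志球)1,2,3, Weiwei LI (李伟湋)1,2,3, Yaoshen YU (喻垚慎)1,2,3
1College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
2Key Laboratory of Safety-Critical Software, Ministry of Industry and Information Technology, Nanjing 211100, China
3Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210016, China
Abstract: Software developers often need to write code whose functionality is similar to that of existing code, and code recommendation tools that help developers reuse such code fragments can significantly improve development efficiency. In recent years, many researchers have turned to this area and proposed a variety of code recommendation methods. Some use sequence matching algorithms to obtain related code; these methods are often inefficient and can exploit only the textual information in code. Others extract features from code to form feature vectors, compute the similarity between code fragments, and derive recommendations from it. However, similar feature vectors often do not imply similar original code, and structural information is lost when abstract syntax trees are converted into vectors. To address this, we propose a code recommendation method based on approximate sub-tree matching. Unlike existing methods based on feature vector matching, it retains the tree structure of the query code during matching and thus finds the code fragments most structurally similar to the current query. In addition, by applying the idea of hashing, it transforms the sub-tree matching problem into matching between a tree and a list, so that abstract syntax tree information can be used in code recommendation tasks with strict time requirements. To evaluate the effectiveness of the method, several code datasets covering different languages and granularities were constructed. Experimental results show that the method achieves higher recall than the two compared methods, SENSORY and Aroma, on all datasets, and that it can be applied to large datasets.

Key words: Code reuse; Code recommendation; Tree similarity; Structure information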


DOI: 10.1631/FITEE.2100379

CLC number: TP311

On-line Access: 2022-08-22

Received: 2021-08-07

Revision Accepted: 2022-03-24

Crosschecked: 2022-08-29

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE