CLC number: TP311

On-line Access: 2020-08-10

Received: 2019-05-02

Revision Accepted: 2019-08-12

Crosschecked: 2020-05-18

Frontiers of Information Technology & Electronic Engineering  2020 Vol.21 No.8 P.1217-1225


Automatic traceability link recovery via active learning

Author(s):  Tian-bao Du, Guo-hua Shen, Zhi-qiu Huang, Yao-shen Yu, De-xiang Wu

Affiliation(s):  College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Corresponding email(s):   tbdu_312@outlook.com, ghshen@nuaa.edu.cn, zqhuang@nuaa.edu.cn

Key Words:  Automatic, Traceability link recovery, Manpower, Active learning

Tian-bao Du, Guo-hua Shen, Zhi-qiu Huang, Yao-shen Yu, De-xiang Wu. Automatic traceability link recovery via active learning[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(8): 1217-1225.

@article{title="Automatic traceability link recovery via active learning",
author="Tian-bao Du and Guo-hua Shen and Zhi-qiu Huang and Yao-shen Yu and De-xiang Wu",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="21",
number="8",
pages="1217-1225",
year="2020",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1900222"
}

%0 Journal Article
%T Automatic traceability link recovery via active learning
%A Tian-bao Du
%A Guo-hua Shen
%A Zhi-qiu Huang
%A Yao-shen Yu
%A De-xiang Wu
%J Frontiers of Information Technology & Electronic Engineering
%V 21
%N 8
%P 1217-1225
%@ 2095-9184
%D 2020
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1900222

TY - JOUR
T1 - Automatic traceability link recovery via active learning
A1 - Tian-bao Du
A1 - Guo-hua Shen
A1 - Zhi-qiu Huang
A1 - Yao-shen Yu
A1 - De-xiang Wu
JO - Frontiers of Information Technology & Electronic Engineering
VL - 21
IS - 8
SP - 1217
EP - 1225
SN - 2095-9184
Y1 - 2020
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1900222
ER -

Abstract: Traceability link recovery (TLR) is an important and costly software task that requires humans to establish relationships between source and target artifact sets within the same project. Previous research has proposed establishing traceability links with machine learning approaches. However, current machine learning approaches do not apply well to projects that lack traceability information (links), because training an effective predictive model requires humans to label too many traceability links. To save manpower, we propose a new TLR approach based on active learning (AL), which we call the AL-based approach. We evaluate the AL-based approach on seven commonly used traceability datasets and compare it with an information retrieval-based approach and a state-of-the-art machine learning approach. The results indicate that the AL-based approach outperforms the other two approaches in terms of F-score.



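Abstract (translated from the Chinese version): Traceability link recovery (TLR) is an important and expensive software task that requires developers to establish relationships between the source artifact set and the target artifact set within the same project. Earlier studies proposed creating traceability links through machine learning. However, current machine learning methods do not apply well to projects without traceability information, because training an effective predictive model requires too many manually labeled traceability links. To save manpower, a TLR method based on active learning (AL) is proposed, referred to as the AL-based method. The method is evaluated on seven commonly used traceability datasets and compared with an information retrieval-based method and a state-of-the-art machine learning method. The results show that the AL-based method outperforms the other two methods in terms of F-score.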
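The abstract does not say which classifier or query strategy the AL-based approach uses, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes pool-based active learning with uncertainty sampling (a common strategy, swapped in here for illustration), a one-dimensional logistic regression, and a single hypothetical feature per candidate link: a similarity score between the source and target artifact. The names `active_learning`, `oracle`, `seed_indices`, and `budget` are illustrative. The `oracle` function stands in for the human analyst who labels a link as valid (1) or invalid (0); the point of the loop is that only the queried links need a human label.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, epochs=300, lr=0.5):
    """Fit a 1-D logistic regression (w, b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def active_learning(pool, oracle, seed_indices, budget):
    """Pool-based active learning with uncertainty sampling (assumed strategy).

    pool         -- one feature value per candidate link (hypothetical similarity score)
    oracle       -- function index -> 0/1, standing in for the human analyst
    seed_indices -- links labeled before the loop starts
    budget       -- number of additional links the analyst is asked to label
    """
    labeled = {i: oracle(i) for i in seed_indices}
    for _ in range(budget):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        idx = sorted(labeled)
        w, b = train_logistic([pool[i] for i in idx], [labeled[i] for i in idx])
        # Query the link whose predicted probability is closest to 0.5,
        # i.e. the one the current model is least certain about.
        query = min(unlabeled, key=lambda i: abs(sigmoid(w * pool[i] + b) - 0.5))
        labeled[query] = oracle(query)
    idx = sorted(labeled)
    model = train_logistic([pool[i] for i in idx], [labeled[i] for i in idx])
    return model, labeled


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), the balanced form of the F-score."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (fabricated for illustration only): a link is "valid" when its
# similarity score exceeds 0.6.
rng = random.Random(0)
pool = [rng.random() for _ in range(60)]
truth = [1 if s > 0.6 else 0 for s in pool]

(w, b), labeled = active_learning(pool, lambda i: truth[i], seed_indices=[0, 1], budget=10)
pred = [1 if sigmoid(w * s + b) >= 0.5 else 0 for s in pool]
print(len(labeled), "labels used, F1 =", round(f1_score(truth, pred), 3))
```

The sketch uses the balanced F1 form; the abstract says only "F-score", so the weighting used in the paper's evaluation may differ.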





