CLC number: TP311
On-line Access: 2020-08-10
Received: 2019-05-02
Revision Accepted: 2019-08-12
Crosschecked: 2020-05-18
Tian-bao Du, Guo-hua Shen, Zhi-qiu Huang, Yao-shen Yu, De-xiang Wu. Automatic traceability link recovery via active learning[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(8): 1217-1225.
@article{FITEE.1900222,
title="Automatic traceability link recovery via active learning",
author="Tian-bao Du and Guo-hua Shen and Zhi-qiu Huang and Yao-shen Yu and De-xiang Wu",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="21",
number="8",
pages="1217--1225",
year="2020",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.1900222"
}
%0 Journal Article
%T Automatic traceability link recovery via active learning
%A Tian-bao Du
%A Guo-hua Shen
%A Zhi-qiu Huang
%A Yao-shen Yu
%A De-xiang Wu
%J Frontiers of Information Technology & Electronic Engineering
%V 21
%N 8
%P 1217-1225
%@ 2095-9184
%D 2020
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1900222
TY - JOUR
T1 - Automatic traceability link recovery via active learning
A1 - Tian-bao Du
A1 - Guo-hua Shen
A1 - Zhi-qiu Huang
A1 - Yao-shen Yu
A1 - De-xiang Wu
JO - Frontiers of Information Technology & Electronic Engineering
VL - 21
IS - 8
SP - 1217
EP - 1225
SN - 2095-9184
Y1 - 2020
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1900222
ER -
Abstract: Traceability link recovery (TLR) is an important and costly software task that requires humans to establish relationships between source and target artifact sets within the same project. Previous research has proposed establishing traceability links with machine learning approaches. However, current machine learning approaches cannot be readily applied to projects without existing traceability information (links), because training an effective predictive model requires humans to label too many traceability links. To reduce this manual effort, we propose a new TLR approach based on active learning (AL), which we call the AL-based approach. We evaluate the AL-based approach on seven commonly used traceability datasets and compare it with an information retrieval-based approach and a state-of-the-art machine learning approach. The results indicate that the AL-based approach outperforms the other two approaches in terms of F-score.
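The general idea of active learning for TLR can be illustrated with a small sketch. The abstract does not state which query strategy or classifier the authors use, so everything below is an assumption made for illustration: the sketch uses pool-based uncertainty sampling, a toy threshold model over a single hypothetical similarity score per candidate link, and a hypothetical `oracle` function standing in for the human analyst. It also shows how the F-score used in the evaluation is computed.

```python
# Illustrative sketch only: pool-based active learning with uncertainty sampling.
# The scores, labels, oracle, and threshold model are hypothetical and are NOT
# taken from the paper.

def f_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall for the positive (link) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_threshold(scores, labels):
    """Toy model: threshold halfway between the mean link and mean non-link score."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def active_learning(scores, oracle, seed_idx, budget):
    """Label up to `budget` extra candidates, always querying the most uncertain one.

    `seed_idx` must contain at least one true link and one non-link.
    """
    labeled = {i: oracle(i) for i in seed_idx}
    for _ in range(budget):
        unlabeled = [i for i in range(len(scores)) if i not in labeled]
        if not unlabeled:
            break
        idx = list(labeled)
        threshold = fit_threshold([scores[i] for i in idx], [labeled[i] for i in idx])
        # Uncertainty sampling: the candidate closest to the decision boundary.
        query = min(unlabeled, key=lambda i: abs(scores[i] - threshold))
        labeled[query] = oracle(query)  # the human analyst answers here
    idx = list(labeled)
    threshold = fit_threshold([scores[i] for i in idx], [labeled[i] for i in idx])
    return threshold, labeled


# Hypothetical candidate links: one similarity score each, plus hidden true labels.
scores = [0.05, 0.12, 0.21, 0.33, 0.44, 0.58, 0.61, 0.72, 0.86, 0.95]
truth = [False, False, False, False, False, True, False, True, True, True]

threshold, labeled = active_learning(scores, lambda i: truth[i], seed_idx=[0, 9], budget=3)
predictions = [s >= threshold for s in scores]
print(f"labeled {len(labeled)} of {len(scores)} links, F-score = {f_score(truth, predictions):.2f}")
```

The point of the design is that the analyst labels only the candidates the current model is least sure about, rather than a large random sample; a real system would replace the threshold model with a proper classifier and richer features.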