Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2020 Vol.21 No.8 P.1217-1225
Automatic traceability link recovery via active learning
Abstract: Traceability link recovery (TLR) is an important and costly software task that requires humans to establish relationships between source and target artifact sets within the same project. Previous research has proposed establishing traceability links with machine learning approaches. However, current machine learning approaches cannot be applied well to projects without traceability information (links), because training an effective predictive model requires humans to label too many traceability links. To save manpower, we propose a new TLR approach based on active learning (AL), called the AL-based approach. We evaluate the AL-based approach on seven commonly used traceability datasets and compare it with an information retrieval based approach and a state-of-the-art machine learning approach. The results indicate that the AL-based approach outperforms the other two approaches in terms of F-score.
Key words: Automatic, Traceability link recovery, Manpower, Active learning
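The abstract does not specify the query strategy, the classifier, or the features used. As a minimal sketch of the general active-learning idea only, assuming pool-based active learning with uncertainty sampling and a small logistic-regression classifier over hypothetical per-pair feature vectors (none of which is confirmed as the authors' design), the loop below trains on the links labeled so far. It then asks the "oracle", which stands in for the human analyst, to label the candidate (source, target) pair the model is least sure about, and finally scores predictions with the F-score named in the abstract. All function names (`active_learning_tlr`, `train_logreg`, `f_score`, and so on) are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # (weights, bias) of a logistic-regression model


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(model: Model, x: Vector) -> float:
    """Probability that the candidate pair described by features x is a true link."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(xs: List[Vector], ys: List[int], epochs: int = 500, lr: float = 0.5) -> Model:
    """Fit a tiny logistic-regression classifier with batch gradient descent on log-loss."""
    dim = len(xs[0])
    w, b, n = [0.0] * dim, 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba((w, b), x) - y  # derivative of log-loss w.r.t. the logit
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def active_learning_tlr(pool: List[Vector], oracle: Callable[[int], int],
                        seed: List[int], budget: int) -> Tuple[Model, Dict[int, int]]:
    """Hypothetical pool-based active-learning loop with uncertainty sampling.

    pool   -- one feature vector per candidate (source, target) artifact pair
    oracle -- stands in for the human analyst: returns 1 (link) or 0 (no link)
    seed   -- indices labeled up front; budget -- number of additional queries
    """
    labeled = {i: oracle(i) for i in seed}
    for _ in range(budget):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        model = train_logreg([pool[i] for i in labeled], list(labeled.values()))
        # Uncertainty sampling: query the pair whose predicted probability is closest to 0.5.
        query = min(unlabeled, key=lambda i: abs(predict_proba(model, pool[i]) - 0.5))
        labeled[query] = oracle(query)
    model = train_logreg([pool[i] for i in labeled], list(labeled.values()))
    return model, labeled


def f_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """F1: harmonic mean of precision and recall over the 'link' class."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy pool: one hypothetical feature per candidate pair (e.g. a textual-similarity score).
    pool = [[s] for s in (0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95)]
    truth = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    model, labeled = active_learning_tlr(pool, lambda i: truth[i], seed=[0, 9], budget=3)
    pred = [1 if predict_proba(model, x) >= 0.5 else 0 for x in pool]
    print(f"labels requested: {len(labeled)}, F-score: {f_score(pred, truth):.2f}")
```

The point of the loop is the labeling cost: the oracle is called only `len(seed) + budget` times rather than once per candidate pair. Uncertainty sampling is just one common query strategy, chosen here for brevity.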
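1 College of Computer Science, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
2 Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210093, China
3 Key Laboratory of Safety-Critical Software, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China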
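Abstract (Chinese version): Traceability link recovery (TLR) is an important and costly software task that requires developers to establish relationships between a set of source artifacts and a set of target artifacts within the same project. Previous studies have proposed building traceability with machine learning. However, current machine learning approaches cannot be applied well to projects without traceability information, because training an effective predictive model requires manually labeling too many traceability links. To save manpower, we propose a TLR approach based on active learning (AL), referred to as the AL-based approach. We evaluate the approach on seven commonly used traceability datasets and compare it with an information retrieval based approach and a state-of-the-art machine learning approach. The results show that the AL-based approach outperforms the other two approaches in terms of F-score.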
DOI: 10.1631/FITEE.1900222
CLC number: TP311
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2020-05-18