JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

2020 Vol.21 No.8 P.1217-1225

Automatic traceability link recovery via active learning

Tian-bao Du, Guo-hua Shen, Zhi-qiu Huang, Yao-shen Yu, De-xiang Wu

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210093, China; Key Laboratory of Safety-Critical Software, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

tbdu_312@outlook.com, ghshen@nuaa.edu.cn, zqhuang@nuaa.edu.cn

Abstract: Traceability link recovery (TLR) is an important and costly software task that requires humans to establish relationships between source and target artifact sets within the same project. Previous research has proposed establishing traceability links with machine learning approaches. However, current machine learning approaches cannot be applied effectively to projects without traceability information (links), because training an effective predictive model requires humans to label too many traceability links. To save manpower, we propose a new TLR approach based on active learning (AL), which is called the AL-based approach. We evaluate the AL-based approach on seven commonly used traceability datasets and compare it with an information-retrieval-based approach and a state-of-the-art machine learning approach. The results indicate that the AL-based approach outperforms the other two approaches in terms of F-score.

Key words: Automatic, Traceability link recovery, Manpower, Active learning
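
The abstract names two ideas that a small example may make more concrete: choosing which candidate links a human should label next, and scoring recovered links by F-score. The sketch below is illustrative only and is not the authors' implementation. The abstract does not state the query strategy, so uncertainty sampling (querying the links whose predicted probability is closest to 0.5) is assumed here; the names `Link`, `select_queries`, and `f_score`, and the `beta` parameter, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A candidate link is a (source artifact id, target artifact id) pair.
Link = Tuple[str, str]


def select_queries(probabilities: Dict[Link, float], k: int) -> List[Link]:
    """Return the k unlabeled candidate links the model is least sure about.

    Assumption: uncertainty sampling. The abstract does not name the
    query strategy actually used by the AL-based approach.
    """
    # Rank by distance of the predicted probability from 0.5;
    # ties are broken by the link ids so the result is deterministic.
    ranked = sorted(
        probabilities,
        key=lambda link: (abs(probabilities[link] - 0.5), link),
    )
    return ranked[:k]


def f_score(predicted: Set[Link], actual: Set[Link], beta: float = 1.0) -> float:
    """F-beta score of predicted links against the true links.

    beta = 1.0 gives the harmonic mean of precision and recall.
    """
    true_positives = len(predicted & actual)
    if true_positives == 0:
        # Covers empty predictions and predictions with no correct link.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

In an active learning loop under these assumptions, the links returned by `select_queries` would be labeled by a human, added to the training set, and the classifier retrained before the next round; `f_score` would then be applied to the final predicted link set.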

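Chinese Summary: An automated traceability link recovery method based on active learning

Tian-bao Du¹, Guo-hua Shen¹,²,³, Zhi-qiu Huang¹,²,³, Yao-shen Yu¹, De-xiang Wu¹
¹College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
²Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210093, China
³Key Laboratory of Safety-Critical Software, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

Abstract: Traceability link recovery (TLR) is an important and costly software task that requires developers to establish relationships between the source artifact set and the target artifact set within the same project. Previous studies have proposed creating traceability links through machine learning. However, current machine learning methods cannot be applied well to projects without traceability information, because training an effective predictive model requires too many manually labeled traceability links. To save manpower, a TLR method based on active learning (AL) is proposed, referred to as the AL-based method. The method is evaluated on seven commonly used traceability datasets and compared with an information-retrieval-based method and a state-of-the-art machine learning method. The results show that the AL-based method outperforms the other two methods in terms of F-score.

Key words: Automatic; Traceability link recovery; Manpower; Active learning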


DOI: 10.1631/FITEE.1900222

CLC number: TP311

On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2020-05-18

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE
