On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2022-05-19
Tao ZHANG, Xiaobing SUN, Zibin ZHENG, Ge LI. Intelligent analysis for software data: research and applications[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(5): 661-663.
@article{zhang2022intelligent,
title="Intelligent analysis for software data: research and applications",
author="Tao ZHANG and Xiaobing SUN and Zibin ZHENG and Ge LI",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="23",
number="5",
pages="661--663",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2230000"
}
%0 Journal Article
%T Intelligent analysis for software data: research and applications
%A Tao ZHANG
%A Xiaobing SUN
%A Zibin ZHENG
%A Ge LI
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 5
%P 661-663
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2230000
TY  - JOUR
T1  - Intelligent analysis for software data: research and applications
A1  - Tao ZHANG
A1  - Xiaobing SUN
A1  - Zibin ZHENG
A1  - Ge LI
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 23
IS  - 5
SP  - 661
EP  - 663
SN  - 2095-9184
Y1  - 2022
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2230000
ER  -
Abstract:
Over the last few decades, software has been one of the primary drivers of economic growth in the world. Human life depends on reliable software; therefore, the software production process (i.e., software design, development, testing, and maintenance) has become one of the most important factors in ensuring software quality. During the production process, large amounts of software data (e.g., source code, bug reports, logs, and user reviews) are generated.
As software grows more complex, using software data to improve the performance and efficiency of software production has become a challenge for software developers and researchers. To address this challenge, researchers have used information retrieval, data mining, and machine learning technologies to implement a series of automated tools that improve the efficiency of important software engineering tasks, such as code search, code summarization, severity/priority prediction, bug localization, and program repair. However, these traditional approaches cannot deeply capture the semantic relations of contextual information and usually ignore the structural information of source code. Therefore, there is still room to improve the performance of these automated software engineering tasks.
The word “intelligent” means that a new generation of artificial intelligence (AI) technologies (e.g., deep learning) can be used to design a series of “smart” automated tools that improve the effectiveness and efficiency of software engineering tasks, so that developers’ workloads are substantially reduced.
Recent progress has been achieved with a new generation of AI approaches, which are well suited to software engineering problems. Below, we describe two classical and popular automated software engineering tasks that apply “intelligent” analysis technology to software data:
1. Intelligent software development
Code search and code summarization help developers build high-quality software more efficiently. Code search is a frequent activity in software development: developers enter a natural language description of the code they need as a query and retrieve suitable snippets to complete their projects. Designing a practically useful code search tool, however, is extremely challenging. Earlier information retrieval based approaches ignored the semantic relationship between high-level natural language descriptions and low-level source code, which limits search performance. In contrast, deep learning technologies can automatically learn feature representations and build mappings between inputs and outputs, which improves code search performance. Code summarization is the task of automatically generating natural language descriptions of source code to help developers understand and maintain software. Traditional automated code summarization work tends to use summary templates to extract keywords from source code, which ignores the grammatical information of the code. With the rapid development of neural network technology, convolutional neural networks (CNNs), recurrent neural networks (RNNs), transformers, and other deep learning networks have been applied to automated code summarization.
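To make the limitation of information retrieval based code search concrete, the following is a minimal sketch of a TF-IDF ranker over a toy corpus. It is not taken from any of the papers in this special feature; the snippet corpus, the function names (`tokenize`, `search`), and the smoothing in the IDF formula are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus of code snippets (illustration only).
SNIPPETS = [
    "def read_file(path): return open(path).read()",
    "def sort_list(items): return sorted(items)",
    "def http_get(url): return urllib.request.urlopen(url).read()",
]


def tokenize(text):
    """Split camelCase, then keep lowercase alphabetic tokens only."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF; the exact smoothing is an assumption of this sketch.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query, snippets):
    """Rank snippets by lexical similarity to the query, best first."""
    docs = [tokenize(s) for s in snippets]
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus get zero weight.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [cosine(q, v) for v in vectors]
    return sorted(zip(scores, snippets), key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for query in ("read file", "load text from disk"):
        print(query)
        for score, snippet in search(query, SNIPPETS):
            print(f"  {score:.3f}  {snippet}")
```

The query "read file" ranks the `read_file` snippet first because it shares tokens with it. The query "load text from disk" describes the same intent but shares no token with any snippet, so every score is zero. This is the lexical gap that learned representations aim to close.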
2. Intelligent software maintenance
Severity/priority prediction automatically recommends labels for the severity and priority levels of bug reports, two important features that developers would otherwise assign by hand. Severity indicates how serious a reported bug is, while priority indicates which bugs should be fixed first. Prediction helps developers quickly assign important bugs to appropriate developers for fixing, which improves the efficiency of software maintenance. Traditional approaches usually adopt machine learning technologies such as support vector machines (SVMs) and naive Bayes (NB) to predict the severity/priority level. However, these approaches cannot overcome the problem of data imbalance, so their prediction accuracy suffers. Some deep learning technologies, such as CNNs and graph convolutional networks (GCNs), can effectively address this problem and capture the contextual semantic information of bug reports, which improves prediction performance.
In this context, we organized a special feature in the journal Frontiers of Information Technology & Electronic Engineering on intelligent analysis for software data. This special feature covers software architecture recovery, app review analysis, integration testing, software project management, defect prediction, and method renaming, as well as related applications. After a rigorous review process, six papers were selected.
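The effect of data imbalance on a traditional classifier can be illustrated with a minimal multinomial naive Bayes sketch. The training reports, the labels, the `NaiveBayes` class, and the `uniform_prior` switch below are hypothetical and exist only for this illustration; replacing the empirical class prior with a uniform one is a simple way to expose the bias, not the deep learning remedy described above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, deliberately imbalanced bug reports: six "minor", one "critical".
TRAIN = [
    ("typo in settings label", "minor"),
    ("button color slightly off", "minor"),
    ("tooltip text misaligned", "minor"),
    ("icon blurry on settings page", "minor"),
    ("label overlaps button", "minor"),
    ("wrong font in tooltip", "minor"),
    ("app crash on startup data loss", "critical"),
]


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, text, uniform_prior=False):
        """Return the unnormalized log posterior of each class for one report."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            prior = 1.0 / len(self.class_counts) if uniform_prior else n / total
            score = math.log(prior)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for word in text.lower().split():
                if word in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text, uniform_prior=False):
        scores = self.log_scores(text, uniform_prior)
        return max(scores, key=scores.get)


model = NaiveBayes().fit([t for t, _ in TRAIN], [s for _, s in TRAIN])

if __name__ == "__main__":
    print(model.predict("crash with data loss"))
    print(model.predict("crash"))
    print(model.predict("crash", uniform_prior=True))
```

On this toy data, a report with several strongly indicative words ("crash with data loss") is labeled critical, but the single-word report "crash" is labeled minor because the majority-class prior outweighs the limited evidence. With a uniform prior the same report is labeled critical, which shows how much of the original decision came from the class imbalance rather than from the text.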