Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Discovering semantically related technical terms and web resources in Q&A discussions

Abstract: A large number of techniques and web resources are available for software engineering practice, and this number continues to grow. Discovering semantically similar or related technical terms and web resources offers the opportunity to design appealing services that facilitate information retrieval and information discovery. In this study, we extract technical terms and web resources from community question and answer (Q&A) discussions and propose an approach based on a neural language model to learn the semantic representations of technical terms and web resources in a joint low-dimensional vector space. Our approach maps technical terms and web resources to a semantic vector space based only on the technical terms and web resources that surround a technical term (or web resource) in a discussion thread, without the need to mine the text content of the discussion. We apply our approach to the Stack Overflow data dump of March 2018. Through both quantitative and qualitative analyses of clustering, search, and semantic reasoning tasks, we show that the learnt technical-term and web-resource vector representations capture the semantic relatedness of technical terms and web resources, and that they can be exploited to support various search and semantic reasoning tasks by means of simple K-nearest neighbor search and simple algebraic operations on the learnt vector representations in the embedding space.

Key words: Technical terms, Web resources, Word embedding, Q&A web site, Clustering tasks, Recommendation tasks
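
The idea summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract names a neural language model, whereas this sketch swaps in a truncated SVD of a thread-level co-occurrence matrix so that it runs with NumPy alone. The toy threads, the example.org URLs, the use of the whole thread as context, and the function names cooccurrence_embeddings, nearest, and analogy are all assumptions made for illustration.

```python
import numpy as np


def cooccurrence_embeddings(threads, dim=2):
    """Embed terms and URLs using only what co-occurs with them in a thread.

    Assumption: a truncated SVD of log-scaled co-occurrence counts stands in
    for the neural language model mentioned in the abstract.
    """
    vocab = sorted({item for thread in threads for item in thread})
    index = {item: i for i, item in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for thread in threads:
        for a in thread:
            for b in thread:
                if a != b:
                    counts[index[a], index[b]] += 1.0
    u, s, _ = np.linalg.svd(np.log1p(counts))
    return vocab, u[:, :dim] * s[:dim]


def nearest(vocab, vectors, query_vec, k=3, exclude=()):
    """K-nearest neighbor search by cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query_vec)
    sims = vectors @ query_vec / np.maximum(norms, 1e-12)
    order = np.argsort(-sims)
    ranked = [(vocab[i], float(sims[i])) for i in order if vocab[i] not in exclude]
    return ranked[:k]


def analogy(vocab, vectors, a, b, c, k=1):
    """Simple algebraic query: items closest to vec(a) - vec(b) + vec(c)."""
    idx = {item: i for i, item in enumerate(vocab)}
    query = vectors[idx[a]] - vectors[idx[b]] + vectors[idx[c]]
    return nearest(vocab, vectors, query, k=k, exclude={a, b, c})


# Hypothetical toy threads: each holds the terms and URLs seen in one discussion.
threads = [
    ["python", "numpy", "https://example.org/numpy-docs"],
    ["python", "pandas", "numpy"],
    ["java", "spring", "https://example.org/spring-guide"],
    ["java", "maven", "spring"],
]

vocab, vectors = cooccurrence_embeddings(threads, dim=2)
neighbors = nearest(vocab, vectors, vectors[vocab.index("numpy")], k=2, exclude={"numpy"})
guess = analogy(vocab, vectors, "numpy", "python", "java", k=1)
print(neighbors)
print(guess)
```

Because terms and URLs share one vocabulary, a single nearest-neighbor query can return either kind of item, which is what a joint vector space makes possible. On data this small the rankings carry no meaning; the sketch only shows the mechanics of the search and of the vector arithmetic.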

Chinese Summary: Discovering semantically related technical terms and web resources in Q&A discussions

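Junfang JIA1, Valeriia TUMANIAN2, Guoqiang LI2
1School of Computer and Network Engineering, Shanxi Datong University, Datong 037009, China
2School of Software, Shanghai Jiao Tong University, Shanghai 200240, China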
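Abstract: A large number of techniques and web resources usable in software engineering practice are now available online, and this number keeps growing. Discovering semantically similar or related technical terms and web resources makes it possible to design appealing services that facilitate information retrieval and information discovery. This paper extracts technical terms and web resources from community question and answer (Q&A) discussions and proposes a method, based on a neural network language model, for representing the semantics of technical terms and web resources in a joint low-dimensional vector space. The method maps technical terms and web resources to a semantic vector space based only on the technical terms and web resources surrounding a technical term (or web resource) in a discussion thread, without mining the text content of the discussion. The method is applied to the Stack Overflow data dump of March 2018. Quantitative and qualitative analyses of clustering, search, and semantic reasoning tasks show that the learned vector representations capture the semantic relatedness of technical terms and web resources, and that simple K-nearest neighbor search and simple algebraic operations on the learned vectors in the embedding space can support various search and semantic reasoning tasks.

Key words: Technical terms; Web resources; Word embedding; Q&A web site; Clustering tasks; Recommendation tasks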


DOI: 10.1631/FITEE.2000186

CLC number: TP311

On-line Access: 2021-07-20

Received: 2020-04-21

Revision Accepted: 2020-12-23

Crosschecked: 2021-06-16

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE