
CLC number: TP311
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2021-06-16
https://orcid.org/0000-0002-3451-8487
Junfang Jia, Valeriia Tumanian, Guoqiang Li. Discovering semantically related technical terms and web resources in Q&A discussions[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2000186
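Discovering semantically related technical terms and web resources in Q&A discussions
1 School of Computer and Network Engineering, Shanxi Datong University, Datong 037009, China
2 School of Software, Shanghai Jiao Tong University, Shanghai 200240, China
Abstract: A large and still growing number of technologies and web resources for software engineering practice is available online. Discovering semantically similar or related technical terms and web resources makes it possible to design appealing services that facilitate information retrieval and information discovery. This paper extracts technical terms and web resources from a community of question-and-answer (Q&A) discussions and proposes a method, based on a neural network language model, for representing technical terms and web resources semantically in a joint low-dimensional vector space. The method maps technical terms and web resources into the semantic vector space using only the technical terms and web resources that surround a given technical term (or web resource) in a discussion thread, without mining the textual content of the discussion. The method is applied to the Stack Overflow data dump of March 2018. Quantitative and qualitative analyses of clustering, search, and semantic reasoning tasks show that the learned vector representations capture the semantic relatedness of technical terms and web resources, and that simple K-nearest-neighbor search and simple algebraic operations on the learned vectors in the embedding space can support a variety of search and semantic reasoning tasks.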
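The two operations named at the end of the abstract, K-nearest-neighbor search and simple vector algebra in the embedding space, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three-dimensional vectors are hand-picked toy values rather than learned representations, and the names `EMBEDDINGS`, `cosine`, `nearest`, and `analogy` are hypothetical helpers, not the authors' implementation.

```python
import numpy as np

# Hypothetical toy embeddings for technical terms and one web resource.
# Real vectors would be learned from co-occurrence in discussion threads.
EMBEDDINGS = {
    "java": np.array([1.0, 0.0, 0.0]),
    "junit": np.array([1.0, 1.0, 0.0]),
    "python": np.array([0.0, 0.0, 1.0]),
    "pytest": np.array([0.0, 1.0, 1.0]),
    "docs.python.org": np.array([0.0, 0.1, 1.0]),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def nearest(query_vec, k=3, exclude=()):
    """Return the k items most similar to query_vec, skipping excluded names."""
    scored = [
        (name, cosine(query_vec, vec))
        for name, vec in EMBEDDINGS.items()
        if name not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def analogy(a, b, c, k=1):
    """Answer 'a is to b as c is to ?' with vector arithmetic plus KNN."""
    target = EMBEDDINGS[b] - EMBEDDINGS[a] + EMBEDDINGS[c]
    return nearest(target, k=k, exclude={a, b, c})


if __name__ == "__main__":
    # Search: items closest to the term "python" in the joint space.
    print(nearest(EMBEDDINGS["python"], k=2, exclude={"python"}))
    # Semantic reasoning: java -> junit, python -> ?
    print(analogy("java", "junit", "python"))
```

Because terms and web resources share one vector space in this sketch, a single `nearest` call can return both kinds of item for the same query, which is the property a joint embedding is meant to provide.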

