CLC number: TP182

On-line Access: 2021-02-01

Received: 2019-11-24

Revision Accepted: 2020-02-12

Crosschecked: 2020-08-19




Frontiers of Information Technology & Electronic Engineering  2021 Vol.22 No.2 P.170-184


Learning natural ordering of tags in domain-specific Q&A sites

Author(s):  Junfang Jia, Guoqiang Li

Affiliation(s):  School of Computer and Network Engineering, Shanxi Datong University, Datong 037009, China

Corresponding email(s):   jiajunfang816@163.com, li.g@sjtu.edu.cn

Key Words:  Question and answering (Q&A) sites, Tagging, Natural order, Skip gram

Junfang Jia, Guoqiang Li. Learning natural ordering of tags in domain-specific Q&A sites[J]. Frontiers of Information Technology & Electronic Engineering, 2021, 22(2): 170-184.

%0 Journal Article
%T Learning natural ordering of tags in domain-specific Q&A sites
%A Junfang Jia
%A Guoqiang Li
%J Frontiers of Information Technology & Electronic Engineering
%V 22
%N 2
%P 170-184
%@ 2095-9184
%D 2021
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1900645


Tagging is a defining characteristic of Web 2.0. It allows users of social computing systems, e.g., question and answering (Q&A) sites, to annotate content with free terms. But is tagging really a free action? Existing work has shown that users of an online community can develop an implicit consensus about which tags best describe the content. No work, however, has studied the regularities in how users order tags during tagging. In this paper, we focus on the natural ordering of tags in domain-specific Q&A sites. We study the tag sequences of millions of questions in four Q&A sites: CodeProject, SegmentFault, Biostars, and CareerCup. Our results show that users of these Q&A sites can develop an implicit consensus about the order in which they assign tags to questions. We also study the relationships between tags that can explain the emergence of this natural ordering. Our study opens a path to improving existing tag recommendation and Q&A site navigation by leveraging the natural ordering of tags.
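
To make the notion of an ordering consensus concrete, the following is a minimal sketch, not the authors' method: it counts how often one tag is listed before another within the same question and reports the fraction of co-occurrences that follow that order. The function names and the toy tag sequences are hypothetical and chosen only for illustration; the abstract does not specify how the paper measures ordering.

```python
from collections import Counter
from itertools import combinations


def ordered_pair_counts(tag_sequences):
    """Count how often tag a is listed before tag b in the same question."""
    counts = Counter()
    for tags in tag_sequences:
        # combinations() preserves input order, so (a, b) means a came before b.
        for a, b in combinations(tags, 2):
            if a != b:
                counts[(a, b)] += 1
    return counts


def order_agreement(counts, a, b):
    """Fraction of co-occurrences in which a precedes b (None if they never co-occur)."""
    forward, backward = counts[(a, b)], counts[(b, a)]
    total = forward + backward
    return forward / total if total else None


# Hypothetical toy data: each inner list is the tag sequence of one question.
questions = [
    ["python", "pandas", "dataframe"],
    ["python", "pandas"],
    ["pandas", "python"],
    ["python", "numpy"],
]

counts = ordered_pair_counts(questions)
agreement = order_agreement(counts, "python", "pandas")
```

An agreement value near 1 or 0 for a pair suggests a consistent order, while a value near 0.5 suggests none. Counting pairs at any distance is a simplifying assumption; a skip-gram model would typically bound the gap between the two tags.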






