Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2019 Vol.20 No.2 P.187-205

Paper evolution graph: multi-view structural retrieval for academic literature

Author(s): Dan-ping Liao, Yun-tao Qian
Affiliation(s): 1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Corresponding email(s): ytqian@zju.edu.cn
Key Words: Paper evolution graph, Academic literature retrieval, Metagraph factorization, Topic coherence


Dan-ping Liao, Yun-tao Qian. Paper evolution graph: multi-view structural retrieval for academic literature[J]. Frontiers of Information Technology & Electronic Engineering, 2019, 20(2): 187-205.

@article{title="Paper evolution graph: multi-view structural retrieval for academic literature",
author="Dan-ping Liao, Yun-tao Qian",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="20",
number="2",
pages="187-205",
year="2019",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1700105"
}

%0 Journal Article
%T Paper evolution graph: multi-view structural retrieval for academic literature
%A Dan-ping Liao
%A Yun-tao Qian
%J Frontiers of Information Technology & Electronic Engineering
%V 20
%N 2
%P 187-205
%@ 2095-9184
%D 2019
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1700105

TY - JOUR
T1 - Paper evolution graph: multi-view structural retrieval for academic literature
A1 - Dan-ping Liao
A1 - Yun-tao Qian
JO - Frontiers of Information Technology & Electronic Engineering
VL - 20
IS - 2
SP - 187
EP - 205
SN - 2095-9184
Y1 - 2019
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1700105
ER -


Abstract: Academic literature retrieval is concerned with selecting the papers that are most likely to match a user's information needs. Most retrieval systems are limited to list-output models, in which the retrieved results are isolated from one another. In this paper, we aim to uncover the relationships among the retrieval results and propose a method for building structural retrieval results for academic literature, which we call a paper evolution graph (PEG). A PEG describes the evolution of diverse aspects of an input query through several evolution chains of papers. By using author, citation, and content information, PEGs can uncover various underlying relationships among papers and present the evolution of articles from multiple viewpoints. Our system supports three types of input queries: keyword query, single-paper query, and two-paper query. The construction of a PEG consists of three main steps. First, the papers are soft-clustered into communities via metagraph factorization, during which the topic distribution of each paper is obtained. Second, topically cohesive evolution chains are extracted from the communities that are relevant to the query; each chain focuses on one aspect of the query. Finally, the extracted chains are combined to generate a PEG that covers all the topics of the query. Experimental results on a real-world dataset demonstrate that the proposed method can construct meaningful PEGs.

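Paper evolution graph: multi-view structural retrieval for academic literature

Abstract (Chinese summary, translated): Academic literature retrieval focuses on selecting the papers most likely to match a user's information needs. Most current retrieval systems are limited to outputting a list of relevant documents, and the retrieved documents are independent of one another. This paper aims to reveal the relationships among retrieval results. It proposes a method for building structured retrieval results for academic literature, called the paper evolution graph (PEG). A PEG uses multiple evolution chains to describe how the query input evolves along different topic directions. Through three views, namely paper authors, reference citations, and paper content, a PEG can discover various latent relationships among documents and present the evolution of the literature from multiple views. The retrieval system supports three query modes: keyword, single paper, and two papers. PEG construction has three main steps. First, metagraph factorization is used to soft-cluster the documents into communities and to obtain the topic distribution of each paper. Second, topically coherent evolution chains are extracted from the communities relevant to the query; each chain reflects one aspect of the query. Finally, the extracted chains are combined to form the paper evolution graph, which covers all the topics involved in the query. Experimental results on a real-world literature database show that the method can build paper evolution graphs that are meaningful to users.

Key words: Paper evolution graph; Academic literature retrieval; Metagraph factorization; Topic coherence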
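
The three construction steps named in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the topic distributions below are made-up stand-ins for the output of metagraph factorization, the chain extraction is a simple longest-path heuristic over citation links scored by cosine similarity, and every name (`PAPERS`, `CITES`, `extract_chain`, `build_peg`) is hypothetical.

```python
from math import sqrt

# Hypothetical toy data: paper id -> (year, topic distribution over 2 communities).
# In the paper these distributions come from metagraph factorization; here they are invented.
PAPERS = {
    "A": (2001, [0.9, 0.1]),
    "B": (2003, [0.8, 0.2]),
    "C": (2004, [0.1, 0.9]),
    "D": (2006, [0.7, 0.3]),
    "E": (2007, [0.2, 0.8]),
}
# Hypothetical citation links: citing paper -> cited papers.
CITES = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["C", "D"]}


def cosine(u, v):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def extract_chain(community, threshold=0.5):
    """Stand-in for chain extraction: among papers whose weight in `community`
    is at least `threshold`, follow citation links in time order and keep the
    path with the highest summed topic similarity."""
    members = sorted(
        (p for p, (_, topics) in PAPERS.items() if topics[community] >= threshold),
        key=lambda p: PAPERS[p][0],
    )
    best = {}  # paper -> (score, best chain ending at that paper)
    for p in members:
        best[p] = (0.0, [p])
        for q in CITES.get(p, []):
            if q in best:  # only earlier papers from the same community
                score = best[q][0] + cosine(PAPERS[p][1], PAPERS[q][1])
                if score > best[p][0]:
                    best[p] = (score, best[q][1] + [p])
    return max(best.values(), key=lambda sc: sc[0], default=(0.0, []))[1]


def build_peg(chains):
    """Combine chains into one graph, represented as a sorted edge list."""
    edges = set()
    for chain in chains:
        edges.update(zip(chain, chain[1:]))
    return sorted(edges)


chains = [extract_chain(k) for k in range(2)]
peg = build_peg(chains)
print(chains)
print(peg)
```

The sketch treats every community as relevant; a real system would first select the communities that match the query, as the abstract describes.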


