CLC number: TP391
On-line Access: 2019-03-11
Received: 2017-02-13
Revision Accepted: 2017-07-20
Crosschecked: 2019-02-15
Dan-ping Liao, Yun-tao Qian. Paper evolution graph: multi-view structural retrieval for academic literature[J]. Frontiers of Information Technology & Electronic Engineering, 2019, 20(2): 187-205.
@article{title="Paper evolution graph: multi-view structural retrieval for academic literature",
author="Dan-ping Liao, Yun-tao Qian",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="20",
number="2",
pages="187-205",
year="2019",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1700105"
}
%0 Journal Article
%T Paper evolution graph: multi-view structural retrieval for academic literature
%A Dan-ping Liao
%A Yun-tao Qian
%J Frontiers of Information Technology & Electronic Engineering
%V 20
%N 2
%P 187-205
%@ 2095-9184
%D 2019
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1700105
TY - JOUR
T1 - Paper evolution graph: multi-view structural retrieval for academic literature
A1 - Dan-ping Liao
A1 - Yun-tao Qian
JO - Frontiers of Information Technology & Electronic Engineering
VL - 20
IS - 2
SP - 187
EP - 205
SN - 2095-9184
Y1 - 2019
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1700105
ER -
Abstract: Academic literature retrieval concerns the selection of papers that are most likely to match a user's information needs. Most retrieval systems are limited to list-output models, in which the retrieval results are isolated from each other. In this paper, we aim to uncover the relationships between the retrieval results and propose a method to build structural retrieval results for academic literature, which we call a paper evolution graph (PEG). A PEG describes the evolution of diverse aspects of an input query through several evolution chains of papers. By using author, citation, and content information, PEGs can uncover various underlying relationships among the papers and present the evolution of articles from multiple viewpoints. Our system supports three types of input queries: keyword query, single-paper query, and two-paper query. The construction of a PEG consists mainly of three steps. First, the papers are soft-clustered into communities via metagraph factorization, during which the topic distribution of each paper is obtained. Second, topically cohesive evolution chains are extracted from the communities that are relevant to the query. Each chain focuses on one aspect of the query. Finally, the extracted chains are combined to generate a PEG, which covers all the topics of the query. Experimental results on a real-world dataset demonstrate that the proposed method can construct meaningful PEGs.
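The three construction steps can be illustrated with a toy sketch. It is not the authors' implementation: the metagraph factorization step is skipped and each paper's topic distribution is assumed to be given, and chain extraction is reduced to thresholding community membership and ordering by year. All names (Paper, relevant_communities, extract_chain, build_peg), the thresholds, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    pid: str
    year: int
    topics: tuple  # assumed soft membership over communities (step 1 output)


def relevant_communities(query_topics, threshold=0.2):
    """Indices of communities whose weight in the query meets the threshold."""
    return [k for k, w in enumerate(query_topics) if w >= threshold]


def extract_chain(papers, community, min_membership=0.5):
    """Step 2 (simplified): strong members of one community, ordered by year."""
    members = [p for p in papers if p.topics[community] >= min_membership]
    return sorted(members, key=lambda p: (p.year, p.pid))


def build_peg(papers, query_topics):
    """Step 3 (simplified): union of all chains as directed edges, earlier -> later."""
    edges = set()
    for k in relevant_communities(query_topics):
        chain = extract_chain(papers, k)
        edges.update((a.pid, b.pid) for a, b in zip(chain, chain[1:]))
    return edges


# Made-up papers with two communities; "D" belongs equally to both.
papers = [
    Paper("A", 2003, (0.9, 0.1)),
    Paper("B", 2006, (0.6, 0.4)),
    Paper("C", 2007, (0.1, 0.9)),
    Paper("D", 2009, (0.5, 0.5)),
    Paper("E", 2011, (0.2, 0.8)),
]
peg = build_peg(papers, query_topics=(0.5, 0.5))
print(sorted(peg))
```

In this toy data the two chains (A, B, D and C, D, E) share paper D, so the combined graph joins two aspects of the query at a single paper, which is the kind of cross-chain structure a list-output model cannot show.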