Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2020 Vol.21 No.3 P.436-447

EncyCatalogRec: catalog recommendation for encyclopedia article completion

Author(s): Wei-ming Lu, Jia-hui Liu, Wei Xu, Peng Wang, Bao-gang Wei
Affiliation(s): 1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Corresponding email(s): luwm@zju.edu.cn
Key Words: Catalog recommendation, Encyclopedia article completion, Product graph, Transductive learning


Wei-ming Lu, Jia-hui Liu, Wei Xu, Peng Wang, Bao-gang Wei. EncyCatalogRec: catalog recommendation for encyclopedia article completion[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(3): 436-447.

@article{title="EncyCatalogRec: catalog recommendation for encyclopedia article completion",
author="Wei-ming Lu, Jia-hui Liu, Wei Xu, Peng Wang, Bao-gang Wei",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="21",
number="3",
pages="436-447",
year="2020",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1800363"
}

%0 Journal Article
%T EncyCatalogRec: catalog recommendation for encyclopedia article completion
%A Wei-ming Lu
%A Jia-hui Liu
%A Wei Xu
%A Peng Wang
%A Bao-gang Wei
%J Frontiers of Information Technology & Electronic Engineering
%V 21
%N 3
%P 436-447
%@ 2095-9184
%D 2020
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1800363

TY - JOUR
T1 - EncyCatalogRec: catalog recommendation for encyclopedia article completion
A1 - Wei-ming Lu
A1 - Jia-hui Liu
A1 - Wei Xu
A1 - Peng Wang
A1 - Bao-gang Wei
JO - Frontiers of Information Technology & Electronic Engineering
VL - 21
IS - 3
SP - 436
EP - 447
SN - 2095-9184
Y1 - 2020
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1800363
ER -


Abstract: Online encyclopedias such as Wikipedia provide a large and growing number of articles on many topics. However, many of these articles are still far from complete. In this paper, we propose EncyCatalogRec, a system that helps users produce more comprehensive articles by recommending catalog items. First, we represent articles and catalog items as embedding vectors and retrieve similar articles using locality-sensitive hashing; the catalog items of the retrieved articles serve as candidates. Next, we build a relation graph from the articles and the candidate items and transform it into a product graph, which recasts the recommendation problem as a transductive learning problem on the product graph. Finally, the recommended items are ordered using a learning-to-rank technique. Experimental results demonstrate that our approach achieves state-of-the-art performance on catalog recommendation in both warm- and cold-start scenarios. A case study further validates the approach.
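The candidate-generation step described in the abstract (embed articles, retrieve similar ones with locality-sensitive hashing, pool their catalog items) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the function names, the three-dimensional toy embeddings, and the catalogs are hypothetical, and the random-hyperplane (sign) variant of LSH is used for simplicity; the abstract does not state which LSH scheme or embedding model the system actually uses.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """Hash a vector to a bit tuple: one bit per hyperplane side."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_index(article_vecs, planes):
    """Group article titles by their LSH signature."""
    buckets = defaultdict(list)
    for title, vec in article_vecs.items():
        buckets[signature(vec, planes)].append(title)
    return buckets


def candidate_items(query_vec, planes, buckets, catalogs):
    """Return articles sharing the query's bucket and their pooled catalog items."""
    similar = buckets.get(signature(query_vec, planes), [])
    items = []
    for title in similar:
        for item in catalogs[title]:
            if item not in items:  # keep first occurrence, drop repeats
                items.append(item)
    return similar, items


# Toy data (assumed): embeddings and catalogs are made up for illustration.
article_vecs = {
    "Hangzhou": [0.9, 0.1, 0.3],
    "Suzhou": [0.8, 0.2, 0.4],
    "Python (language)": [-0.7, 0.9, -0.2],
}
catalogs = {
    "Hangzhou": ["History", "Geography", "Economy"],
    "Suzhou": ["History", "Culture", "Transport"],
    "Python (language)": ["Syntax", "Libraries", "History"],
}

planes = make_hyperplanes(dim=3, n_bits=4, seed=0)
index = build_index(article_vecs, planes)

# A query pointing in the same direction as an indexed article always
# lands in that article's bucket, because only the sign of each
# projection matters.
query = [2.0 * x for x in article_vecs["Hangzhou"]]
similar, items = candidate_items(query, planes, index, catalogs)
print(similar, items)
```

In this sketch a single hash table is used; practical LSH setups usually combine several tables to raise recall. The pooled items would then feed the later graph construction and ranking stages, which are not shown here.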

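EncyCatalogRec: catalog recommendation for encyclopedia article completion

Wei-ming Lu, Jia-hui Liu, Wei Xu, Peng Wang, Bao-gang Wei
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Summary: Online encyclopedias such as Wikipedia already offer a vast number of articles on diverse topics, yet the content of some articles remains incomplete. This paper proposes EncyCatalogRec, a system that recommends relevant catalogs for encyclopedia articles and thereby helps users complete encyclopedia content. First, encyclopedia articles and catalog items are represented as embedding vectors, related articles are retrieved with locality-sensitive hashing, and the catalog items of those articles are taken as candidates. Then, a relation graph is built from the retrieved articles and their catalog items and is further converted into a product graph, on which catalog recommendation is recast as a transductive learning problem. Finally, the recommended catalog items are ranked by a learning-to-rank algorithm. Experiments in both warm-start and cold-start scenarios confirm that the proposed method outperforms existing methods, and an example further validates it.

Key words: Catalog recommendation; Encyclopedia article completion; Product graph; Transductive learning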


