Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Semantic composition of distributed representations for query subtopic mining

Abstract: Inferring query intent is important in information retrieval tasks. Query subtopic mining aims to find possible subtopics for a given query to represent its potential intents. Subtopic mining is challenging because queries are short. Methods for learning distributed representations of words and word sequences have developed rapidly in recent years and have had a major impact on many fields. It is still unclear whether distributed representations are effective in alleviating the challenges of query subtopic mining. In this paper, we exploit and compare the main semantic composition strategies for distributed representations in query subtopic mining. Specifically, we focus on two types of distributed representations: the paragraph vector, which directly represents word sequences of arbitrary length, and word vector composition. We thoroughly investigate the impact of the semantic composition strategy and of the type of data used for learning distributed representations. Experiments were conducted on a public dataset offered by the National Institute of Informatics Testbeds and Community for Information Access Research (NTCIR). The empirical results show that distributed semantic representations achieve better performance for query subtopic mining than traditional semantic representations. Further insights are also reported.

Key words: Subtopic mining; Query intent; Distributed representation; Semantic composition
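To make the idea of word vector composition concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes averaging as the composition strategy (the abstract does not say which strategies were compared) and uses made-up three-dimensional vectors. The names `TOY_VECTORS`, `compose_average`, and `cosine` are hypothetical and introduced only for illustration.

```python
import math

# Toy word vectors; the values are invented for illustration only.
TOY_VECTORS = {
    "apple": [0.9, 0.1, 0.0],
    "iphone": [0.8, 0.2, 0.1],
    "price": [0.1, 0.9, 0.0],
    "pie": [0.0, 0.2, 0.9],
    "recipe": [0.1, 0.1, 0.8],
}


def compose_average(text, vectors):
    """Represent a short text as the average of its known word vectors.

    Averaging is an assumed composition strategy; unknown words are skipped.
    """
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        raise ValueError("no known words in text")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Compose three candidate subtopic strings into fixed-length vectors.
query_a = compose_average("apple iphone price", TOY_VECTORS)
query_b = compose_average("iphone price", TOY_VECTORS)
query_c = compose_average("apple pie recipe", TOY_VECTORS)

# Strings about the same intent should score higher than unrelated ones.
print(cosine(query_a, query_b) > cosine(query_a, query_c))
```

With these toy values, the two phone-related strings end up closer to each other than to the recipe string, which is the property a subtopic clustering step would rely on. A paragraph vector model would instead learn one vector per string directly rather than composing it from word vectors.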

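Chinese Summary: Query subtopic mining based on semantic composition of distributed representations

Summary: Inferring query intent is important for information retrieval. Query subtopic mining aims to find possible subtopics that represent the potential intents of a given query. Because queries are short, subtopic mining is challenging. Learning distributed representations of words or sentences has driven and influenced progress in many fields. However, there is no clear conclusion on whether such distributed representations help address the challenges of query subtopic mining. This paper proposes and compares uses of the semantic composition of distributed representations for query subtopic mining. Two distributed representation strategies are adopted: the paragraph vector, which can learn distributed representations of text of arbitrary length, and the semantic composition of word vectors. The effects of semantic composition strategies and data types on query representation are explored. Experimental results on a public dataset provided by the National Institute of Informatics Testbeds and Community for Information Access Research (NTCIR) show that distributed semantic representations achieve better query subtopic mining performance than traditional semantic representations. Further in-depth discussion is provided in the paper.

Key words: Query subtopic mining; Query intent; Distributed representation; Semantic composition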


DOI: 10.1631/FITEE.1601476

CLC number: TP391.3

On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2018-11-12

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE