Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Cohort-based personalized query auto-completion

Abstract: Query auto-completion (QAC) facilitates query formulation by predicting completions for a given query prefix. Most web search engines use behavioral signals to customize query completion lists for users. To be effective, such personalized QAC models rely on access to sufficient context about each user's interests and intentions. Hence, they often suffer from data sparseness. For this reason, we propose the construction and application of cohorts to address context sparsity and to enhance QAC personalization. We build an individual's interest profile by learning his/her topic preferences through topic models and then group users who share similar profiles into cohorts. As conventional topic models are unable to learn cohorts automatically, we propose two cohort topic models that handle topic modeling and cohort discovery in the same framework. We present four cohort-based personalized QAC models that employ four different cohort discovery strategies. Our proposals use cohorts' contextual information together with query frequency to rank completions. We perform extensive experiments on the publicly available AOL query log and compare the ranking effectiveness with that of models that discard cohort contexts. Experimental results suggest that our cohort-based personalized QAC models can solve the sparseness problem and yield significant relevance improvements over competitive baselines.
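The abstract describes two steps: grouping users with similar topic profiles into cohorts, then ranking completions by combining cohort context with query frequency. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' method. The paper learns cohorts jointly with topics through cohort topic models, whereas this sketch assumes each user's topic distribution is already given (e.g., from a standard topic model) and forms a cohort from the k most similar users by cosine similarity. The functions `build_cohort` and `rank_completions`, the interpolation weight `lam`, and the toy users and queries are all hypothetical.

```python
# Hypothetical sketch of cohort-based QAC ranking; NOT the paper's cohort topic models.
# Assumption: each user's topic distribution (interest profile) is already available.
import math
from collections import Counter


def cosine(p, q):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def build_cohort(user, profiles, k):
    """Hypothetical cohort: the k other users whose profiles are most similar."""
    others = [u for u in profiles if u != user]
    others.sort(key=lambda u: cosine(profiles[user], profiles[u]), reverse=True)
    return others[:k]


def rank_completions(prefix, user, profiles, logs, k=2, lam=0.5):
    """Rank completions of `prefix` by interpolating global and cohort frequency.

    lam = 1.0 reduces to plain most-popular-completion ranking;
    smaller lam gives more weight to the cohort's query history.
    """
    global_freq = Counter(q for queries in logs.values() for q in queries)
    cohort = build_cohort(user, profiles, k) + [user]
    cohort_freq = Counter(q for u in cohort for q in logs.get(u, []))

    candidates = [q for q in global_freq if q.startswith(prefix)]
    g_total = sum(global_freq[q] for q in candidates) or 1
    c_total = sum(cohort_freq[q] for q in candidates) or 1

    def score(q):
        return lam * global_freq[q] / g_total + (1 - lam) * cohort_freq[q] / c_total

    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(candidates, key=lambda q: (-score(q), q))


# Toy data (hypothetical): two topics, e.g. [sports, space].
profiles = {
    "alice": [0.9, 0.1],
    "bob": [0.8, 0.2],
    "carol": [0.1, 0.9],
    "dave": [0.2, 0.8],
}
logs = {
    "alice": ["nba scores"],
    "bob": ["nba scores", "nba draft"],
    "carol": ["nasa launch", "nasa jobs"],
    "dave": ["nasa launch", "nasa launch"],
}

if __name__ == "__main__":
    print(rank_completions("n", "alice", profiles, logs, k=1))
    print(rank_completions("n", "alice", profiles, logs, k=1, lam=1.0))
```

With this toy data, alice's one-member cohort is bob, so her list ranks "nba scores" and "nba draft" ahead of the globally more frequent "nasa launch". With lam=1.0 the cohort term vanishes and the ranking falls back to global frequency, putting "nasa launch" first. A linear interpolation is only one simple way to combine the two signals; it is chosen here for readability.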

Key words: Query auto-completion, Cohort-based retrieval, Topic models

Chinese Summary: A cohort-based personalized query auto-completion method

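Abstract: Query auto-completion (QAC) helps users formulate queries by predicting the full completions that correspond to a query prefix. Most web search engines use users' behavioral information to provide personalized query auto-completion lists. To achieve a high success rate, personalized QAC methods need a large amount of contextual information about users' search interests and search intentions. Consequently, these methods are usually limited by the sparsity of user data. This paper proposes using the search records of cohort users (users with similar interests) to address user data sparsity and to improve the recommendation performance of personalized QAC methods. First, each user's topical interests are obtained through topic models to build an interest profile for that user; users with similar interest profiles are then grouped into cohorts. Since traditional topic models cannot identify cohorts automatically, two cohort topic models are proposed that incorporate topic modeling and cohort identification into the same framework. Based on different cohort identification methods, four cohort-based personalized QAC methods are presented. The proposed personalized QAC methods rank completions using the contextual information of cohorts together with query frequency. Extensive experiments are conducted on the public AOL query log dataset, and ranking performance is compared with methods that do not use cohort contextual information. Experimental results show that the proposed cohort-based personalized QAC methods can effectively solve the user data sparsity problem and substantially improve ranking accuracy relative to the baseline methods.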

Key words: Query auto-completion; Cohort-based information retrieval; Topic models


DOI: 10.1631/FITEE.1800010
CLC number: TP311.5

On-line Access: 2019-10-08
Received: 2018-01-05
Revision Accepted: 2018-08-05
Crosschecked: 2019-09-04

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE