Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Supervised topic models with weighted words: multi-label document classification

Abstract: Supervised topic modeling algorithms have been successfully applied to multi-label document classification tasks. Representative models include labeled latent Dirichlet allocation (L-LDA) and dependency-LDA. However, these models neglect the class frequency of words (i.e., the number of classes in which a word occurs in the training data), which is significant for classification. To address this, we propose the class frequency weight (CF-weight), a method that weights words according to their class frequency. The CF-weight is based on the intuition that a word with higher (lower) class frequency is less (more) discriminative. In this study, the CF-weight is used to improve L-LDA and dependency-LDA. Experiments on real-world multi-label datasets demonstrate that CF-weight based algorithms are competitive with existing supervised topic models.

Key words: Supervised topic model, Multi-label classification, Class frequency, Labeled latent Dirichlet allocation (L-LDA), Dependency-LDA
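The class frequency defined in the abstract (the number of classes in which a word occurs in the training data) can be illustrated with a minimal Python sketch. The abstract does not state the CF-weight formula, so the weighting function below is a hypothetical IDF-style stand-in chosen only to show the stated intuition (higher class frequency, lower weight); it is not the authors' method. The toy corpus, the function names, and the choice to count a word toward every label of a multi-label document are likewise assumptions.

```python
import math
from collections import defaultdict


def class_frequency(docs):
    """Count, for each word, the number of distinct classes it occurs in.

    docs is an iterable of (tokens, labels) pairs. Assumption: a word in a
    multi-label document counts as occurring in every label of that document.
    """
    classes_per_word = defaultdict(set)
    for tokens, labels in docs:
        for word in set(tokens):
            classes_per_word[word].update(labels)
    return {word: len(classes) for word, classes in classes_per_word.items()}


def illustrative_weight(cf, num_classes):
    """Hypothetical IDF-style weight; NOT the formula from the paper.

    A word seen in every class gets weight 1.0; rarer words get more.
    """
    return {word: math.log(num_classes / freq) + 1.0 for word, freq in cf.items()}


# Toy training set (invented for illustration): tokens plus label set.
train = [
    (["goal", "match", "the"], ["sports"]),
    (["election", "vote", "the"], ["politics"]),
    (["match", "vote", "the"], ["sports", "politics"]),
    (["stock", "market", "the"], ["finance"]),
]

cf = class_frequency(train)
weights = illustrative_weight(cf, num_classes=3)

# Print words from most to least discriminative under this stand-in weight.
for word in sorted(weights, key=weights.get, reverse=True):
    print(f"{word:10s} cf={cf[word]} weight={weights[word]:.3f}")
```

In this toy corpus, "the" occurs in all three classes and receives the lowest weight, while "goal" occurs in one class and receives the highest. How such weights enter the L-LDA or dependency-LDA models is described in the full paper, not in this abstract.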

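Chinese Summary (translated): Supervised topic models with weighted words: multi-label text classification

Abstract: Supervised topic models have been successfully applied to multi-label text classification tasks. Representative models include labeled latent Dirichlet allocation (L-LDA) and dependency-LDA. These existing models overlook the effect on classification of a word's class frequency, i.e., the number of classes in which the word occurs in the training set. To address this, class frequency information is introduced and a class frequency weighting method (class frequency weight, CF-weight) is proposed. The CF-weight method rests on the following assumption: a word with higher (or lower) class frequency has lower (or higher) discriminative power in classification. The CF-weight method is applied to the L-LDA and dependency-LDA models. Experimental results show that, compared with conventional supervised topic models, the CF-weight based models have an advantage in multi-label classification performance.

Key words: Supervised topic model; Multi-label classification; Class frequency; Labeled latent Dirichlet allocation (L-LDA); Dependency-LDA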


DOI: 10.1631/FITEE.1601668

CLC number: TP391

On-line Access: 2018-06-07

Received: 2016-10-26

Revision Accepted: 2017-01-03

Crosschecked: 2018-04-03

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE