JZUS - Journal of Zhejiang University SCIENCE

Journal of Zhejiang University SCIENCE C

ISSN 1869-1951 (Print), 1869-196X (Online), Monthly

2012 Vol.13 No.9 P.649-659

Short text classification based on strong feature thesaurus

Bing-kun Wang, Yong-feng Huang, Wan-xia Yang, Xing Li

Information Cognitive and Intelligent System Research Institute, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; Information Technology National Laboratory, Tsinghua University, Beijing 100084, China

Wangbingkun77@yahoo.com.cn, wbk10@mails.tsinghua.edu.cn

Abstract: Data sparseness, the defining characteristic of short text, has long been regarded as the main cause of low accuracy when short texts are classified with statistical methods, and it has been studied intensively over the past decade. Most of this work, however, overlooks a second possible cause: ignoring the semantic importance of certain feature terms may also lower classification accuracy. In this paper we address the problem by building a strong feature thesaurus (SFT) based on latent Dirichlet allocation (LDA) and information gain (IG) models. Giving larger weights to the feature terms in the SFT improves classification accuracy. In particular, our method appears to be more effective for finer-grained classification. Experiments on two short text datasets show that our approach improves on state-of-the-art methods, including support vector machine (SVM) and Naïve Bayes Multinomial.

Key words: Short text, Classification, Data sparseness, Semantic, Strong feature thesaurus (SFT), Latent Dirichlet allocation (LDA)
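
The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it ranks terms by information gain only and omits the LDA component that the paper also uses to build the SFT. The function names, the toy corpus, `top_k`, and the `boost` factor of 2.0 are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t)."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    conditional = sum(len(part) / total * entropy(part)
                      for part in (with_term, without_term) if part)
    return entropy(labels) - conditional


def build_thesaurus(docs, labels, top_k):
    """Hypothetical stand-in for the SFT: the top_k terms ranked by IG alone."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return set(ranked[:top_k])


def weighted_counts(doc, thesaurus, boost=2.0):
    """Term counts with thesaurus terms scaled by an assumed boost factor."""
    counts = Counter(doc)
    return {t: c * (boost if t in thesaurus else 1.0)
            for t, c in counts.items()}


# Toy corpus (invented for illustration): tokenized short texts and classes.
docs = [
    ["cheap", "flight", "deal"],
    ["flight", "delay", "news"],
    ["goal", "match", "news"],
    ["match", "score", "deal"],
]
labels = ["travel", "travel", "sport", "sport"]

thesaurus = build_thesaurus(docs, labels, top_k=2)
features = weighted_counts(["flight", "flight", "news"], thesaurus)
```

In this toy corpus the terms that perfectly separate the two classes end up in the thesaurus, so their counts are doubled before the feature vector is passed to a classifier such as an SVM or a multinomial naive Bayes model.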

DOI: 10.1631/jzus.C1100373

CLC number: TP391.4

On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2012-08-03

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE
