Journal of Zhejiang University SCIENCE A

ISSN 1673-565X (Print), 1862-1775 (Online), Monthly

Hierarchical topic modeling with nested hierarchical Dirichlet process

Abstract: This paper addresses the statistical modeling of latent topic hierarchies in text corpora. The height of the topic tree is assumed to be fixed, while the number of topics on each level is unknown a priori and is inferred from the data. Taking a nonparametric Bayesian approach to this problem, we propose a new probabilistic generative model based on the nested hierarchical Dirichlet process (nHDP) and present a Markov chain Monte Carlo sampling algorithm for inferring the topic tree structure, the word distribution of each topic, and the topic distribution of each document. Our theoretical analysis and experimental results show that this model can produce a more compact hierarchical topic structure and capture more fine-grained topic relationships than the hierarchical latent Dirichlet allocation model.

Key words: Topic modeling, Natural language processing, Chinese restaurant process, Hierarchical Dirichlet process, Markov chain Monte Carlo, Nonparametric Bayesian statistics


DOI: 10.1631/jzus.A0820796

CLC number: O212.8; H03

Received: 2008-11-15

Revision Accepted: 2009-04-10

Crosschecked: 2009-04-29

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE