Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2015 Vol.16 No.6 P.457-465
Topic modeling for large-scale text data
Abstract: This paper develops a novel online algorithm, moving average stochastic variational inference (MASVI), which uses results obtained in previous iterations to smooth out noisy natural gradients. We analyze the convergence of the proposed algorithm and conduct experiments on two large-scale collections containing millions of documents. Experimental results show that, compared with the stochastic variational inference (SVI) and SGRLD algorithms, our algorithm achieves a faster convergence rate and better performance.
Key words: Latent Dirichlet allocation (LDA), Topic modeling, Online learning, Moving average
Innovation: The moving average of stochastic gradients over multiple iterations is used to approximate the true stochastic gradient, reducing the error between the stochastic gradient and the true gradient.
Method: The study is carried out on latent Dirichlet allocation (LDA), the basic topic model. Since the document subsets sampled at different iterations have different vocabularies (Table 1), the moving average of the stochastic terms from different iterations is used to approximate the stochastic term of the true stochastic gradient. To preserve accuracy as much as possible, only the stochastic terms from the most recent R iterations are used (Fig. 2), and the convergence of the proposed algorithm is verified.
Conclusions: Building on the stochastic variational inference algorithm, a moving average stochastic variational inference algorithm is proposed, achieving better text topic modeling performance and a faster convergence rate.
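The smoothing idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation: `lam` stands for a variational parameter, `new_term` for the noisy stochastic term computed from the current minibatch, and `rho` for the step size; scalars are used in place of the actual parameter matrices, and all names are assumptions.

```python
from collections import deque

def masvi_update(lam, recent_terms, new_term, rho):
    """One MASVI-style step (illustrative): append the latest stochastic
    gradient term, average the retained window of the R most recent terms
    to smooth the noisy natural gradient, then take an SVI-style convex
    update of size rho toward the smoothed estimate."""
    recent_terms.append(new_term)  # deque(maxlen=R) silently drops the oldest term
    smoothed = sum(recent_terms) / len(recent_terms)
    return (1.0 - rho) * lam + rho * smoothed

# Usage: keep only the R most recent stochastic terms (here R = 5).
window = deque(maxlen=5)
```

Using a bounded `deque` keeps the window to the R most recent iterations automatically, matching the idea of discarding older, staler stochastic terms.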
DOI: 10.1631/FITEE.1400352
CLC number: TP391.1
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2015-05-07