CLC number: TP391.1
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2015-05-07
Cited: 3
Clicked: 8186
Xi-ming Li, Ji-hong Ouyang, You Lu. Topic modeling for large-scale text data. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.1400352
Open peer comment: Overall, I liked the idea introduced by the paper, as well as the large empirical case study. Scaling up topic models without loss of precision is indeed an important area.
Abstract: Topic modeling for large-scale text data. Innovation: the true stochastic gradient is approximated by a moving average of the stochastic gradients from multiple iterations, reducing the error between the stochastic gradient and the true gradient. Method: the study is carried out on latent Dirichlet allocation (LDA), the basic topic model. Because the text subsets sampled at different iterations have different vocabularies (Table 1), the noisy term of the true stochastic gradient is approximated by a moving average of the noisy terms from different iterations. To preserve accuracy as far as possible, only the noisy terms from the most recent R iterations are used (Fig. 2), and the convergence of the proposed algorithm is verified. Conclusion: building on the stochastic variational inference algorithm, a moving-average stochastic variational inference algorithm is proposed, achieving better topic modeling of text and faster convergence.
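The moving-average idea described in the abstract can be sketched as follows. This is a minimal illustration of one global update in stochastic variational inference (SVI) for LDA, not the authors' implementation: the noisy natural-gradient term from the current mini-batch is replaced by the average of the noisy terms from the most recent R iterations. Function and variable names are assumptions for illustration.

```python
from collections import deque

import numpy as np


def ma_svi_update(lam, recent_terms, new_term, eta, rho):
    """One moving-average SVI step for LDA's topic-word parameter.

    lam          : current global variational parameter (K x V array of
                   topic-word Dirichlet parameters)
    recent_terms : deque(maxlen=R) holding the noisy gradient terms
                   (scaled mini-batch sufficient statistics) from the
                   most recent R iterations
    new_term     : noisy term computed from the current mini-batch
    eta          : symmetric Dirichlet prior on topics
    rho          : step size for this iteration
    """
    recent_terms.append(new_term)  # deque drops terms older than R
    # Average the last R noisy terms to reduce the variance of the
    # stochastic natural gradient.
    avg_term = sum(recent_terms) / len(recent_terms)
    # Standard SVI interpolation toward the intermediate estimate,
    # with the averaged noisy term in place of the single-batch one.
    lam_hat = eta + avg_term
    return (1.0 - rho) * lam + rho * lam_hat
```

With R = 1 the deque holds only the current mini-batch's term and the update reduces to standard SVI; larger R trades a small bias (stale terms) for lower gradient variance, which is the trade-off the abstract describes.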