CLC number: TP311.5
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2021-12-02
Wan ZHOU, Yong WANG, Cuiyun GAO, Fei YANG. Emerging topic identification from app reviews via adaptive online biterm topic modeling[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(5): 678-691.
@article{FITEE.2100465,
title="Emerging topic identification from app reviews via adaptive online biterm topic modeling",
author="Wan ZHOU and Yong WANG and Cuiyun GAO and Fei YANG",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="23",
number="5",
pages="678-691",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2100465"
}
%0 Journal Article
%T Emerging topic identification from app reviews via adaptive online biterm topic modeling
%A Wan ZHOU
%A Yong WANG
%A Cuiyun GAO
%A Fei YANG
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 5
%P 678-691
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2100465
TY - JOUR
T1 - Emerging topic identification from app reviews via adaptive online biterm topic modeling
A1 - Wan ZHOU
A1 - Yong WANG
A1 - Cuiyun GAO
A1 - Fei YANG
JO - Frontiers of Information Technology & Electronic Engineering
VL - 23
IS - 5
SP - 678
EP - 691
SN - 2095-9184
Y1 - 2022
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2100465
ER -
Abstract: Emerging topics in app reviews reflect the issues (e.g., software bugs) that concern users during a given period. Identifying these topics accurately and promptly can help developers update their apps more effectively. Existing methods identify emerging topics in app reviews using topic models or clustering. However, their accuracy suffers because reviews are short and carry limited information. To address this problem, we propose an improved emerging topic identification (IETI) approach. Specifically, we apply natural language processing techniques to reduce noisy data, and identify emerging topics in app reviews using the adaptive online biterm topic model. We then interpret the meaning of each emerging topic through relevant phrases and sentences. Using the official app changelogs as ground truth, we evaluate IETI on six commonly used apps. The experimental results indicate that IETI identifies emerging topics more accurately than the baseline, improving the F1 score by 0.126 for phrase labels and 0.061 for sentence labels. The code of IETI is available on GitHub (https://github.com/wanizhou/IETI).
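To illustrate why biterm-based models suit short reviews, the following minimal sketch extracts biterms (unordered pairs of words that co-occur in the same short text) from tokenized reviews. It is not taken from the IETI code: the function name `extract_biterms` and the sample reviews are hypothetical, tokenization and noise reduction are assumed to have been done already, and the topic inference performed by the adaptive online biterm topic model is not shown.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short review.

    Hypothetical helper for illustration; assumes `tokens` is already
    cleaned and tokenized.
    """
    # Sort each pair so ("login", "crash") and ("crash", "login")
    # are counted as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Two made-up, already-tokenized reviews.
reviews = [
    ["app", "crash", "login"],
    ["login", "crash", "update"],
]

# Corpus-level biterm counts: co-occurrence statistics are pooled across
# all reviews, which offsets the sparsity of any single short review.
biterm_counts = Counter(
    biterm for review in reviews for biterm in extract_biterms(review)
)

for biterm, count in biterm_counts.most_common():
    print(biterm, count)
```

Pooling word pairs over the whole corpus, rather than modeling word occurrences per document, is the general idea behind biterm topic models for short texts; how IETI builds on it is described in the paper itself.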