CLC number: TP391
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2010-04-09
Wei Chen, Chun Chen, Li-jun Zhang, Can Wang, Jia-jun Bu. Online detection of bursty events and their evolution in news streams[J]. Journal of Zhejiang University Science C, 2010, 11(5): 340-355.
@article{chen2010online,
title="Online detection of bursty events and their evolution in news streams",
author="Wei Chen and Chun Chen and Li-jun Zhang and Can Wang and Jia-jun Bu",
journal="Journal of Zhejiang University Science C",
volume="11",
number="5",
pages="340-355",
year="2010",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C0910245"
}
%0 Journal Article
%T Online detection of bursty events and their evolution in news streams
%A Wei Chen
%A Chun Chen
%A Li-jun Zhang
%A Can Wang
%A Jia-jun Bu
%J Journal of Zhejiang University SCIENCE C
%V 11
%N 5
%P 340-355
%@ 1869-1951
%D 2010
%I Zhejiang University Press & Springer
%R 10.1631/jzus.C0910245
TY  - JOUR
T1  - Online detection of bursty events and their evolution in news streams
A1  - Wei Chen
A1  - Chun Chen
A1  - Li-jun Zhang
A1  - Can Wang
A1  - Jia-jun Bu
JO  - Journal of Zhejiang University Science C
VL  - 11
IS  - 5
SP  - 340
EP  - 355
SN  - 1869-1951
Y1  - 2010
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.C0910245
ER  - 
Abstract: Online monitoring of temporally sequenced news streams for interesting patterns and trends has gained popularity in the last decade. In this paper, we study a particular news stream monitoring task: timely detection of recently occurring bursty events and discovery of their evolutionary patterns along the timeline. Here, a news stream is represented as feature streams over tens of thousands of features (i.e., keywords; each news story consists of a set of keywords). A bursty event is therefore composed of a group of bursty features, whose frequencies rise sharply as the related event emerges. We formally define this problem and present a solution with the following steps: (1) applying an online multi-resolution burst detection method to identify bursty features with different burst durations within a recent time period; (2) clustering bursty features to form bursty events and associating each event with a power value that reflects its bursty level; (3) applying an information retrieval method based on cosine similarity to discover each event's evolution (i.e., highly related bursty events in history) along the timeline. We extensively evaluate the proposed methods on Reuters Corpus Volume 1 (RCV1). Experimental results show that our methods detect bursty events in a timely way and effectively discover their evolution. The power values used in our model not only measure well an event's bursty level (its relative importance) at a given time point but also indicate the relative strengths of events along the same evolution.
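The abstract names the three steps but not their details, so the following Python sketch only illustrates the general shape of such a pipeline under stated assumptions; it is not the authors' method. The sliding-window z-score burst test, the Jaccard-based union-find grouping standing in for the paper's clustering, the "power" computed as a sum of burst scores, and every threshold (`WINDOWS`, `Z_THRESHOLD`, `min_jaccard`, `min_sim`) are hypothetical choices. Only the use of cosine similarity in step (3) comes from the abstract.

```python
import math
from statistics import mean, pstdev

# Hypothetical parameters; the paper's actual settings are not given here.
WINDOWS = (1, 2, 4, 8)  # candidate burst durations, in time units (e.g., days)
Z_THRESHOLD = 3.0       # z-score a window sum must exceed to count as bursty


def burst_score(series, windows=WINDOWS, z=Z_THRESHOLD):
    """Step 1 (sketch): multi-resolution burst test on one feature's counts.

    For each window size w, compare the sum of the latest w counts with all
    earlier w-length sliding-window sums; return the highest z-score above z,
    or 0.0 if the feature is not bursty at any resolution.
    """
    best = 0.0
    for w in windows:
        past = series[:-w]
        if len(past) < 2 * w:  # too little history for a baseline
            continue
        recent = sum(series[-w:])
        hist = [sum(past[i:i + w]) for i in range(len(past) - w + 1)]
        sd = pstdev(hist) or 1.0  # guard against a perfectly flat history
        score = (recent - mean(hist)) / sd
        if score > z:
            best = max(best, score)
    return best


def cluster_features(docs_by_feature, min_jaccard=0.3):
    """Step 2 (sketch): single-link grouping of bursty features whose
    document sets overlap (Jaccard >= min_jaccard), via union-find."""
    feats = list(docs_by_feature)
    parent = {f: f for f in feats}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for i, a in enumerate(feats):
        for b in feats[i + 1:]:
            union = docs_by_feature[a] | docs_by_feature[b]
            inter = docs_by_feature[a] & docs_by_feature[b]
            if union and len(inter) / len(union) >= min_jaccard:
                parent[find(a)] = find(b)
    groups = {}
    for f in feats:
        groups.setdefault(find(f), set()).add(f)
    return list(groups.values())


def cosine(u, v):
    """Cosine similarity of two sparse {feature: weight} vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def link_history(event, history, min_sim=0.5):
    """Step 3 (sketch): indices of past events similar enough to `event`."""
    return [i for i, past in enumerate(history) if cosine(event, past) >= min_sim]


# Toy usage with made-up daily counts and document IDs.
series = {
    "quake":   [1, 0, 2, 1, 1, 0, 1, 2, 1, 0, 1, 1, 30, 45],
    "tsunami": [0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 20, 28],
    "stocks":  [10, 12, 11, 9, 10, 11, 12, 10, 9, 11, 10, 12, 11, 10],
}
doc_ids = {"quake": {101, 102, 103, 104}, "tsunami": {102, 103, 104, 105},
           "stocks": {200, 201}}

scores = {f: burst_score(s) for f, s in series.items()}
bursty = {f: s for f, s in scores.items() if s > 0}
groups = cluster_features({f: doc_ids[f] for f in bursty})
# Hypothetical event vector (feature -> burst score) and power (sum of scores).
events = [({f: bursty[f] for f in g}, sum(bursty[f] for f in g)) for g in groups]
history = [{"quake": 1.0, "tsunami": 1.0}, {"stocks": 1.0}]
for vec, power in events:
    print(sorted(vec), round(power, 1), link_history(vec, history))
```

In this toy run the steady "stocks" feature is never flagged, the two co-occurring bursty features form one event, and that event links to the first historical event only. Checking several window sizes is what lets a short, sharp burst and a longer, gradual one both be caught.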