CLC number: TP37; TP391
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Ding-yin XIA, Fei WU, Xu-qing ZHANG, Yue-ting ZHUANG. Local and global approaches of affinity propagation clustering for large scale data[J]. Journal of Zhejiang University Science A, 2008, 9(10): 1373-1381.
ISSN: 1673-565X
Publisher: Zhejiang University Press & Springer
DOI: 10.1631/jzus.A0720058
Abstract: The recently proposed clustering algorithm ‘affinity propagation’ (AP) efficiently clusters sparsely related data by passing messages between data points. In many large scale problems, however, the similarities are not sparse. This paper presents two variants of AP for grouping large scale data with a dense similarity matrix: a local approach, partition affinity propagation (PAP), and a global approach, landmark affinity propagation (LAP). PAP first passes messages within subsets of the data and then merges them to serve as the initial steps of the iterations, which effectively reduces the number of clustering iterations. LAP first passes messages between the landmark data points and then clusters the non-landmark data points; it is a global approximation method that speeds up the clustering of large data sets. Experiments on several kinds of data, including random data points, manifold subspaces, face images, and Chinese calligraphy, demonstrate that the two approaches are feasible and practical.
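The landmark idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how landmarks are selected or how non-landmark points are clustered, so the uniform random sampling, the negative squared Euclidean similarity, the nearest-exemplar assignment, and the names `affinity_propagation` and `landmark_ap` are all assumptions made for illustration. The message-passing updates follow the standard AP rules of Frey and Dueck (2007).

```python
import numpy as np

def affinity_propagation(S, damping=0.7, iters=300):
    """Standard AP on a dense similarity matrix S (preferences on the diagonal).
    Returns the row indices chosen as exemplars."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    rows = np.arange(n)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new
    # A point is an exemplar when its self-evidence a(k,k) + r(k,k) is positive.
    return np.flatnonzero(np.diag(A + R) > 0)

def landmark_ap(X, n_landmarks, preference=None, seed=0):
    """Hypothetical landmark variant: AP on a sample, then nearest-exemplar assignment."""
    rng = np.random.default_rng(seed)
    # Assumption: landmarks are drawn uniformly at random without replacement.
    landmarks = rng.choice(len(X), size=n_landmarks, replace=False)
    L = X[landmarks]
    # Assumption: similarity is the negative squared Euclidean distance.
    S = -((L[:, None, :] - L[None, :, :]) ** 2).sum(axis=2)
    if preference is None:
        preference = np.median(S)
    S[np.arange(n_landmarks), np.arange(n_landmarks)] = preference
    exemplars = landmarks[affinity_propagation(S)]
    # Assumption: each point (landmark or not) joins its most similar exemplar.
    # This step raises an error if AP did not converge to any exemplar.
    D = -((X[:, None, :] - X[exemplars][None, :, :]) ** 2).sum(axis=2)
    labels = exemplars[D.argmax(axis=1)]
    return exemplars, labels
```

Calling `landmark_ap(X, n_landmarks=m)` on an `(n, d)` array runs message passing on an `m × m` matrix instead of the full `n × n` one, which is where the speed-up of a landmark scheme would come from. The `preference` value controls how many exemplars emerge and usually needs tuning per data set.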