CLC number: TP301.6
Crosschecked: 2017-07-14
Ke-shi Ge, Hua-you Su, Dong-sheng Li, Xi-cheng Lu. Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit[J]. Frontiers of Information Technology & Electronic Engineering, 2017, 18(7): 915-927.
@article{Ge2017efficient,
title="Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit",
author="Ke-shi Ge, Hua-you Su, Dong-sheng Li, Xi-cheng Lu",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="18",
number="7",
pages="915-927",
year="2017",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1601786"
}
%0 Journal Article
%T Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit
%A Ke-shi Ge
%A Hua-you Su
%A Dong-sheng Li
%A Xi-cheng Lu
%J Frontiers of Information Technology & Electronic Engineering
%V 18
%N 7
%P 915-927
%@ 2095-9184
%D 2017
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1601786
TY - JOUR
T1 - Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit
A1 - Ke-shi Ge
A1 - Hua-you Su
A1 - Dong-sheng Li
A1 - Xi-cheng Lu
JO - Frontiers of Information Technology & Electronic Engineering
VL - 18
IS - 7
SP - 915
EP - 927
SN - 2095-9184
Y1 - 2017
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1601786
ER -
Abstract: The density peaks (DP) algorithm has been widely used in scientific research because of its novel and effective density-based clustering approach. However, the DP algorithm processes each pair of data points several times when determining cluster centers, which leads to high computational complexity. In this paper, we focus on accelerating the time-consuming DP algorithm with a graphics processing unit (GPU). We analyze the principle of the algorithm to locate its computational bottlenecks and evaluate its potential for parallelism. In light of our analysis, we propose an efficient parallel DP algorithm targeting a GPU architecture and implement this parallel method with compute unified device architecture (CUDA), called the ‘CUDA-DP platform’. Specifically, we use shared memory to improve data locality, which reduces the amount of global memory access. To exploit the coalesced access mechanism of the GPU, we convert the data structure of the CUDA-DP program from array of structures to structure of arrays. In addition, we introduce a binary search-and-sampling method to avoid sorting a large array. Experimental results show that CUDA-DP achieves a 45-fold speedup compared with the CPU-based density peaks implementation.
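The sketch below is not the authors' CUDA-DP source; it is a minimal illustration, assuming 2-D points, of two of the optimizations named in the abstract: a structure-of-arrays (SoA) layout so that neighboring threads issue coalesced global-memory loads, and shared-memory tiling so that each tile of points is fetched from global memory once and reused by all threads of a block during the local-density (rho) step of density peaks clustering. All names (TILE, computeDensity, dc2, etc.) are illustrative assumptions.

// Hedged sketch, not the published CUDA-DP code.
#include <cuda_runtime.h>

#define TILE 256  // threads per block = points staged per shared-memory tile

// Coordinates stored as separate x/y arrays (SoA): consecutive threads read
// consecutive addresses, so the global-memory loads coalesce.
__global__ void computeDensity(const float* __restrict__ x,
                               const float* __restrict__ y,
                               int* __restrict__ rho,
                               int n, float dc2 /* squared cutoff distance */) {
    __shared__ float sx[TILE];
    __shared__ float sy[TILE];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float xi = (i < n) ? x[i] : 0.0f;
    float yi = (i < n) ? y[i] : 0.0f;
    int count = 0;

    // Sweep the data set tile by tile; each tile is loaded into shared
    // memory once and reused by every thread of the block.
    for (int base = 0; base < n; base += TILE) {
        int j = base + threadIdx.x;
        sx[threadIdx.x] = (j < n) ? x[j] : 0.0f;
        sy[threadIdx.x] = (j < n) ? y[j] : 0.0f;
        __syncthreads();

        int limit = min(TILE, n - base);
        for (int k = 0; k < limit; ++k) {
            float dx = xi - sx[k];
            float dy = yi - sy[k];
            if (dx * dx + dy * dy < dc2) ++count;
        }
        __syncthreads();
    }

    if (i < n) rho[i] = count - 1;  // exclude the point itself
}

Assuming device buffers d_x, d_y, d_rho already hold the data, the kernel would be launched with one thread per point and TILE threads per block, e.g. computeDensity<<<(n + TILE - 1) / TILE, TILE>>>(d_x, d_y, d_rho, n, dc * dc).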