
CLC number: TP301.6

On-line Access: 2017-07-31

Received: 2016-12-17

Revision Accepted: 2017-04-17

Crosschecked: 2017-07-14




Frontiers of Information Technology & Electronic Engineering  2017 Vol.18 No.7 P.915-927


Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit

Author(s):  Ke-shi Ge, Hua-you Su, Dong-sheng Li, Xi-cheng Lu

Affiliation(s):  National Lab for Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha 410003, China

Corresponding email(s):   gekeshi@nudt.edu.cn, huayousu@163.com, dsli@nudt.edu.cn

Key Words:  Density peak, Graphics processing unit, Parallel computing, Clustering

Ke-shi Ge, Hua-you Su, Dong-sheng Li, Xi-cheng Lu. Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit[J]. Frontiers of Information Technology & Electronic Engineering, 2017, 18(7): 915-927.

@article{Ge2017,
title="Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit",
author="Ke-shi Ge, Hua-you Su, Dong-sheng Li, Xi-cheng Lu",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="18",
number="7",
pages="915-927",
year="2017",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1601786"
}

%0 Journal Article
%T Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit
%A Ke-shi Ge
%A Hua-you Su
%A Dong-sheng Li
%A Xi-cheng Lu
%J Frontiers of Information Technology & Electronic Engineering
%V 18
%N 7
%P 915-927
%@ 2095-9184
%D 2017
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1601786

TY  - JOUR
T1  - Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit
A1  - Ke-shi Ge
A1  - Hua-you Su
A1  - Dong-sheng Li
A1  - Xi-cheng Lu
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 18
IS  - 7
SP  - 915
EP  - 927
SN  - 2095-9184
Y1  - 2017
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1601786
ER  -

The density peak (DP) algorithm has been widely used in scientific research due to its novel and effective density-based clustering approach. However, the DP algorithm processes each pair of data points several times when determining cluster centers, yielding high computational complexity. In this paper, we focus on accelerating this time-consuming algorithm with a graphics processing unit (GPU). We analyze the principle of the algorithm to locate its computational bottlenecks and evaluate its potential for parallelism. In light of this analysis, we propose an efficient parallel DP algorithm targeting the GPU architecture and implement this parallel method with the compute unified device architecture (CUDA), called 'CUDA-DP'. Specifically, we use shared memory to improve data locality, which reduces the amount of global memory access. To exploit the coalesced access mechanism of the GPU, we convert the data structure of the CUDA-DP program from array of structures to structure of arrays. In addition, we introduce binary search and sampling methods to avoid sorting a large distance array. Experimental results show that CUDA-DP achieves a 45-fold speedup over the CPU-based density peaks implementation.
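The DP method (Rodriguez and Laio, 2014) assigns each point a local density ρ and a distance δ to its nearest higher-density neighbor; points where both values are large are taken as cluster centers. The repeated use of every point pair mentioned above comes from computing these two quantities. A minimal sequential sketch of that computation, assuming 2D points and an arbitrarily chosen cutoff `dc` (helper names are illustrative, not the paper's code):

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

struct Point { double x, y; };

static double dist(const Point& a, const Point& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Local density rho_i: the number of points within the cutoff distance dc.
// Every pair (i, j) is visited, which is the O(n^2) bottleneck the paper targets.
std::vector<int> local_density(const std::vector<Point>& pts, double dc) {
    std::vector<int> rho(pts.size(), 0);
    for (std::size_t i = 0; i < pts.size(); ++i)
        for (std::size_t j = 0; j < pts.size(); ++j)
            if (i != j && dist(pts[i], pts[j]) < dc) ++rho[i];
    return rho;
}

// delta_i: distance to the nearest point of strictly higher density.
// A point with no denser neighbor (a density peak) instead gets its
// largest pairwise distance, so peaks stand out with large delta.
std::vector<double> delta(const std::vector<Point>& pts,
                          const std::vector<int>& rho) {
    std::vector<double> d(pts.size(), 0.0);
    for (std::size_t i = 0; i < pts.size(); ++i) {
        double nearest = std::numeric_limits<double>::max();
        double farthest = 0.0;
        for (std::size_t j = 0; j < pts.size(); ++j) {
            if (j == i) continue;
            double dij = dist(pts[i], pts[j]);
            farthest = std::max(farthest, dij);
            if (rho[j] > rho[i]) nearest = std::min(nearest, dij);
        }
        d[i] = (nearest == std::numeric_limits<double>::max()) ? farthest
                                                               : nearest;
    }
    return d;
}
```

Because each pairwise distance is independent, both loops map naturally onto GPU threads, which is the parallelism the abstract refers to.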


Abstract: The density peak (DP) clustering method has been widely used in scientific research for its novelty and effectiveness. However, when determining cluster centers, DP operates on each pair of data points multiple times, leading to high computational complexity. In this paper, we propose an efficient parallel density peaks algorithm based on the graphics processing unit (GPU). We analyze the principle of the density peaks clustering algorithm to locate its computational bottlenecks and evaluate its potential for parallelism. Based on this analysis, we propose CUDA-DP (compute unified device architecture-DP), an efficient parallel density peaks clustering algorithm targeting the GPU architecture, and implement this parallel method with CUDA. Specifically, we use shared memory to reduce the amount of global memory access. Furthermore, to exploit the coalesced access mechanism of the GPU, we restructure the data layout of the CUDA-DP program from array of structures (AoS) to structure of arrays (SoA). In addition, we introduce a binary search method and a sampling method, respectively, to avoid the computational cost of sorting the distance matrix. Experimental results show that, compared with the CPU-based density peaks implementation, CUDA-DP achieves a speedup of more than 45-fold.
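The AoS-to-SoA restructuring described in the abstract keeps each coordinate component contiguous in memory, so consecutive GPU threads reading the same field of consecutive points touch adjacent addresses and the hardware can coalesce those reads. A plain C++ sketch of the layout change (field and function names are illustrative, not the paper's actual data structures):

```cpp
#include <cstddef>
#include <vector>

// Array of Structures: the fields of one point sit next to each other,
// so thread i reading aos[i].x and thread i+1 reading aos[i+1].x
// access addresses sizeof(PointAoS) bytes apart (strided, uncoalesced).
struct PointAoS { float x, y; };

// Structure of Arrays: each field is its own contiguous array, so
// x[i] and x[i+1] are adjacent floats (coalesced on a GPU).
struct PointsSoA {
    std::vector<float> x;
    std::vector<float> y;
};

PointsSoA to_soa(const std::vector<PointAoS>& aos) {
    PointsSoA soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    for (const PointAoS& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
    }
    return soa;
}
```

On the CPU the conversion is a one-time transposition cost; the payoff is on the GPU, where the distance kernels read one component per thread across a warp.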




[1]Andrade, G., Ramos, G., Madeira, D., et al., 2013. G-DBSCAN: a GPU accelerated algorithm for density-based clustering. Proc. Comput. Sci., 18:369-378.

[2]Arlia, D., Coppola, M., 2001. Experiments in parallel clustering with DBSCAN. European Conf. on Parallel Processing, p.326-331.

[3]Begum, N., Ulanova, L., Wang, J., et al., 2015. Accelerating dynamic time warping clustering with a novel admissible pruning strategy. Proc. 21st ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, p.49-58.

[4]Cheng, Y., 1995. Mean shift, mode seeking, and clustering. IEEE Trans. Patt. Anal. Mach. Intell., 17(8):790-799.

[5]Dean, K.M., Davis, L.M., Lubbeck, J.L., et al., 2015. High-speed multiparameter photophysical analyses of fluorophore libraries. Anal. Chem., 87(10):5026-5030.

[6]Ester, M., Kriegel, H.P., Sander, J., et al., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. Proc. Int. Conf. on Knowledge Discovery and Data Mining, p.226-231.

[7]Fränti, P., Virmajoki, O., Hautamäki, V., 2006. Fast agglomerative clustering using a k-nearest neighbor graph. IEEE Trans. Patt. Anal. Mach. Intell., 28(11):1875-1881.

[8]Fukunaga, K., Hostetler, L., 1975. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans. Inform. Theory, 21(1):32-40.

[9]Garg, A., Mangla, A., Gupta, N., et al., 2006. PBIRCH: a scalable parallel clustering algorithm for incremental data. 10th Int. Database Engineering and Applications Symp., p.315-316.

[10]He, Y., Zhang, F., Li, Y., et al., 2016. Multiple routes recommendation system on massive taxi trajectories. Tsinghua Sci. Technol., 21(5):510-520.

[11]Kohonen, T., 1990. The self-organizing map. Proc. IEEE, 78(9):1464-1480.

[12]Li, J., Li, D., Ye, Y., et al., 2015. Efficient multi-tenant virtual machine allocation in cloud data centers. Tsinghua Sci. Technol., 20(1):81-89.

[13]Li, M., Huang, J., Wang, J., 2016. Paralleled fast search and find of density peaks clustering algorithm on GPUs with CUDA. 17th IEEE/ACIS Int. Conf. on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, p.313-318.

[14]MacQueen, J., 1967. Some methods for classification and analysis of multivariate observations. Proc. 5th Berkeley Symp. on Mathematical Statistics and Probability, p.281-297.

[15]Meng, X., Bradley, J., Yavuz, B., et al., 2016. MLlib: machine learning in Apache Spark. arXiv:1505.06807v1. http://arxiv.org/abs/1505.06807

[16]NVIDIA, 2016. CUDA C Best Practices Guide v8.0. http://developer.nvidia.com [Accessed on Nov. 22, 2016].

[17]Park, H.S., Jun, C.H., 2009. A simple and fast algorithm for K-medoids clustering. Exp. Syst. Appl., 36(2):3336-3341.

[18]Rasmussen, C.E., 2000. The infinite Gaussian mixture model. Adv. Neur. Inform. Proc. Syst., 12:554-560.

[19]Rodriguez, A., Laio, A., 2014. Clustering by fast search and find of density peaks. Science, 344(6191):1492-1496.

[20]Sarazin, T., Azzag, H., Lebbah, M., 2014. SOM clustering using Spark-MapReduce. IEEE Int. Parallel and Distributed Processing Symp. Workshops, p.1727-1734.

[21]Shalom, S.A., Dash, M., Tue, M., 2008. Efficient k-means clustering using accelerated graphics processors. Int. Conf. on Data Warehousing and Knowledge Discovery, p.166-175.

[22]Veenman, C.J., Reinders, M.J.T., Backer, E., 2002. A maximum variance cluster algorithm. IEEE Trans. Patt. Anal. Mach. Intell., 24(9):1273-1280.

[23]Xu, R., Wunsch, D., 2005. Survey of clustering algorithms. IEEE Trans. Neur. Netw., 16(3):645-678.

[24]Xu, X., Jäger, J., Kriegel, H.P., 2002. A fast parallel clustering algorithm for large spatial databases. In: Guo, Y., Grossman, R. (Eds.), High Performance Data Mining: Scaling Algorithms, Applications and Systems. Springer US, Boston, p.263-290.

[25]Zamuner, S., Rodriguez, A., Seno, F., et al., 2015. An efficient algorithm to perform local concerted movements of a chain molecule. PLOS ONE, 10(3):1-27.

[26]Zhang, T., Ramakrishnan, R., Livny, M., 1997. BIRCH: a new data clustering algorithm and its applications. Data Min. Knowl. Discov., 1(2):141-182.

[27]Zhang, Y., Chen, S., Yu, G., 2016. Efficient distributed density peaks for clustering large data sets in MapReduce. IEEE Trans. Knowl. Data Eng., 28(12):3218-3230.

[28]Zhao, W., Ma, H., He, Q., 2009. Parallel k-means clustering based on MapReduce. IEEE Int. Conf. on Cloud Computing, p.674-679.
