
CLC number: TP391.41


Received: 2003-06-18

Revision Accepted: 2003-10-12


Journal of Zhejiang University SCIENCE A 2005 Vol.6 No.1 P.71-78

http://doi.org/10.1631/jzus.2005.A0071


A statistical information-based clustering approach in distance space


Author(s):  Shi-hong Yue1, Ping Li1, Ji-dong Guo2, Shui-geng Zhou1

Affiliation(s):  1. Institute of Industrial Process Control, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):   shyue@iipc.zju.edu.cn

Key Words:  DBSCAN algorithm, Statistical information, Threshold



YUE Shi-hong, LI Ping, GUO Ji-dong, ZHOU Shui-geng. A statistical information-based clustering approach in distance space[J]. Journal of Zhejiang University Science A, 2005, 6(1): 71-78.

@article{yue2005statistical,
title="A statistical information-based clustering approach in distance space",
author="YUE, Shi-hong and LI, Ping and GUO, Ji-dong and ZHOU, Shui-geng",
journal="Journal of Zhejiang University Science A",
volume="6",
number="1",
pages="71-78",
year="2005",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.2005.A0071"
}

%0 Journal Article
%T A statistical information-based clustering approach in distance space
%A YUE Shi-hong
%A LI Ping
%A GUO Ji-dong
%A ZHOU Shui-geng
%J Journal of Zhejiang University SCIENCE A
%V 6
%N 1
%P 71-78
%@ 1673-565X
%D 2005
%I Zhejiang University Press & Springer
%R 10.1631/jzus.2005.A0071

TY  - JOUR
T1  - A statistical information-based clustering approach in distance space
A1  - YUE Shi-hong
A1  - LI Ping
A1  - GUO Ji-dong
A1  - ZHOU Shui-geng
JO  - Journal of Zhejiang University Science A
VL  - 6
IS  - 1
SP  - 71
EP  - 78
SN  - 1673-565X
Y1  - 2005
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.2005.A0071
ER  - 


Abstract: 
Clustering, a powerful data mining technique for discovering interesting data distributions and patterns in the underlying database, is used in many fields, such as statistical data analysis, pattern recognition, image processing, and business applications. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996) performs well on spatial data, although it leaves many problems unsolved. For example, DBSCAN requires a user-specified threshold, and determining this threshold with current methods such as OPTICS (Ankerst et al., 1999) is extremely time-consuming; moreover, the performance of DBSCAN under different norms has yet to be examined. In this paper, we first developed a method that determines the necessary threshold from statistical information on the distance space of the database. We then examined the performance of DBSCAN under different norms and found a determinable relation between them. Finally, we used two artificial databases to verify the effectiveness and efficiency of the proposed methods.
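The abstract does not state the paper's threshold formula, so the sketch below is not the authors' method. It is a minimal plain-Python DBSCAN, assuming the standard Eps/MinPts definitions of Ester et al. (1996), with the distance norm exposed as a Minkowski parameter `p` so that clustering under different norms (Manhattan, Euclidean, Chebyshev) can be compared on the same data. The helper `estimate_eps` is a hypothetical stand-in for statistics-based threshold selection: it returns a quantile of each point's distance to its k-th nearest neighbour (the k-dist idea). The names `estimate_eps`, `quantile`, `UNVISITED` and `NOISE` are illustrative assumptions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]
UNVISITED = -2  # label for points not yet processed
NOISE = -1      # label for points that belong to no cluster


def minkowski(a: Point, b: Point, p: float = 2.0) -> float:
    """L_p distance: p=1 Manhattan, p=2 Euclidean, p=math.inf Chebyshev."""
    if math.isinf(p):
        return max(abs(x - y) for x, y in zip(a, b))
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def estimate_eps(points: List[Point], min_pts: int = 4,
                 p: float = 2.0, quantile: float = 0.9) -> float:
    """Hypothetical threshold heuristic (NOT the paper's method): the given
    quantile of every point's distance to its min_pts-th nearest neighbour."""
    kdists = []
    for i, a in enumerate(points):
        dists = sorted(minkowski(a, b, p) for j, b in enumerate(points) if j != i)
        kdists.append(dists[min(min_pts, len(dists)) - 1])
    kdists.sort()
    return kdists[min(len(kdists) - 1, int(quantile * len(kdists)))]


def dbscan(points: List[Point], eps: float, min_pts: int = 4,
           p: float = 2.0) -> List[int]:
    """Plain O(n^2) DBSCAN; returns a cluster id per point, or NOISE."""
    n = len(points)
    labels = [UNVISITED] * n

    def region(i: int) -> List[int]:
        # Eps-neighbourhood under the L_p norm; includes the point itself.
        return [j for j in range(n) if minkowski(points[i], points[j], p) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:  # j is also core: keep expanding
                seeds.extend(j_neighbours)
        cluster += 1
    return labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
            (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
            (50, 50)]
    eps = estimate_eps(data, min_pts=4, p=2.0)
    for norm in (1.0, 2.0, math.inf):
        print(norm, dbscan(data, eps, min_pts=4, p=norm))
```

Because every L_p neighbourhood of the same radius is nested (an L_1 ball lies inside the L_2 ball, which lies inside the L_inf ball), changing `p` while keeping `eps` fixed can only shrink or grow neighbourhoods, which is one simple way to see why DBSCAN results under different norms are related.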


References

[1] Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P., 1998. Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. Proc. ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, p.94-105.

[2] Ankerst, M., Breunig, M.M., Kriegel, H.P., Sander, J., 1999. OPTICS: Ordering Points to Identify the Clustering Structure. Proc. ACM SIGMOD Int. Conf. on Management of Data, Philadelphia, PA, p.49-60.

[3] Beckmann, N., Kriegel, H.P., Schneider, R., Seeger, B., 1990. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. Proc. ACM SIGMOD Int. Conf. on Management of Data, Atlantic City, NJ, p.322-331.

[4] Ester, M., Kriegel, H.P., Sander, J., Xu, X., 1996. A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, Portland, Oregon, p.226-231.

[5] Guha, S., Rastogi, R., Shim, K., 1998. CURE: An Efficient Clustering Algorithm for Large Databases. Proc. ACM SIGMOD Int. Conf. on Management of Data, Seattle, WA, p.73-84.

[6] Han, J., Kamber, M., 2001. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, USA, p.242-266.

[7] Halkidi, M., Batistakis, Y., Vazirgiannis, M., 2002. Clustering validity checking methods: part II. SIGMOD Record, 31(4):51-62.

[8] Karypis, G., Han, E.H., Kumar, V., 1999. CHAMELEON: A hierarchical clustering algorithm using dynamic modeling. Computer, 32(8):68-75.

[9] Nakamura, E., Kehtarnavaz, N., 1998. Determining number of clusters and prototype locations via multi-scale clustering. Pattern Recognition Letters, 19(3):1265-1283.

[10] Sheikholeslami, G., Chatterjee, S., Zhang, A., 1998. WaveCluster: A Multi-resolution Clustering Approach for Very Large Spatial Databases. Proc. 24th Int. Conf. on Very Large Data Bases (VLDB), New York, p.428-439.

[11] Yue, S.H., Li, P., Guo, J.D., Zhou, S.G., 2004. Using Greedy algorithm: DBSCAN revisited II. J Zhejiang Univ SCI, 5(11):1405-1412.

[12] Wang, W., Yang, J., Muntz, R., 1997. STING: A Statistical Information Grid Approach to Spatial Data Mining. Proc. 23rd Int. Conf. on Very Large Data Bases (VLDB), Athens, Greece, p.186-195.




Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2023 Journal of Zhejiang University-SCIENCE