Publishing Service

Polishing & Checking

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

An anchor-based spectral clustering method

Abstract: Spectral clustering is one of the most popular and important clustering methods in pattern recognition, machine learning, and data mining. However, its high computational complexity limits it in applications involving truly large-scale datasets. For a clustering problem with n samples, it needs to compute the eigenvectors of the graph Laplacian with O(n3) time complexity. To address this problem, we propose a novel method called anchor-based spectral clustering (ASC) by employing anchor points of data. Specifically, m (mn) anchor points are selected from the dataset, which can basically maintain the intrinsic (manifold) structure of the original data. Then a mapping matrix between the original data and the anchors is constructed. More importantly, it is proved that this data-anchor mapping matrix essentially preserves the clustering structure of the data. Based on this mapping matrix, it is easy to approximate the spectral embedding of the original data. The proposed method scales linearly relative to the size of the data but with low degradation of the clustering performance. The proposed method, ASC, is compared to the classical spectral clustering and two state-of-the-art accelerating methods, i.e., power iteration clustering and landmark-based spectral clustering, on 10 real-world applications under three evaluation metrics. Experimental results show that ASC is consistently faster than the classical spectral clustering with comparable clustering performance, and at least comparable with or better than the state-of-the-art methods on both effectiveness and efficiency.

Key words: Clustering, Spectral clustering, Graph Laplacian, Anchors

Chinese Summary  <18> 一种基于锚点的谱聚类方法

摘要:谱聚类是模式识别、机器学习和数据挖掘中最流行最重要的聚类方法之一。然而,高计算复杂度妨碍了谱聚类在大规模数据集的应用。对于具有n个样本的聚类问题,谱聚类需O(n3)时间复杂度计算图拉普拉斯矩阵特征向量。为解决该问题,提出一种新的基于锚点谱聚类方法(anchor-based spectral clustering,ASC)。首先,在数据集中选择mmn)个可以基本保持原始数据内在(流形)结构的锚点。然后,构造原始数据与锚点之间的映射矩阵,并证明该映射矩阵能保持数据的聚类结构。基于该映射矩阵,近似得到原始数据谱嵌入。ASC方法复杂度与数据集大小呈线性关系。将该方法与经典谱聚类方法和两种最新谱聚类加速方法,即能量迭代聚类(power iteration clustering,PIC)和基于地标聚类(landmark-based spectral clustering,LSC),在10个真实数据集上比较。实验结果表明,ASC算法比经典谱聚类算法具有更快聚类速度,在效率和有效性上与现有方法相当或优于现有方法。

关键词组:聚类;谱聚类;图拉普拉斯;锚点


Share this article to: More

Go to Contents

References:

<Show All>

Open peer comments: Debate/Discuss/Question/Opinion

<1>

Please provide your name, email address and a comment





DOI:

10.1631/FITEE.1700262

CLC number:

TP311

Download Full Text:

Click Here

Downloaded:

2712

Download summary:

<Click Here> 

Downloaded:

1566

Clicked:

6124

Cited:

0

On-line Access:

2018-12-14

Received:

2017-04-17

Revision Accepted:

2017-08-15

Crosschecked:

2018-11-27

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE