CLC number: TP311

On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Crosschecked: 2015-12-09

Cited: 2



Frontiers of Information Technology & Electronic Engineering  2016 Vol.17 No.2 P.122-134


A social tag clustering method based on common co-occurrence group similarity

Author(s):  Hui-zong Li, Xue-gang Hu, Yao-jin Lin, Wei He, Jian-han Pan

Affiliation(s):  1School of Computer and Information, Hefei University of Technology, Hefei 230009, China

Corresponding email(s):   lihz_aust@sina.com, jsjxhuxg@hfut.edu.cn, yjlin@mnnu.edu.cn, peter.jhpan@gmail.com

Key Words:  Social tagging systems, Tag co-occurrence, Spectral clustering, Group similarity

Hui-zong Li, Xue-gang Hu, Yao-jin Lin, Wei He, Jian-han Pan. A social tag clustering method based on common co-occurrence group similarity[J]. Frontiers of Information Technology & Electronic Engineering, 2016, 17(2): 122-134.

Social tagging systems are widely applied in Web 2.0. Many users use these systems to create, organize, manage, and share Internet resources freely. However, the many ambiguous and uncontrolled tags produced in social tagging systems not only worsen the user experience but also reduce the efficiency of resource retrieval. Tag clustering aggregates tags with similar semantics and helps mitigate these problems. In this paper, we first present an approach based on common co-occurrence group similarity, which employs the ternary relation among users, resources, and tags to measure the semantic relevance between tags. We then propose a spectral clustering method to address the high dimensionality and sparsity of the annotating data. Finally, experimental results show that the proposed method is useful and efficient.

The introduction of the paper is well presented. The state-of-the-art section is well done: it covers recent research in the area in chronological order. In presenting the methodology, the authors begin by describing the notation used to model a social tagging system, as well as the cases in which tags co-occur (on the same resource, for a single user, or for the same user-feature combination), and use examples to explain this part. In analyzing the results, the authors use two metrics that, according to them, are better suited to clustering (the Silhouette coefficient and the Dunn index) rather than precision and recall. The results are compared with four other state-of-the-art approaches. The algorithm was implemented in Matlab and is based on the previously proposed metric. The results obtained are satisfactory.
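For readers unfamiliar with the two clustering metrics mentioned above, the following is a minimal Python sketch of their textbook definitions, not the authors' Matlab code. It assumes a precomputed pairwise distance matrix `dist` (for tags, this might be derived from a similarity matrix, which is an assumption here) and at least two clusters; the function names are hypothetical. The Dunn index has several variants; this sketch uses the smallest between-cluster distance divided by the largest within-cluster diameter.

```python
import numpy as np


def silhouette_coefficient(dist, labels):
    """Mean silhouette value over all points (requires >= 2 clusters)."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = dist[i, same].mean()  # mean distance within own cluster
        # Mean distance to the nearest other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


def dunn_index(dist, labels):
    """Smallest between-cluster distance over largest cluster diameter."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    max_diameter = max(
        dist[np.ix_(labels == c, labels == c)].max() for c in clusters
    )
    min_separation = min(
        dist[np.ix_(labels == a, labels == b)].min()
        for idx, a in enumerate(clusters)
        for b in clusters[idx + 1:]
    )
    return float(min_separation / max_diameter)


# Toy data (illustrative only): four points on a line, two tight groups.
values = np.array([0.0, 1.0, 10.0, 11.0])
toy_dist = np.abs(values[:, None] - values[None, :])
good = silhouette_coefficient(toy_dist, [0, 0, 1, 1])
bad = silhouette_coefficient(toy_dist, [0, 1, 0, 1])
```

Higher values of either metric indicate more compact, better separated clusters, and neither needs ground-truth labels, unlike precision and recall.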


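Method: The common co-occurrence group similarity is used to compute the pairwise similarity between tags and to build a similarity matrix (Eq. (4)). The tags are then clustered with a spectral clustering algorithm: the similarity matrix is first normalized by a Laplacian transformation to obtain the normalized Laplacian matrix of the tags; the first k eigenvalues of this matrix and their corresponding eigenvectors are then computed; these k eigenvectors form a new feature space, in which the K-means algorithm groups the tags into k clusters (Algorithm 1).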
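The steps summarized above (normalized Laplacian, first k eigenvectors, K-means) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's Algorithm 1: it assumes a symmetric, non-negative similarity matrix is already available, takes the normalized Laplacian to be I - D^{-1/2} W D^{-1/2} with "first k" meaning the k smallest eigenvalues, row-normalizes the embedding, and uses a simple deterministic K-means. The names `spectral_cluster` and `kmeans` and the toy matrix are hypothetical.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [points[0]]
    for _ in range(1, k):
        # Distance of every point to its closest already-chosen centre.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(d)])
    centres = np.array(centres, dtype=float)

    labels = None
    for _ in range(n_iter):
        sq = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = sq.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):  # keep the old centre if a cluster is empty
                centres[j] = points[labels == j].mean(axis=0)
    return labels


def spectral_cluster(similarity, k):
    """Cluster items given a symmetric similarity matrix (assumed input)."""
    w = np.asarray(similarity, dtype=float)
    degree = w.sum(axis=1)
    # Guard against isolated items with zero degree.
    safe = np.where(degree > 0, degree, 1.0)
    inv_sqrt = np.where(degree > 0, 1.0 / np.sqrt(safe), 0.0)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    laplacian = np.eye(len(w)) - inv_sqrt[:, None] * w * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest.
    _, eigvecs = np.linalg.eigh(laplacian)
    embedding = eigvecs[:, :k]
    # Row-normalize so each item lies on the unit sphere.
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    embedding = embedding / np.where(norms > 0, norms, 1.0)
    return kmeans(embedding, k)


# Toy similarity matrix (illustrative only): two groups of three tags.
W = np.array([
    [0.00, 0.90, 0.80, 0.05, 0.00, 0.00],
    [0.90, 0.00, 0.85, 0.00, 0.05, 0.00],
    [0.80, 0.85, 0.00, 0.00, 0.00, 0.05],
    [0.05, 0.00, 0.00, 0.00, 0.90, 0.80],
    [0.00, 0.05, 0.00, 0.90, 0.00, 0.85],
    [0.00, 0.00, 0.05, 0.80, 0.85, 0.00],
])
tag_labels = spectral_cluster(W, 2)
```

On this toy matrix the first three tags should fall in one cluster and the last three in the other. For large, sparse tag data a sparse eigensolver would likely be preferable to the dense `np.linalg.eigh` used here.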


