Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2022 Vol.23 No.3 P.409-421

http://doi.org/10.1631/FITEE.2000657

NGAT: attention in breadth and depth exploration for semi-supervised graph representation learning

Author(s): Jianke HU, Yin ZHANG
Affiliation(s): 1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Corresponding email(s): yinzh@zju.edu.cn
Key Words: Graph learning, Semi-supervised learning, Node classification, Attention


Jianke HU, Yin ZHANG. NGAT: attention in breadth and depth exploration for semi-supervised graph representation learning[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(3): 409-421.

@article{title="NGAT: attention in breadth and depth exploration for semi-supervised graph representation learning",
author="Jianke HU and Yin ZHANG",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="23",
number="3",
pages="409-421",
year="2022",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2000657"
}

%0 Journal Article
%T NGAT: attention in breadth and depth exploration for semi-supervised graph representation learning
%A Jianke HU
%A Yin ZHANG
%J Frontiers of Information Technology & Electronic Engineering
%V 23
%N 3
%P 409-421
%@ 2095-9184
%D 2022
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2000657

TY  - JOUR
T1  - NGAT: attention in breadth and depth exploration for semi-supervised graph representation learning
A1  - Jianke HU
A1  - Yin ZHANG
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 23
IS  - 3
SP  - 409
EP  - 421
SN  - 2095-9184
Y1  - 2022
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2000657
ER  - 


Abstract: Graph neural networks (GNNs) have recently achieved remarkable performance in representation learning on graph-structured data. However, GNNs based on the neighborhood aggregation strategy deteriorate as the number of network layers increases because of oversmoothing, which is the major bottleneck for applying GNNs to real-world graphs. Many efforts have been made to improve how feature information is aggregated from directly connected nodes, i.e., breadth exploration. These models, however, perform best only with three or fewer layers, and their performance drops rapidly as depth grows. To alleviate oversmoothing, we propose a nested graph attention network (NGAT), which works in a semi-supervised manner. In addition to breadth exploration, a k-layer NGAT uses a layer-wise aggregation strategy guided by the attention mechanism to selectively leverage feature information from the kth-order neighborhood, i.e., depth exploration. Even with a 10-layer or deeper architecture, NGAT can balance the need to preserve locality (including root node features and the local structure) against the need to aggregate information from a large neighborhood. In a number of experiments on standard node classification tasks, NGAT outperforms other recent models and achieves state-of-the-art performance.
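The oversmoothing effect described in the abstract, and the general idea of weighting representations from different depths, can be illustrated with a small NumPy sketch. This is not the NGAT architecture: the toy graph, the weight-free mean aggregation, and every function name below are assumptions made for illustration, and the scoring rule in `layerwise_attention` (dot product with the root node's input features) is a hypothetical stand-in, since the abstract does not specify the attention function.

```python
import numpy as np

# Toy graph (an assumption for illustration): two triangles joined by one edge.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
ADJ = np.zeros((6, 6))
for i, j in EDGES:
    ADJ[i, j] = ADJ[j, i] = 1.0

# Input features: each triangle starts with its own one-hot feature.
X = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)


def mean_propagation_matrix(adj):
    """Row-normalised adjacency with self-loops: a node averages itself and its neighbours."""
    a_hat = adj + np.eye(adj.shape[0])
    return a_hat / a_hat.sum(axis=1, keepdims=True)


def propagate(adj, features, num_layers):
    """Return [H_0, ..., H_k] with H_t = P^t X (no learned weights, no nonlinearity)."""
    p = mean_propagation_matrix(adj)
    outputs = [features]
    for _ in range(num_layers):
        outputs.append(p @ outputs[-1])
    return outputs


def node_spread(h):
    """Mean distance of node representations from their centroid (0 = indistinguishable nodes)."""
    return float(np.linalg.norm(h - h.mean(axis=0), axis=1).mean())


def layerwise_attention(layer_outputs):
    """Hypothetical depth attention: per node, a softmax over depths of <H_0[v], H_t[v]>."""
    root = layer_outputs[0]                              # (nodes, dim)
    stacked = np.stack(layer_outputs, axis=1)            # (nodes, depths, dim)
    scores = np.einsum("nd,ntd->nt", root, stacked)      # (nodes, depths)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    combined = np.einsum("nt,ntd->nd", weights, stacked)
    return combined, weights


if __name__ == "__main__":
    outputs = propagate(ADJ, X, num_layers=10)
    for depth in (0, 1, 5, 10):
        print(f"depth {depth:2d}: spread = {node_spread(outputs[depth]):.4f}")
    combined, _ = layerwise_attention(outputs)
    print(f"attention-combined spread = {node_spread(combined):.4f}")
```

In this sketch the spread of node representations shrinks as depth grows, because repeated averaging pulls every node toward the same vector. Combining all depths with per-node weights keeps part of the shallow, local signal, which is the intuition behind balancing locality against a large neighborhood.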

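NGAT: semi-supervised graph representation learning based on attention in breadth and depth exploration

Jianke HU, Yin ZHANG
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Summary: In recent years, graph neural networks (GNNs) have achieved remarkable results in representation learning on graph-structured data. However, as the number of network layers increases, the performance of GNNs based on neighborhood information aggregation deteriorates because of oversmoothing, which is also the main bottleneck for applying GNNs to real-world graphs. Researchers have made many improvements to the aggregation of feature information from directly connected nodes, i.e., breadth exploration. However, these models perform best only with three or fewer layers, and their performance drops rapidly at greater depth. To alleviate oversmoothing, this paper proposes a nested graph attention network, NGAT, a multi-scale feature fusion model based on a dual attention mechanism, which can work in a semi-supervised manner. In addition to breadth exploration, a k-layer NGAT uses a layer-wise aggregation strategy guided by the attention mechanism to selectively use feature information from the kth-order neighborhood, i.e., depth exploration. Even with an architecture of 10 or more layers, NGAT can balance the need to preserve locality (including root node features and local structure) with the need to aggregate information from a large neighborhood. NGAT is compared with existing GNN models on public datasets, and the experiments show that the proposed NGAT model has a stronger ability to learn node embeddings.

Keywords: Graph learning; Semi-supervised learning; Node classification; Attention mechanism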


