CLC number: TP183

On-line Access: 2025-07-28

Received: 2024-11-25

Revision Accepted: 2025-01-26

Crosschecked: 2025-07-30


 ORCID:

Huilin ZHOU

https://orcid.org/0000-0001-8834-4665

Quanshi ZHANG

https://orcid.org/0000-0002-6108-2738


Frontiers of Information Technology & Electronic Engineering  2025 Vol.26 No.7 P.1017-1026

http://doi.org/10.1631/FITEE.2401025


Towards the first principles of explaining DNNs: interactions explain the learning dynamics


Author(s):  Huilin ZHOU, Qihan REN, Junpeng ZHANG, Quanshi ZHANG

Affiliation(s):  School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; School of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China

Corresponding email(s):   zhouhuilin116@sjtu.edu.cn, renqihan@sjtu.edu.cn, zhangjp63@sjtu.edu.cn, zqs1022@sjtu.edu.cn

Key Words:  First-principles explanation, Theory of equivalent interactions, Two-phase dynamics of interactions, Learning dynamics



Huilin ZHOU, Qihan REN, Junpeng ZHANG, Quanshi ZHANG. Towards the first principles of explaining DNNs: interactions explain the learning dynamics[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(7): 1017-1026.

@article{title="Towards the first principles of explaining DNNs: interactions explain the learning dynamics",
author="Huilin ZHOU, Qihan REN, Junpeng ZHANG, Quanshi ZHANG",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="26",
number="7",
pages="1017-1026",
year="2025",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2401025"
}

%0 Journal Article
%T Towards the first principles of explaining DNNs: interactions explain the learning dynamics
%A Huilin ZHOU
%A Qihan REN
%A Junpeng ZHANG
%A Quanshi ZHANG
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 7
%P 1017-1026
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2401025

TY - JOUR
T1 - Towards the first principles of explaining DNNs: interactions explain the learning dynamics
A1 - Huilin ZHOU
A1 - Qihan REN
A1 - Junpeng ZHANG
A1 - Quanshi ZHANG
JO - Frontiers of Information Technology & Electronic Engineering
VL - 26
IS - 7
SP - 1017
EP - 1026
SN - 2095-9184
Y1 - 2025
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2401025
ER -


Abstract: 
Most explanation methods are designed in an empirical manner, so exploring whether there exists a first-principles explanation of a deep neural network (DNN) becomes the next core scientific problem in explainable artificial intelligence (XAI). Although it is still an open problem, in this paper, we discuss whether the interaction-based explanation can serve as the first-principles explanation of a DNN. The strong explanatory power of interaction theory comes from the following aspects: (1) it establishes a new axiomatic system to quantify the decision-making logic of a DNN into a set of symbolic interaction concepts; (2) it simultaneously explains various deep learning phenomena, such as generalization power, adversarial sensitivity, representation bottleneck, and learning dynamics; (3) it provides mathematical tools that uniformly explain the mechanisms of various empirical attribution methods and empirical adversarial-transferability-boosting methods; (4) it explains the extremely complex learning dynamics of a DNN by analyzing the two-phase dynamics of interaction complexity, which further reveals the internal mechanism of why and how the generalization power/adversarial sensitivity of a DNN changes during the learning process.
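To make aspect (1) concrete: in this line of work, the interaction effect of a subset S of input variables is commonly quantified by the Harsanyi dividend, I(S) = Σ_{T⊆S} (−1)^{|S|−|T|} v(T), where v(T) denotes the model output when only the variables in T are left unmasked. The following minimal Python sketch computes these interactions by brute force; the toy scoring function `toy_v` (a linear term plus one pairwise effect) is an illustrative stand-in for a DNN, not a construction from the paper.

```python
from itertools import chain, combinations

def subsets(s):
    """Yield every subset of s as a tuple, from the empty set up."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def harsanyi(v, n):
    """Harsanyi dividend I(S) = sum_{T subset of S} (-1)^{|S|-|T|} v(T),
    computed for every subset S of the n input variables."""
    return {
        frozenset(S): sum((-1) ** (len(S) - len(T)) * v(frozenset(T))
                          for T in subsets(S))
        for S in subsets(range(n))
    }

# Hypothetical stand-in for a masked DNN output: additive effects of
# weight 1, 2, 3 plus a pairwise interaction of strength 5 between
# variables 0 and 1 (present only when both are unmasked).
W = {0: 1.0, 1: 2.0, 2: 3.0}

def toy_v(T):
    return sum(W[i] for i in T) + (5.0 if {0, 1} <= T else 0.0)

I = harsanyi(toy_v, 3)
print(I[frozenset({0, 1})])  # recovers the pairwise strength 5.0
# Sanity check: the dividends always sum back to the full-input output.
assert abs(sum(I.values()) - toy_v(frozenset({0, 1, 2}))) < 1e-9
```

The brute-force loop is exponential in n, which is fine for illustration; the papers cited here study when and why only a sparse set of these interactions is nonzero in a trained DNN.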

Towards the first principles of explaining deep neural networks: analyzing learning dynamics based on the theory of equivalent interactions

Huilin ZHOU1, Qihan REN1, Junpeng ZHANG1, Quanshi ZHANG1,2
1 School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
2 School of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China
Abstract: Most current research on the interpretability of deep learning is empirical, and whether there exists a first principle that rigorously explains the inner mechanisms of deep neural networks from multiple perspectives has become one of the core open scientific problems in explainable artificial intelligence. This paper discusses whether the theory of equivalent interactions can serve as a first-principles explanation of deep neural networks. We argue that the strong explanatory power of this theory rests on four aspects: (1) it establishes a new axiomatic system that converts the decision-making logic of a deep neural network into a set of symbolic interactions; (2) it simultaneously explains multiple typical properties of deep learning, including generalization power, adversarial sensitivity, the representation bottleneck, and learning dynamics; (3) it provides mathematical tools that uniformly explain deep learning algorithms, thereby systematically explaining the mechanisms behind various empirical attribution methods and adversarial-transferability-boosting methods; (4) by analyzing the two-phase dynamics of interaction complexity during training, it explains the complexity of what a deep neural network models and the connection between generalization power and adversarial sensitivity, thereby revealing the internal mechanism by which both change over the course of learning.

Key words: First-principles explanation; theory of equivalent interactions; two-phase dynamics of interactions; learning dynamics




Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2025 Journal of Zhejiang University-SCIENCE