CLC number: TP183
On-line Access: 2025-07-28
Received: 2024-11-25
Revision Accepted: 2025-01-26
Crosschecked: 2025-07-30
Huilin ZHOU, Qihan REN, Junpeng ZHANG, Quanshi ZHANG. Towards the first principles of explaining DNNs: interactions explain the learning dynamics[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(7): 1017-1026.
Abstract: Most explanation methods are designed empirically, so whether there exists a first-principles explanation of a deep neural network (DNN) has become the next core scientific problem in explainable artificial intelligence (XAI). Although this remains an open problem, in this paper we discuss whether the interaction-based explanation can serve as the first-principles explanation of a DNN. The strong explanatory power of interaction theory comes from the following aspects: (1) it establishes a new axiomatic system that quantifies the decision-making logic of a DNN as a set of symbolic interaction concepts; (2) it simultaneously explains various deep learning phenomena, such as generalization power, adversarial sensitivity, the representation bottleneck, and learning dynamics; (3) it provides mathematical tools that uniformly explain the mechanisms of various empirical attribution methods and empirical methods for boosting adversarial transferability; (4) it explains the extremely complex learning dynamics of a DNN by analyzing the two-phase dynamics of interaction complexity, which further reveals why and how the generalization power and adversarial sensitivity of a DNN change during learning.
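For concreteness, the interaction concepts mentioned in (1) are quantified in this line of work by a Harsanyi-dividend-style metric, I(S) = Σ_{T⊆S} (-1)^{|S|-|T|} v(T), where S is a set of input variables and v(T) is the scalar network output when only the variables in T are kept and all other variables are masked. The Python sketch below illustrates this definition on a toy output function; the function name harsanyi_interaction and the AND-style toy model toy_v are illustrative assumptions, not the authors' implementation.

import itertools
from typing import Callable, FrozenSet

def harsanyi_interaction(v: Callable[[FrozenSet[int]], float],
                         S: FrozenSet[int]) -> float:
    # Harsanyi dividend: I(S) = sum over all T subseteq S of (-1)^(|S|-|T|) * v(T).
    # v(T) is the scalar model output when only the variables in T are kept
    # and all other input variables are masked (e.g., set to a baseline value).
    total = 0.0
    for r in range(len(S) + 1):
        for T in itertools.combinations(sorted(S), r):
            total += (-1) ** (len(S) - r) * v(frozenset(T))
    return total

def toy_v(T: FrozenSet[int]) -> float:
    # Assumed toy model for illustration: the output is 1.0 only when
    # variables 0 and 1 are both present, i.e., an AND-like concept.
    return 1.0 if {0, 1} <= T else 0.0

print(harsanyi_interaction(toy_v, frozenset({0, 1})))  # 1.0: a salient pairwise interaction
print(harsanyi_interaction(toy_v, frozenset({0})))     # 0.0: no single-variable effect

Under this metric, the few subsets S with non-negligible I(S) serve as the symbolic concepts that additively reconstruct the network output, which is what gives the axiomatic system in (1) its explanatory power.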