Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2025 Vol.26 No.7 P.1017-1026
Towards the first principles of explaining DNNs: interactions explain the learning dynamics
Abstract: Most explanation methods are designed in an empirical manner, so exploring whether there exists a first-principles explanation of a deep neural network (DNN) becomes the next core scientific problem in explainable artificial intelligence (XAI). Although it is still an open problem, in this paper, we discuss whether the interaction-based explanation can serve as the first-principles explanation of a DNN. The strong explanatory power of interaction theory comes from the following aspects: (1) it establishes a new axiomatic system to quantify the decision-making logic of a DNN into a set of symbolic interaction concepts; (2) it simultaneously explains various deep learning phenomena, such as generalization power, adversarial sensitivity, representation bottleneck, and learning dynamics; (3) it provides mathematical tools that uniformly explain the mechanisms of various empirical attribution methods and empirical adversarial-transferability-boosting methods; (4) it explains the extremely complex learning dynamics of a DNN by analyzing the two-phase dynamics of interaction complexity, which further reveals the internal mechanism of why and how the generalization power/adversarial sensitivity of a DNN changes during the learning process.
Key words: First-principles explanation; Theory of equivalent interactions; Two-phase dynamics of interactions; Learning dynamics
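1School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
2School of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China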
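Abstract (translated from Chinese): Most current research on the interpretability of deep learning is empirical, so whether there exist first principles that can rigorously explain the internal mechanisms of deep neural networks from multiple perspectives has become one of the core scientific problems to be solved in explainable artificial intelligence. This paper discusses whether the theory of equivalent interactions can serve as a first-principles explanation of deep neural networks. We argue that the theory's strong explanatory power is reflected in four aspects: (1) it establishes a new axiomatic system that converts the decision-making logic of a deep neural network into a set of symbolic interactions; (2) it simultaneously explains several typical characteristics of deep learning, including generalization power, adversarial sensitivity, representation bottleneck, and learning dynamics; (3) it provides mathematical tools that uniformly explain deep learning algorithms, and thus systematically explains the mechanisms behind various empirical attribution methods and adversarial-transferability methods; (4) it analyzes the two-phase dynamics of interaction complexity during the modeling process of a deep neural network, explaining the complexity modeled during training and the link between generalization power and adversarial sensitivity, and thereby reveals the internal mechanism by which the generalization power and adversarial sensitivity of a deep neural network change during learning.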
DOI: 10.1631/FITEE.2401025
CLC number: TP183
On-line Access: 2025-07-28
Received: 2024-11-25
Revision Accepted: 2025-01-26
Crosschecked: 2025-07-30