JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

2025 Vol.26 No.7 P.1017-1026

Towards the first principles of explaining DNNs: interactions explain the learning dynamics

Huilin ZHOU, Qihan REN, Junpeng ZHANG, Quanshi ZHANG

School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; School of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China

zhouhuilin116@sjtu.edu.cn, renqihan@sjtu.edu.cn, zhangjp63@sjtu.edu.cn, zqs1022@sjtu.edu.cn

Abstract: Most explanation methods are designed empirically, so whether a first-principles explanation of a deep neural network (DNN) exists has become the next core scientific problem in explainable artificial intelligence (XAI). Although this remains an open problem, in this paper we discuss whether interaction-based explanation can serve as the first-principles explanation of a DNN. The strong explanatory power of interaction theory comes from four aspects: (1) it establishes a new axiomatic system that quantifies the decision-making logic of a DNN as a set of symbolic interaction concepts; (2) it simultaneously explains various deep learning phenomena, such as generalization power, adversarial sensitivity, the representation bottleneck, and learning dynamics; (3) it provides mathematical tools that uniformly explain the mechanisms of various empirical attribution methods and empirical methods for boosting adversarial transferability; (4) it explains the extremely complex learning dynamics of a DNN by analyzing the two-phase dynamics of interaction complexity, which further reveals why and how the generalization power and adversarial sensitivity of a DNN change during the learning process.
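Points (1) and (3) above refer to decomposing a DNN's output into symbolic interactions and to explaining attribution methods with them. This page does not state the definitions, so the sketch below is only an illustration under an assumption: it uses the Harsanyi-dividend form of AND interactions, I(S) = Σ_{T⊆S} (-1)^{|S|-|T|} v(T), where v(T) is the model score when only the input variables in T are kept and the rest are masked. The toy scoring function `toy_v`, its weights, and all helper names are hypothetical stand-ins for a real masked DNN output. The sketch also groups interaction strength by order |S| (one plausible reading of "interaction complexity") and recovers Shapley attributions by splitting each I(S) equally among the variables in S, a standard identity for Harsanyi dividends; it is not presented as the paper's own derivation.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List

Subset = FrozenSet[int]


def all_subsets(items) -> List[Subset]:
    """Return every subset of `items` as a frozenset, smallest first."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def harsanyi_interactions(n: int, v: Callable[[Subset], float]) -> Dict[Subset, float]:
    """I(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T), for every S in {0..n-1}.

    Exponential in n, so only suitable for a handful of input variables.
    """
    values = {T: v(T) for T in all_subsets(range(n))}  # query each masked input once
    return {
        S: sum((-1) ** (len(S) - len(T)) * values[T] for T in all_subsets(S))
        for S in values
    }


def strength_by_order(interactions: Dict[Subset, float]) -> Dict[int, float]:
    """Total absolute interaction effect grouped by interaction order |S|."""
    totals: Dict[int, float] = defaultdict(float)
    for S, effect in interactions.items():
        totals[len(S)] += abs(effect)
    return dict(totals)


def shapley_from_interactions(n: int, interactions: Dict[Subset, float]) -> List[float]:
    """Shapley value of each variable: each I(S) is shared equally by the |S| members of S."""
    phi = [0.0] * n
    for S, effect in interactions.items():
        for i in S:
            phi[i] += effect / len(S)
    return phi


# Hypothetical toy "model": additive weights plus one AND effect between variables 0 and 1.
WEIGHTS = [1.0, 2.0, 0.5]


def toy_v(T: Subset) -> float:
    score = sum(WEIGHTS[i] for i in T)
    if 0 in T and 1 in T:
        score += 3.0  # only fires when both variables are present
    return score


I = harsanyi_interactions(3, toy_v)
print(I[frozenset({0, 1})])
print(strength_by_order(I))
print(shapley_from_interactions(3, I))
```

In this toy case the joint effect of variables 0 and 1 shows up as a single order-2 interaction, and summing I(T) over all T ⊆ S reproduces v(S) for every S, which is the kind of faithfulness property an interaction-based explanation relies on.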

Key words: First-principles explanation; Theory of equivalent interactions; Two-phase dynamics of interactions; Learning dynamics

Chinese summary: First principles for explaining deep neural networks: analyzing learning dynamics with the theory of equivalent interactions

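Huilin ZHOU¹, Qihan REN¹, Junpeng ZHANG¹, Quanshi ZHANG¹,²
¹School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
²School of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China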
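Abstract: Most current research on the interpretability of deep learning is empirical, and whether there exist first principles that rigorously and comprehensively explain the internal mechanisms of deep neural networks from different perspectives has become one of the core scientific problems that explainable artificial intelligence urgently needs to solve. This paper explores whether the theory of equivalent interactions can serve as a first-principles explanation of deep neural networks. We argue that the strong explanatory power of this theory is reflected mainly in four aspects: (1) it establishes a new axiomatic system that transforms the decision logic of a deep neural network into a set of symbolic interactions; (2) it can simultaneously explain several typical characteristics of deep learning, including generalization power, adversarial sensitivity, the representation bottleneck, and learning dynamics; (3) it provides mathematical tools for uniformly explaining deep learning algorithms, and can thus systematically explain the mechanisms behind various empirical attribution methods and adversarial transferability methods; (4) it analyzes the two-phase dynamics of interaction complexity as a deep neural network learns, explaining the complexity of what the network models during training and the connection between generalization power and adversarial sensitivity, and thereby revealing in depth the internal mechanism by which the generalization power and adversarial sensitivity of a deep neural network change during learning.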

Key words: First-principles explanation; Theory of equivalent interactions; Two-phase dynamics of interactions; Learning dynamics


DOI: 10.1631/FITEE.2401025
CLC number: TP183


Online access: 2025-07-28
Received: 2024-11-25
Revision accepted: 2025-01-26
Crosschecked: 2025-07-30

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE
