On-line Access: 2023-07-17

Received: 2022-12-16

Revision Accepted: 2023-07-04

Journal: Frontiers of Information Technology & Electronic Engineering


Multi-Exit Self-Distillation with Appropriate Teachers

Author(s):  Wujie SUN, Defang CHEN, Can WANG, Deshi YE, Yan FENG, Chun CHEN

Affiliation(s):  College of Computer Science and Technology, Zhejiang University, Hangzhou 310000, China

Corresponding email(s):   sunwujie@zju.edu.cn, wcan@zju.edu.cn

Key Words:  Multi-exit architecture, Knowledge distillation, Learning gap

Wujie SUN, Defang CHEN, Can WANG, Deshi YE, Yan FENG, Chun CHEN. Multi-Exit Self-Distillation with Appropriate Teachers[J]. Frontiers of Information Technology & Electronic Engineering.

ISSN: 2095-9184

DOI: 10.1631/FITEE.2200644

Multi-exit architectures allow early-stop inference to reduce computational cost, which makes them suitable for resource-constrained settings. Recent works combine multi-exit architectures with self-distillation to achieve both high efficiency and decent performance at different network depths. However, existing methods mainly transfer knowledge from deep exits or a single ensemble to guide all exits, without considering that an inappropriate learning gap between student and teacher may degrade model performance, especially at shallow exits. To address this issue, we propose Multi-exit self-distillation with Appropriate TEachers (MATE), which provides diverse and appropriate teacher knowledge for each exit. In MATE, multiple ensemble teachers are obtained from all exits with different trainable weights. Each exit then receives knowledge from all teachers while focusing mainly on its primary teacher, which keeps the learning gap appropriate for efficient knowledge transfer. In this way, MATE achieves diversity in knowledge distillation while ensuring learning efficiency. Experimental results on CIFAR-100, TinyImageNet, and three fine-grained datasets demonstrate that MATE consistently outperforms state-of-the-art multi-exit self-distillation methods across various network architectures.
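The idea of several ensemble teachers, each exit weighting its primary teacher most heavily, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the loss, so the KL-divergence objective, the softmax normalisation of the ensemble weights, the one-teacher-per-exit pairing, and the names `ensemble_teachers`, `distillation_losses`, `primary_weight`, and `temperature` are all assumptions made for illustration. The ensemble weights are fixed constants here, whereas the abstract describes them as trainable.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ensemble_teachers(exit_logits, teacher_weight_logits):
    """Build one ensemble teacher per row of weight logits.

    Each teacher is a weighted sum of all exits' logits. Normalising the
    weights with a softmax is an assumption of this sketch.
    """
    num_classes = len(exit_logits[0])
    teachers = []
    for row in teacher_weight_logits:
        weights = softmax(row)
        teachers.append([
            sum(w * logits[c] for w, logits in zip(weights, exit_logits))
            for c in range(num_classes)
        ])
    return teachers


def distillation_losses(exit_logits, teachers, primary_weight=0.7, temperature=3.0):
    """Per-exit loss: a weighted sum of KL terms over all teachers.

    Assumes teacher k is the primary teacher of exit k and receives
    `primary_weight`; the remaining teachers share the rest equally.
    """
    n = len(teachers)
    losses = []
    for k, student in enumerate(exit_logits):
        q = softmax(student, temperature)
        loss = 0.0
        for j, teacher in enumerate(teachers):
            if n == 1:
                alpha = 1.0
            elif j == k:
                alpha = primary_weight
            else:
                alpha = (1.0 - primary_weight) / (n - 1)
            loss += alpha * kl_divergence(softmax(teacher, temperature), q)
        losses.append(loss)
    return losses


# Toy example: three exits, three classes, one sample (made-up numbers).
exit_logits = [[1.0, 0.5, 0.2], [2.0, 0.3, 0.1], [3.5, 0.2, -0.5]]
teacher_weight_logits = [[2.0, 1.0, 0.0], [0.0, 2.0, 1.0], [0.0, 1.0, 2.0]]

teachers = ensemble_teachers(exit_logits, teacher_weight_logits)
losses = distillation_losses(exit_logits, teachers)
print(losses)
```

In this toy setup each teacher leans toward a different exit, so every exit sees a mix of nearby and more distant targets. In a real training loop the weight logits would be learned jointly with the network, and the distillation terms would be added to the usual cross-entropy loss at each exit.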

