CLC number: TP391
On-line Access: 2025-07-02
Received: 2024-05-12
Revision Accepted: 2025-07-02
Crosschecked: 2024-09-18
Deng LI, Peng LI, Aming WU, Yahong HAN. Prototype-guided cross-task knowledge distillation[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2400383
Prototype-guided cross-task knowledge distillation
1. College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
2. Songshan Laboratory, Zhengzhou 450000, China
3. School of Electronic Engineering, Xidian University, Xi'an 710401, China
Abstract: In recent years, large-scale pre-trained models have demonstrated their advantages on a variety of tasks. However, constrained by heavy computation and huge storage requirements, they are difficult to deploy in real-world scenarios. Mainstream knowledge distillation methods require the teacher and student models to share the same label space, which further limits the application of pre-trained models in practice. To relax the constraint of differing label spaces, this paper proposes a prototype-guided cross-task knowledge distillation (ProC-KD) method that transfers the teacher network's intrinsic object-representation knowledge to various downstream task scenarios. First, to better learn generalized knowledge in cross-task settings, a prototype learning module is introduced to learn invariant intrinsic representations of objects from the teacher network. Second, for diverse downstream tasks, a task-adaptive feature enhancement module augments the student network's features with the learned generalized prototype representations and guides the student's learning to improve its generalization ability. Experiments on different visual tasks verify the effectiveness of the proposed method in cross-task knowledge distillation scenarios.
Key words:
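The abstract only outlines the two modules, so the following is a minimal, illustrative sketch (in PyTorch) of how a prototype bank built from teacher features and a task-adaptive feature enhancement step could be wired together. The class names (PrototypeBank, TaskAdaptiveEnhancer), the EMA prototype update, the attention-style enhancement, and the MSE alignment loss are all assumptions made for illustration; they are not the paper's actual formulation.

# Illustrative sketch only; module names, shapes, and update rules are assumptions,
# not the ProC-KD formulation from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeBank(nn.Module):
    """Maintains K prototype vectors as EMA summaries of teacher features (assumed design)."""
    def __init__(self, num_prototypes: int, dim: int, momentum: float = 0.99):
        super().__init__()
        self.momentum = momentum
        self.register_buffer("prototypes", torch.randn(num_prototypes, dim))

    @torch.no_grad()
    def update(self, teacher_feats: torch.Tensor):
        # Assign each teacher feature to its nearest prototype, then EMA-update that prototype.
        sims = F.normalize(teacher_feats, dim=1) @ F.normalize(self.prototypes, dim=1).t()
        assign = sims.argmax(dim=1)                      # (N,)
        for k in range(self.prototypes.size(0)):
            mask = assign == k
            if mask.any():
                mean_feat = teacher_feats[mask].mean(dim=0)
                self.prototypes[k].mul_(self.momentum).add_(mean_feat, alpha=1 - self.momentum)

class TaskAdaptiveEnhancer(nn.Module):
    """Augments student features by attending over the learned prototypes (assumed design)."""
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)

    def forward(self, student_feats: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
        scores = self.query(student_feats) @ prototypes.t() / prototypes.size(1) ** 0.5
        attn = F.softmax(scores, dim=1)                  # (N, K) weights over prototypes
        return student_feats + attn @ prototypes         # residual enhancement

def distill_loss(enhanced_student: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
    # Align the prototype-enhanced student features with the teacher's features.
    return F.mse_loss(enhanced_student, teacher_feats)

In a training loop, one would update the prototype bank with teacher features on each batch, enhance the student features before computing distill_loss, and add this loss to the downstream task loss; the student's label space never has to match the teacher's, which is the point of the cross-task setting.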