
CLC number: TP391

On-line Access: 2025-11-17

Received: 2025-04-30

Revision Accepted: 2025-11-18

Crosschecked: 2025-09-05


 ORCID:

Jianzong WANG

https://orcid.org/0000-0002-9237-4231


Frontiers of Information Technology & Electronic Engineering  2025 Vol.26 No.10 P.1793-1808

http://doi.org/10.1631/FITEE.2500282


Knowledge distillation for financial large language models: a systematic review of strategies, applications, and evaluation


Author(s):  Jiaqi SHI, Xulong ZHANG, Xiaoyang QU, Junfei XIE, Jianzong WANG

Affiliation(s):  Ping An Technology (Shenzhen) Co., Ltd., Shenzhen 518046, China; Institute of Advanced Technology, University of Science and Technology of China, Hefei 230027, China

Corresponding email(s):   civilizwa@mail.ustc.edu.cn, zhangxulong@ieee.org, quxiaoy@gmail.com, xiejunfei@mail.ustc.edu.cn, jzwang@188.com

Key Words:  Financial large language models (FinLLMs), Knowledge distillation, Model compression, Quantitative trading


Jiaqi SHI, Xulong ZHANG, Xiaoyang QU, Junfei XIE, Jianzong WANG. Knowledge distillation for financial large language models: a systematic review of strategies, applications, and evaluation[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(10): 1793-1808.

@article{Shi2025FinLLMKD,
title="Knowledge distillation for financial large language models: a systematic review of strategies, applications, and evaluation",
author="Jiaqi SHI, Xulong ZHANG, Xiaoyang QU, Junfei XIE, Jianzong WANG",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="26",
number="10",
pages="1793-1808",
year="2025",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2500282"
}

%0 Journal Article
%T Knowledge distillation for financial large language models: a systematic review of strategies, applications, and evaluation
%A Jiaqi SHI
%A Xulong ZHANG
%A Xiaoyang QU
%A Junfei XIE
%A Jianzong WANG
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 10
%P 1793-1808
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2500282

TY - JOUR
T1 - Knowledge distillation for financial large language models: a systematic review of strategies, applications, and evaluation
A1 - Jiaqi SHI
A1 - Xulong ZHANG
A1 - Xiaoyang QU
A1 - Junfei XIE
A1 - Jianzong WANG
JO - Frontiers of Information Technology & Electronic Engineering
VL - 26
IS - 10
SP - 1793
EP - 1808
SN - 2095-9184
Y1 - 2025
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2500282
ER -


Abstract: 
Financial large language models (FinLLMs) offer immense potential for financial applications, but excessive deployment costs and considerable inference latency remain major obstacles to their adoption. As a prominent compression methodology, knowledge distillation (KD) offers an effective solution to these difficulties. This work presents a comprehensive survey of how KD interacts with FinLLMs, covering three core aspects: strategy, application, and evaluation. At the strategy level, the review introduces a structured taxonomy to comparatively analyze existing distillation pathways. At the application level, it proposes an upstream–midstream–downstream framework to systematically explain the practical value of distilled models in the financial field. At the evaluation level, to address the absence of standards in the financial domain, it constructs a comprehensive evaluation framework spanning multiple dimensions, including financial accuracy, reasoning fidelity, and robustness. In summary, this research aims to provide a clear roadmap for this interdisciplinary field and to accelerate the development of distilled FinLLMs.

Jiaqi SHI 1,2, Xulong ZHANG 1, Xiaoyang QU 1, Junfei XIE 1,2, Jianzong WANG 1
1 Ping An Technology (Shenzhen) Co., Ltd., Shenzhen 518046, China
2 Institute of Advanced Technology, University of Science and Technology of China, Hefei 230027, China
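As background for the distillation strategies the survey reviews, the sketch below illustrates the classic logit-matching KD objective (Hinton-style soft/hard loss) commonly used when compressing a large teacher model into a smaller student. This is a minimal, generic illustration rather than code from the paper; the function name distillation_loss and the hyperparameters T (temperature) and alpha (soft/hard weighting) are illustrative choices.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Logit-based KD: weighted sum of a soft (teacher-matching)
    KL term and a hard (ground-truth) cross-entropy term."""
    # Soften both distributions with temperature T; the T*T factor
    # keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Standard cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage with hypothetical shapes: a batch of 8 examples over a
# 30522-token vocabulary (sizes chosen only for illustration).
student_logits = torch.randn(8, 30522)
teacher_logits = torch.randn(8, 30522)
labels = torch.randint(0, 30522, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)

Raising T smooths the teacher's distribution so the student also learns from the relative probabilities of incorrect classes (the "dark knowledge"), while alpha trades off imitation of the teacher against fitting the ground-truth labels.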


