
CLC number: TP391
On-line Access: 2025-11-17
Received: 2025-04-30
Revision Accepted: 2025-11-18
Crosschecked: 2025-09-05
Jiaqi SHI, Xulong ZHANG, Xiaoyang QU, Junfei XIE, Jianzong WANG. Knowledge distillation for financial large language models: a systematic review of strategies, applications, and evaluation[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(10): 1793-1808.
@article{fitee2500282,
title="Knowledge distillation for financial large language models: a systematic review of strategies, applications, and evaluation",
author="Jiaqi SHI and Xulong ZHANG and Xiaoyang QU and Junfei XIE and Jianzong WANG",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="26",
number="10",
pages="1793-1808",
year="2025",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2500282"
}
%0 Journal Article
%T Knowledge distillation for financial large language models: a systematic review of strategies, applications, and evaluation
%A Jiaqi SHI
%A Xulong ZHANG
%A Xiaoyang QU
%A Junfei XIE
%A Jianzong WANG
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 10
%P 1793-1808
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2500282
TY - JOUR
T1 - Knowledge distillation for financial large language models: a systematic review of strategies, applications, and evaluation
A1 - Jiaqi SHI
A1 - Xulong ZHANG
A1 - Xiaoyang QU
A1 - Junfei XIE
A1 - Jianzong WANG
JO - Frontiers of Information Technology & Electronic Engineering
VL - 26
IS - 10
SP - 1793
EP - 1808
SN - 2095-9184
Y1 - 2025
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2500282
ER -
Abstract: Financial large language models (FinLLMs) offer immense potential for financial applications, but high deployment costs and considerable inference latency remain major obstacles. As a prominent model-compression methodology, knowledge distillation (KD) offers an effective solution to these difficulties. This work conducts a comprehensive survey of how KD interacts with FinLLMs, covering three core aspects: strategy, application, and evaluation. At the strategy level, the review introduces a structured taxonomy to comparatively analyze existing distillation pathways. At the application level, it puts forward an upstream–midstream–downstream framework to systematically explain the practical value of distilled models in finance. At the evaluation level, to address the lack of standards in the financial domain, it constructs a comprehensive evaluation framework spanning dimensions such as financial accuracy, reasoning fidelity, and robustness. Overall, this research aims to provide a clear roadmap for this interdisciplinary field and to accelerate the development of distilled FinLLMs.
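For readers unfamiliar with the KD mechanism referenced in the abstract, the sketch below illustrates the standard logit-based (white-box) distillation objective that many distillation strategies build on: the student is trained to match the teacher's temperature-softened token distributions while also fitting the ground-truth labels. This is a minimal illustrative sketch, not the method of any particular surveyed paper; the temperature and alpha values and the kd_loss helper name are assumptions introduced here for illustration.

# Minimal sketch of logit-based knowledge distillation for a causal LM,
# assuming white-box access to the teacher's logits (illustrative only).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels,
            temperature=2.0, alpha=0.5, ignore_index=-100):
    """Blend the soft-label KL term (teacher -> student) with hard-label cross-entropy.

    student_logits, teacher_logits: (batch, seq_len, vocab_size) tensors
    labels: (batch, seq_len) token ids, with ignore_index marking padding
    """
    # Soft targets: KL divergence between temperature-softened distributions.
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: ordinary cross-entropy against the ground-truth tokens.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=ignore_index,
    )
    return alpha * soft + (1.0 - alpha) * hard

# Hypothetical usage: the frozen financial teacher runs under torch.no_grad(),
# and the resulting loss drives the student's optimizer step.
# loss = kd_loss(student_out.logits, teacher_out.logits, batch["labels"])

In practice, black-box distillation variants that only observe teacher-generated text replace the KL term with supervised fine-tuning on the teacher's outputs, but the weighted combination of soft and hard signals above carries over.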
[1]Acharya K, Velasquez A, Song HH, 2024. A survey on symbolic knowledge distillation of large language models. IEEE Trans Artif Intell, 5(12):5928-5948.
[2]Agarwal R, Vieillard N, Zhou YC, et al., 2024. On-policy distillation of language models: learning from self-generated mistakes. Proc 12th Int Conf on Learning Representations.
[3]Alvarado JCS, Verspoor K, Baldwin T, 2015. Domain adaption of named entity recognition to support credit risk assessment. Proc Australasian Language Technology Association Workshop, p.84-90.
[4]Barocas S, Hardt M, Narayanan A, 2023. Fairness and Machine Learning: Limitations and Opportunities. MIT Press, Cambridge, USA.
[5]Bhatia G, Nagoudi EMB, Cavusoglu H, et al., 2024. FinTral: a family of GPT-4 level multimodal financial large language models. Proc Findings of the Association for Computational Linguistics, p.13064-13087.
[6]Bollerslev T, 1986. Generalized autoregressive conditional heteroskedasticity. J Econom, 31(3):307-327.
[7]Brown TB, Mann B, Ryder N, et al., 2020. Language models are few-shot learners. Proc 34th Int Conf on Neural Information Processing Systems, Article 159.
[8]Burnett S, Lloyd A, 2020. Hidden and forbidden: conceptualising dark knowledge. J Doc, 76(6):1341-1358.
[9]Chang HY, Shejwalkar V, Shokri R, et al., 2019. Cronus: robust and heterogeneous collaborative learning with black-box knowledge transfer. https://arxiv.org/abs/1912.11279
[10]Chen CC, Tseng YM, Kang J, et al., 2023. Multi-lingual ESG issue identification. Proc 5th Workshop on Financial Technology and Natural Language Processing and the 2nd Multimodal AI for Financial Forecasting, p.111-115.
[11]Chen XX, Yang Y, Wang ZY, et al., 2024. Data distillation can be like vodka: distilling more times for better quality. Proc 12th Int Conf on Learning Representations.
[12]Chen ZY, Chen WH, Smiley C, et al., 2021. FinQA: a dataset of numerical reasoning over financial data. Proc Conf on Empirical Methods in Natural Language Processing, p.3697-3711.
[13]Chen ZY, Li SY, Smiley C, et al., 2022. ConvFinQA: exploring the chain of numerical reasoning in conversational finance question answering. Proc Conf on Empirical Methods in Natural Language Processing, p.6279-6292.
[14]Costantino M, Coletti P, 2008. Information Extraction in Finance. WIT Press, Billerica, USA.
[15]Daudert T, 2022. A multi-source entity-level sentiment corpus for the financial domain: the FinLin corpus. Lang Resour Eval, 56(1):333-356.
[16]De Prado ML, 2018. Advances in Financial Machine Learning. John Wiley & Sons, Hoboken, USA.
[17]Dow J, Gorton G, 1997. Stock market efficiency and economic efficiency: is there a connection? J Finance, 52(3):1087-1129.
[18]Duffie D, Pan J, 1997. An overview of value at risk. J Deriv, 4(3):7-49.
[19]Dwork C, McSherry F, Nissim K, et al., 2006. Calibrating noise to sensitivity in private data analysis. Proc 3rd Theory of Cryptography Conf, p.265-284.
[20]Feng DY, Dai YF, Huang JM, et al., 2023. Empowering many, biasing a few: generalist credit scoring through large language models. https://arxiv.org/abs/2310.00566
[21]Galichin AV, Pautov M, Zhavoronkin A, et al., 2025. GLiRA: closed-box membership inference attack via knowledge distillation. IEEE Trans Inform Forens Secur, 20:3893-3906.
[22]Gu YX, Dong L, Wei FR, et al., 2024. MiniLLM: knowledge distillation of large language models. Proc 12th Int Conf on Learning Representations.
[23]Guo C, Pleiss G, Sun Y, et al., 2017. On calibration of modern neural networks. Proc 34th Int Conf on Machine Learning, p.1321-1330.
[24]Han PC, Shi XY, Huang JW, 2024. FedAL: black-box federated knowledge distillation enabled by adversarial learning. IEEE J Sel Areas Commun, 42(11):3064-3077.
[25]Han ZY, Gao C, Liu JY, et al., 2024. Parameter-efficient fine-tuning for large models: a comprehensive survey. https://arxiv.org/abs/2403.14608
[26]Hershey JR, Olsen PA, 2007. Approximating the Kullback Leibler divergence between Gaussian mixture models. Proc IEEE Int Conf on Acoustics, Speech, and Signal Processing, p.317-320.
[27]Hristova D, Satani N, 2025. DiFiLE: a knowledge-distillation Longformer model for finance with ensembling. Proc 58th Annual Hawaii Int Conf on System Sciences, p.1585-1594. https://hdl.handle.net/10125/109031
[28]Huang AH, Wang H, Yang Y, 2023. FinBERT: a large language model for extracting information from financial text. Contemp Account Res, 40(2):806-841.
[29]Jain S, Wallace BC, 2019. Attention is not explanation. Proc Conf of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, p.3543-3556.
[30]Jensen MC, 1968. The performance of mutual funds in the period 1945-1964. J Finance, 23(2):389-416.
[31]Ji ZW, Lee N, Frieske R, et al., 2023. Survey of hallucination in natural language generation. ACM Comput Surv, 55(12):248.
[32]Jørgensen R, Brandt O, Hartmann M, et al., 2023. MultiFin: a dataset for multilingual financial NLP. Proc Findings of the Association for Computational Linguistics, p.894-909.
[33]Jorion P, 1996. Risk2: measuring the risk in value at risk. Financ Anal J, 52(6):47-56.
[34]Kaur S, Smiley C, Gupta A, et al., 2023. REFinD: relation extraction financial dataset. Proc 46th Int Conf on Research and Development in Information Retrieval, p.3054-3063.
[35]Kim M, Lee S, Lee J, et al., 2023. Token-scaled logit distillation for ternary weight generative language models. Proc 37th Int Conf on Neural Information Processing Systems, p.42097-42118.
[36]Kong YX, Nie YQ, Dong XW, et al., 2024. Large language models for financial and investment management: applications and benchmarks. J Portfolio Manage, 51(2):162-210.
[37]Lamm M, Chaganty AT, Manning CD, et al., 2018. Textual analogy parsing: what's shared and what's compared among analogous facts. Proc Conf on Empirical Methods in Natural Language Processing, p.82-92.
[38]Lee J, Stevens N, Han SC, 2025. Large language models in finance (FinLLMs). Neural Comput Appl, 37:24853-24867.
[39]Lei SY, Tao DC, 2023. A comprehensive survey of dataset distillation. IEEE Trans Pattern Anal Mach Intell, 46(1):17-32.
[40]Li JY, Tang TY, Zhao WX, et al., 2024. Pre-trained language models for text generation: a survey. ACM Comput Surv, 56(9):230.
[41]Li LJ, Dong PJ, Li AG, et al., 2023. KD-Zero: evolving knowledge distiller for any teacher–student pairs. Proc 37th Int Conf on Neural Information Processing Systems, Article 3043.
[42]Li YH, Wang SF, Ding H, et al., 2023. Large language models in finance: a survey. Proc 4th ACM Int Conf on AI in Finance, p.374-382.
[43]Li Z, Li YX, Zhao PH, et al., 2023. Is synthetic data from diffusion models ready for knowledge distillation? https://arxiv.org/abs/2305.12954
[44]Liang C, Zuo SM, Zhang QR, et al., 2023. Less is more: task-aware layer-wise distillation for language model compression. Proc 40th Int Conf on Machine Learning, p.20852-20867.
[45]Liebenwein L, Baykal C, Lang H, et al., 2020. Provable filter pruning for efficient neural networks. Proc 8th Int Conf on Learning Representations.
[46]Liu XY, Xuan W, Zha DC, 2023. FinGPT: democratizing Internet-scale data for financial large language models. https://arxiv.org/abs/2307.10485
[47]Liu ZC, Oguz B, Zhao CS, et al., 2024. LLM-QAT: data-free quantization aware training for large language models. Proc Findings of the Association for Computational Linguistics, p.467-484.
[48]Loughran T, McDonald B, 2011. When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J Finance, 66(1):35-65.
[49]Magdon-Ismail M, Atiya AF, 2004. Maximum drawdown. Risk Mag, 17(10):99-102.
[50]Mariko D, Abi-Akl H, Labidurie E, et al., 2020. The financial document causality detection shared task (FinCausal 2020). Proc 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation, p.23-32.
[51]Moreno-Ortiz A, Fernández-Cruz J, Pérez-Hernández C, 2020. Design and evaluation of SentiEcon: a fine-grained economic/financial sentiment lexicon from a corpus of business news. Proc 12th Language Resources and Evaluation Conf, p.5065-5072.
[52]Mukherjee R, Bohra A, Banerjee A, et al., 2022. ECTSum: a new benchmark dataset for bullet point summarization of long earnings call transcripts. Proc Conf on Empirical Methods in Natural Language Processing, p.10893-10906.
[53]Nguyen D, Gupta S, Do K, et al., 2022. Black-box few-shot knowledge distillation. Proc 17th European Conf on Computer Vision, p.196-211.
[54]Nie YQ, Kong YX, Dong XW, et al., 2024. A survey of large language models for financial applications: progress, prospects and challenges. https://arxiv.org/abs/2406.11903
[55]Pandya HA, Bhatt BS, 2021. Question answering survey: directions, challenges, datasets, evaluation matrices. https://arxiv.org/abs/2112.03572
[56]Qin CW, Xia WH, Jiao FK, et al., 2023. Beyond output matching: bidirectional alignment for enhanced in-context learning.
[57]Raza M, Jahangir Z, Riaz MB, et al., 2025. Industrial applications of large language models. Sci Rep, 15(1):13755.
[58]Ribeiro MT, Singh S, Guestrin C, 2016. “Why should I trust you?”: explaining the predictions of any classifier. Proc 22nd Int Conf on Knowledge Discovery and Data Mining, p.1135-1144.
[59]Shah A, Gullapalli A, Vithani R, et al., 2023a. FiNER-ORD: financial named entity recognition open research dataset. https://arxiv.org/abs/2302.11157
[60]Shah A, Paturi S, Chava S, 2023b. Trillion dollar words: a new financial dataset, task & market analysis. Proc 61st Annual Meeting of the Association for Computational Linguistics, p.6664-6679.
[61]Sharma S, Nayak T, Bose A, et al., 2022. FinRED: a dataset for relation extraction in financial domain. Proc 31st Companion of the Web Conf, p.595-597.
[62]Sharma S, Khatuya S, Hegde M, et al., 2023. Financial numeric extreme labelling: a dataset and benchmarking. Proc Findings of the Association for Computational Linguistics, p.3550-3561.
[63]Singh S, 2018. Natural language processing for information extraction. https://arxiv.org/abs/1807.02383
[64]Sinha A, Khandait T, 2021. Impact of news on the commodity market: dataset and results. In: Arai K (Ed.), Advances in Information and Communication. Springer, Cham, p.589-601.
[65]Sinha A, Kedas S, Kumar R, et al., 2022. SEntFiN 1.0: entity-aware sentiment analysis for financial news. J Assoc Inform Sci Technol, 73(9):1314-1335.
[66]Soun Y, Yoo J, Cho MY, et al., 2022. Accurate stock movement prediction with self-supervised learning from sparse noisy tweets. Proc Int Conf on Big Data, p.1691-1700.
[67]Sundaram JPS, Du W, Zhao Z, 2019. A survey on LoRa networking: research problems, current solutions, and open issues. IEEE Commun Surv Tutor, 22(1):371-388.
[68]Sy E, Peng TC, Huang SH, et al., 2023. Fine-grained argument understanding with BERT ensemble techniques: a deep dive into financial sentiment analysis. Proc 35th Conf on Computational Linguistics and Speech Processing, p.242-249.
[69]Tang YX, Liu ZJ, 2024. A distributed knowledge distillation framework for financial fraud detection based on Transformer. IEEE Access, 12:62899-62911.
[70]Timiryasov I, Tastet J, 2023. Baby LLaMA: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty. https://arxiv.org/abs/2308.02019
[71]Touvron H, Martin L, Stone K, et al., 2023. Llama 2: open foundation and fine-tuned chat models. https://arxiv.org/abs/2307.09288
[72]van Erven T, Harremoës P, 2014. Rényi divergence and Kullback-Leibler divergence. IEEE Trans Inform Theory, 60(7):3797-3820.
[73]Varmedja D, Karanovic M, Sladojevic S, et al., 2019. Credit card fraud detection—machine learning methods. Proc 18th Int Symp INFOTEH-JAHORINA, p.1-5.
[74]Wan FQ, Huang XT, Cai D, et al., 2024. Knowledge fusion of large language models. Proc 12th Int Conf on Learning Representations.
[75]Wang Z, 2021. Zero-shot knowledge distillation from a decision-based black-box model. Proc 38th Int Conf on Machine Learning, p.10675-10685.
[76]Wen YQ, Li ZC, Du WY, et al., 2023. f-divergence minimization for sequence-level knowledge distillation. Proc 61st Annual Meeting of the Association for Computational Linguistics, p.10817-10834.
[77]Wu HZ, Zhang W, Shen WW, et al., 2018. Hybrid deep sequential modeling for social text-driven stock prediction. Proc 27th Int Conf on Information and Knowledge Management, p.1627-1630.
[78]Wu SJ, Irsoy O, Lu S, et al., 2023. BloombergGPT: a large language model for finance. https://arxiv.org/abs/2303.17564
[79]Xie QQ, Han WG, Zhang X, et al., 2023. PIXIU: a comprehensive benchmark, instruction dataset and large language model for finance. Proc 37th Int Conf on Neural Information Processing Systems, p.33469-33484.
[80]Xie QQ, Han WG, Chen ZY, et al., 2024. FinBen: a holistic financial benchmark for large language models. Proc 38th Int Conf on Neural Information Processing Systems, p.95716-95743.
[81]Xu XH, Li M, Tao CY, et al., 2024. A survey on knowledge distillation of large language models. https://arxiv.org/abs/2402.13116
[82]Xu YM, Cohen SB, 2018. Stock movement prediction from tweets and historical prices. Proc 56th Annual Meeting of the Association for Computational Linguistics, p.1970-1979.
[83]Yang CP, Zhu Y, Lu W, et al., 2024. Survey on knowledge distillation for large language models: methods, evaluation, and application. ACM Trans Intell Syst Technol.
[84]Yang LY, Kenny EM, Ng TLJ, et al., 2020. Generating plausible counterfactual explanations for deep transformers in financial text classification. Proc 28th Int Conf on Computational Linguistics, p.6150-6160.
[85]Yang Y, Tang YX, Tam KY, 2023. InvestLM: a large language model for investment using financial domain instruction tuning. https://arxiv.org/abs/2309.13064
[86]Zhang XY, Yang Q, 2023. XuanYuan 2.0: a large Chinese financial chat model with hundreds of billions parameters. Proc 32nd Int Conf on Information and Knowledge Management, p.4435-4439.
[87]Zhao QY, Zhu BH, 2024. Towards the fundamental limits of knowledge transfer over finite domains. Proc 12th Int Conf on Learning Representations.
[88]Zhao YX, Yu B, Hui BY, et al., 2024. Tree-instruct: a preliminary study of the intrinsic relationship between complexity and alignment. Proc Joint Int Conf on Computational Linguistics, Language Resources and Evaluation, p.16776-16789.
[89]Zhao ZH, Fan WQ, Li JT, et al., 2024. Recommender systems in the era of large language models (LLMs). IEEE Trans Knowl Data Eng, 36(11):6889-6907.
[90]Zhou ZH, Ma LQ, Liu H, 2021. Trade the event: corporate events detection for news-based event-driven trading. Proc Findings of the Association for Computational Linguistics, p.2114-2124.
[91]Zhu FB, Lei WQ, Huang YC, et al., 2021. TAT-QA: a question answering benchmark on a hybrid of tabular and textual content in finance. Proc 59th Annual Meeting of the Association for Computational Linguistics and the 11th Int Joint Conf on Natural Language Processing, p.3277-3287.