
On-line Access: 2025-11-17
Received: 2025-04-10
Revision Accepted: 2025-04-28
Crosschecked: 2025-11-18
Shuoling LIU, Liyuan CHEN, Jiangpeng YAN, Yuhang JIANG, Xiaoyu WANG, Xiu LI, Qiang YANG. When DeepSeek-R1 meets financial applications: benchmarking, opportunities, and limitations[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2500227
When DeepSeek-R1 meets financial applications: benchmarking, opportunities, and limitations

1 The Hong Kong University of Science and Technology, Hong Kong Special Administrative Region 999077, China
2 E Fund Management Co., Ltd., Guangzhou 510000, China
3 WeBank, Shenzhen 518054, China
4 Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China

Abstract: The potential value of reasoning large language models (LLMs) in financial services, especially that of the emerging open-source model DeepSeek-R1, is still at an early stage of exploration. Although general-purpose LLMs are already widely used in scenarios such as financial news analysis and customer interaction, DeepSeek-R1 unlocks advanced reasoning capabilities through a multi-stage training pipeline that incorporates reinforcement learning: it handles complex financial question-answering tasks accurately, and it also offers lightweight distilled student models for resource-constrained environments, substantially improving deployment flexibility. Taking an interdisciplinary perspective on financial artificial intelligence, this paper first reviews the technical architecture and core principles of DeepSeek-R1, and then presents a preliminary yet comprehensive performance benchmark of DeepSeek-R1 and its distilled models on two public financial question-answering datasets. Building on these results, the paper discusses the opportunities the model opens up for financial services, analyzes its current limitations, and proposes three directions for future research. The aim is to provide theoretical grounding and practical guidance for the sound application and further development of reasoning LLMs in financial AI, advancing financial technology to a higher level.
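As a concrete illustration of the kind of benchmarking described in the abstract, the sketch below shows how one might pose a single financial question-answering item to one of the publicly released distilled DeepSeek-R1 checkpoints using the Hugging Face transformers library. This is a minimal sketch, not the authors' evaluation harness: the example question, the decoding settings, and the absence of any scoring logic are all simplifying assumptions.

# Minimal sketch (assumptions noted above): query a distilled DeepSeek-R1
# checkpoint on one illustrative financial QA item.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # public distilled checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Hypothetical test item; a real benchmark would iterate over a dataset.
question = ("A company reports revenue of $120 million and a net profit "
            "margin of 15%. What is its net income in millions of dollars?")

# The chat template wraps the question in the R1 prompt format, which
# elicits an explicit reasoning trace before the final answer.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Sampling temperature follows the commonly recommended R1 setting (0.6);
# reasoning traces can be long, hence the generous token budget.
output_ids = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))

Scoring the extracted final answer against a gold label, as the paper does on its two public datasets, would sit in a loop around this generation call.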

