
CLC number:
On-line Access: 2025-11-17
Received: 2025-06-18
Revision Accepted: 2025-11-18
Crosschecked: 2025-09-29
Cited: 0
Clicked: 636
Citations: Bibtex RefMan EndNote GB/T7714
https://orcid.org/0009-0005-9669-8582
https://orcid.org/0009-0006-3851-843X
https://orcid.org/0000-0002-5107-0338
Shurui XU, Feng LUO, Shuyan LI, Mengzhen FAN, Zhongtian SUN. Three trustworthiness challenges in large language model-based financial systems: real-world examples and mitigation strategies[J]. Frontiers of Information Technology & Electronic Engineering,in press.https://doi.org/10.1631/FITEE.2500421 @article{title="Three trustworthiness challenges in large language model-based financial systems: real-world examples and mitigation strategies", %0 Journal Article TY - JOUR
基于大语言模型的金融系统面临的三大可信度挑战:现实案例与缓解策略1贝尔法斯特女王大学电子、电气工程与计算机科学学院,英国贝尔法斯特,BT9 5BN 2莱斯大学计算机科学系,美国得克萨斯州休斯敦市,77005 3北京大学汇丰商学院牛津校区,英国英格兰,OX1 5HR 4肯特大学计算机学院,英国肯特郡坎特伯雷,CT2 7NZ 摘要:大语言模型(LLM)在金融应用中的集成展现出显著潜力,可提升决策流程、实现操作自动化并提供个性化服务。然而,金融系统的高风险特性要求极高的可信度,而当前LLM往往难以满足这一要求。本研究识别并探讨了基于LLM的金融系统中的3大可信度挑战:(1)逃逸式提示—利用模型对齐漏洞生成有害或违规响应;(2)幻觉现象—模型产出事实错误的输出误导金融决策;(3)偏见与公平性问题—LLM内嵌的人口统计或制度偏见可能导致个体或区域遭受不公平对待。为具体呈现这些风险,我们设计了3项金融相关测试,并对涵盖专有与开源家族的主流LLM进行评估。在所有模型中,每项测试至少出现一次风险行为。基于这些发现,系统性地总结了现有风险缓解策略。我们认为,解决这些问题不仅对确保金融领域人工智能的负责任使用至关重要,更是实现其安全可扩展部署的关键所在。 关键词组: Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article
Reference[1]Andriushchenko M, Croce F, Flammarion N, 2025. Jailbreaking leading safety-aligned LLMs with simple adaptive attacks. 13th Int Conf on Learning Representations. ![]() [2]Authority FIR, 2009. Financial Industry Regulatory Authority. https://www.kurtalawfirm.com/wp-content/uploads/2019064126802-Clearview-Trading-Advisors-Inc.-CRD-142873-Gregg-H.-Ettin-CRD-1604260-AWC-geg-2022-1670804406029.pdf [Accessed on Mar. 23, 2025]. ![]() [3]Barry M, Caillaut G, Halftermeyer P, et al., 2025. GraphRAG: leveraging graph-based efficiency to minimize hallucinations in LLM-driven RAG for finance data. Proc Workshop on Generative AI and Knowledge Graphs, p.54-65. https://hal.science/hal-04907346 ![]() [4]Bowen DEIII, Price SM, Stein LCD, et al., 2025. Measuring and Mitigating Racial Disparities in LLMs: Evidence from a Mortgage Underwriting Experiment. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4812158 [Accessed on Mar. 23, 2025]. ![]() [5]Calver J, Church P, Ford J, et al., 2024. AI in financial services—the legal and regulatory landscape. In: Law 2024. Edward Elgar Publishing, Cheltenham, p.420-458. ![]() [6]Choe J, Kim J, Jung W, 2025. Hierarchical retrieval with evidence curation for open-domain financial question answering on standardized documents. Findings of the Association for Computational Linguistics, p.16663-16681. ![]() [7]Davis HA, 2007. Summary of selected FINRA regulatory notices. J Invest Compl, 8(4):60-67. ![]() [8]Dong MM, Stratopoulos TC, Wang VX, 2024. A scoping review of ChatGPT research in accounting and finance. Int J Account Inform Syst, 55:100715. ![]() [9]Gallegos IO, Rossi RA, Barrow J, et al., 2024. Bias and fairness in large language models: a survey. Comput Linguist, 50(3):1097-1179. ![]() [10]Huang L, Yu WJ, Ma WT, et al., 2024. A survey on hallucination in large language models: principles, taxonomy, challenges, and open questions. ![]() [11]Kabra S, Jha A, Reddy CK, 2025. Reasoning towards fairness: mitigating bias in language models through reasoning-guided fine-tuning. ![]() [12]Khachaturov D, Mullins R, 2025. Adversarial suffix filtering: a defense pipeline for LLMs. ![]() [13]Kumar R, Kumar H, Shalini K, 2025. Detecting and mitigating bias in LLMs through knowledge graph-augmented training. Int Conf on Artificial Intelligence and Data Engineering, p.608-613. ![]() [14]Lee J, Stevens N, Han SC, 2025. Language models in finance (FinLLMs). Neur Comput Appl, 37:24853-24867. ![]() [15]Li XY, Chen ZP, Zhang JM, et al., 2024. Benchmarking bias in large language models during role-playing. ![]() [16]Liu XG, Xu N, Chen MH, et al., 2024. AutoDAN: generating stealthy jailbreak prompts on aligned large language models. 12th Int Conf on Learning Representations. ![]() [17]Lundberg SM, Lee SI, 2017. A unified approach to interpreting model predictions. Proc 31st Int Conf on Neural Information Processing Systems, p.4768-4777. ![]() [18]Manakul P, Liusie A, Gales MJF, 2023. SelfCheckGPT: zero-resource black-box hallucination detection for generative large language models. Proc Conf on Empirical Methods in Natural Language Processing, p.9004-9017. ![]() [19]Nakagawa K, Hirano M, Fujimoto Y, 2024. Evaluating company-specific biases in financial sentiment analysis using large language models. IEEE Int Conf on Big Data, p.6614-6623. ![]() [20]Ribeiro MT, Singh S, Guestrin C, 2016. “Why should I trust you?”: explaining the predictions of any classifier. Proc 22nd ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining, p.1135-1144. ![]() [21]Sharma M, Tong M, Mu J, et al., 2025. Constitutional classifiers: defending against universal jailbreaks across thousands of hours of red teaming. ![]() [22]Shen XY, Chen ZY, Backes M, et al., 2024. “Do anything now”: characterizing and evaluating in-the-wild jailbreak prompts on large language models. Proc ACM SIGSAC Conf on Computer and Communications Security, p.1671-1685. ![]() [23]Simpson S, Nukpezah J, Brooks K, et al., 2025. Parity benchmark for measuring bias in LLMs. AI Ethics, 5(3):3087-3101. ![]() [24]Tatsat H, Shater A, 2025. Beyond the black box: interpretability of LLMs in finance. ![]() [25]Wu Z, Wang J, Zou C, et al., 2025. Towards competent AI for fundamental analysis in finance: a benchmark dataset and evaluation. https://arxiv.org/abs/2506.07315 ![]() [26]Yan SQ, Gu JC, Zhu Y, et al., 2024. Corrective retrieval augmented generation. ![]() [27]Yu JH, Lin XW, Yu Z, et al., 2024. GPTFUZZER: red teaming large language models with auto-generated jailbreak prompts. ![]() [28]Zhang YX, Zhou F, 2024. Bias mitigation in fine-tuning pre-trained models for enhanced fairness and efficiency. ![]() [29]Zhou YJ, Han YF, Zhuang HM, et al., 2025. Defending jailbreak prompts via in-context adversarial game. ![]() [30]Zou A, Wang Z, Kolter JZ, et al., 2023. Universal and transferable adversarial attacks on aligned language models. ![]() Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou
310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn Copyright © 2000 - 2026 Journal of Zhejiang University-SCIENCE | ||||||||||||||



ORCID:
Open peer comments: Debate/Discuss/Question/Opinion
<1>