
CLC number: TP391;TP18
On-line Access: 2025-11-17
Received: 2025-06-07
Revision Accepted: 2025-11-18
Crosschecked: 2025-09-03
Citations: Bibtex RefMan EndNote GB/T7714
Junjie ZHANG, Shuoling LIU, Tongzhe ZHANG, Yuchen SHI. A survey on large language model-based alpha mining[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(10): 1809-1821.
@article{Zhang2025AlphaMining,
title="A survey on large language model-based alpha mining",
author="Junjie ZHANG and Shuoling LIU and Tongzhe ZHANG and Yuchen SHI",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="26",
number="10",
pages="1809--1821",
year="2025",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2500386"
}
%0 Journal Article
%T A survey on large language model-based alpha mining
%A Junjie ZHANG
%A Shuoling LIU
%A Tongzhe ZHANG
%A Yuchen SHI
%J Frontiers of Information Technology & Electronic Engineering
%V 26
%N 10
%P 1809-1821
%@ 2095-9184
%D 2025
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2500386
TY  - JOUR
T1  - A survey on large language model-based alpha mining
A1  - Junjie ZHANG
A1  - Shuoling LIU
A1  - Tongzhe ZHANG
A1  - Yuchen SHI
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 26
IS  - 10
SP  - 1809
EP  - 1821
SN  - 2095-9184
Y1  - 2025
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2500386
ER  -
Abstract: Alpha mining, the systematic discovery of data-driven signals that predict future cross-sectional returns, is a central task in quantitative research. Recent progress in large language models (LLMs) has sparked interest in LLM-based alpha mining frameworks, which offer a promising middle ground between human-guided and fully automated approaches and deliver both speed and semantic depth. This study presents a structured review of emerging LLM-based alpha mining systems from an agentic perspective and analyzes the functional roles of LLMs, which range from miners and evaluators to interactive assistants. Despite early progress, key challenges remain, including simplified performance evaluation, limited numerical understanding, lack of diversity and originality, weak exploration dynamics, temporal data leakage, and black-box risks and compliance challenges. Accordingly, we outline future directions, including improving reasoning alignment, expanding to new data modalities, rethinking evaluation protocols, and integrating LLMs into more general-purpose quantitative systems. Our analysis suggests that LLMs serve as a scalable interface that amplifies domain expertise by transforming qualitative hypotheses into testable factors and enhances algorithmic rigor through rapid backtesting and semantic reasoning. The result is a complementary paradigm in which intuition, automation, and language-based reasoning converge to redefine the future of quantitative research.