Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2025 Vol.26 No.10 P.1809-1821
A survey on large language model-based alpha mining
Abstract: Alpha mining, the systematic discovery of data-driven signals predictive of future cross-sectional returns, is a central task in quantitative research. Recent progress in large language models (LLMs) has sparked interest in LLM-based alpha mining frameworks, which offer a promising middle ground between human-guided and fully automated alpha mining approaches and deliver both speed and semantic depth. This study presents a structured review of emerging LLM-based alpha mining systems from an agentic perspective and analyzes the functional roles of LLMs, ranging from miners and evaluators to interactive assistants. Despite early progress, key challenges remain, including simplified performance evaluation, limited numerical understanding, lack of diversity and originality, weak exploration dynamics, temporal data leakage, and black-box risks and compliance challenges. Accordingly, we outline future directions, including improving reasoning alignment, expanding to new data modalities, rethinking evaluation protocols, and integrating LLMs into more general-purpose quantitative systems. Our analysis suggests that LLMs serve as a scalable interface that amplifies domain expertise by transforming qualitative hypotheses into testable factors and enhances algorithmic rigor by supporting rapid backtesting and semantic reasoning. The result is a complementary paradigm in which intuition, automation, and language-based reasoning converge to redefine the future of quantitative research.
Key words: Alpha mining; Quantitative investment; Large language models (LLMs); LLM agents; Fintech
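1College of Computing and Data Science, Nanyang Technological University, Singapore 639798
2E Fund Asset Management Co., Ltd., Guangzhou 510000, China
3Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore 119077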
DOI: 10.1631/FITEE.2500386
CLC number: TP391; TP18
On-line Access: 2025-11-17
Received: 2025-06-07
Revision Accepted: 2025-11-18
Crosschecked: 2025-09-03