Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2025 Vol.26 No.3 P.309-331
Training large-scale language models with limited GPU memory: a survey
Abstract: Large-scale models have attracted significant attention in fields such as computer vision and natural language processing because of their effectiveness across a wide range of applications. A major obstacle to training these models, however, is the limited memory capacity of graphics processing units (GPUs). In this paper, we present a comprehensive survey of techniques for training large-scale models under limited GPU memory. We first analyze the factors that consume GPU memory during training, namely model parameters, model states, and model activations, and then review the research that addresses each of these aspects. Finally, we offer an outlook on the future of memory optimization for training large-scale language models, emphasizing the need for continued research and innovation in this area. This survey is intended as a resource for researchers and practitioners seeking to understand the challenges and advances in training large-scale language models with limited GPU memory.
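To make the memory pressure described in the abstract concrete, the following minimal Python sketch estimates the footprint of the memory consumers the survey discusses under one common setting: mixed-precision training with the Adam optimizer, where parameters, gradients, and optimizer states together take roughly 16 bytes per parameter. The function name, the byte breakdown, and the example model size are illustrative assumptions for this page, not figures taken from the paper.

```python
# Illustrative sketch (an assumption, not material from the paper): rough
# accounting of GPU memory for mixed-precision training with Adam, where
# model parameters and model states alone occupy ~16 bytes per parameter.

def estimate_training_memory_gb(num_params: int, activation_bytes: int = 0) -> float:
    """Return an estimated GPU memory footprint in GB for one model replica."""
    bytes_per_param = (
        2    # FP16 parameters
        + 2  # FP16 gradients
        + 4  # FP32 master copy of parameters
        + 4  # FP32 Adam first moment (momentum)
        + 4  # FP32 Adam second moment (variance)
    )
    total_bytes = num_params * bytes_per_param + activation_bytes
    return total_bytes / 1e9


if __name__ == "__main__":
    # A 7-billion-parameter model already needs ~112 GB for parameters and
    # model states alone, before counting activations -- well beyond a single
    # 80 GB GPU, which is the gap the surveyed techniques aim to close.
    print(f"{estimate_training_memory_gb(7_000_000_000):.1f} GB")
```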
Key words: Training techniques; Memory optimization; Model parameters; Model states; Model activations
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
DOI: 10.1631/FITEE.2300710
CLC number: TP389.1
On-line Access: 2025-04-03
Received: 2023-10-17
Revision Accepted: 2024-03-31
Crosschecked: 2025-04-07