Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2025 Vol.26 No.3 P.309-331
Training large-scale language models with limited GPU memory: a survey
Abstract: Large-scale models have attracted significant attention in fields such as computer vision and natural language processing because of their effectiveness across a wide range of applications. A major obstacle to training these models, however, is the limited memory capacity of graphics processing units (GPUs). In this paper, we present a comprehensive survey of techniques for training large-scale models under limited GPU memory. We first analyze the factors that consume GPU memory during training, namely model parameters, model states, and model activations, and then review the research that addresses each of these aspects. Finally, we offer an outlook on the future of memory optimization for training large-scale language models, emphasizing the need for continued research and innovation in this area. This survey is intended as a resource for researchers and practitioners seeking to understand the challenges and advances in training large-scale language models with limited GPU memory.
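To make the memory pressure described in the abstract concrete, the following minimal Python sketch estimates the footprint of the memory consumers the survey discusses under one common setting: mixed-precision training with the Adam optimizer, where parameters, gradients, and optimizer states together take roughly 16 bytes per parameter. The function name, the byte breakdown, and the example model size are illustrative assumptions for this page, not figures taken from the paper.

```python
# Illustrative sketch (an assumption, not material from the paper): rough
# accounting of GPU memory for mixed-precision training with Adam, where
# model parameters and model states alone occupy ~16 bytes per parameter.

def estimate_training_memory_gb(num_params: int, activation_bytes: int = 0) -> float:
    """Return an estimated GPU memory footprint in GB for one model replica."""
    bytes_per_param = (
        2    # FP16 parameters
        + 2  # FP16 gradients
        + 4  # FP32 master copy of parameters
        + 4  # FP32 Adam first moment (momentum)
        + 4  # FP32 Adam second moment (variance)
    )
    total_bytes = num_params * bytes_per_param + activation_bytes
    return total_bytes / 1e9


if __name__ == "__main__":
    # A 7-billion-parameter model already needs ~112 GB for parameters and
    # model states alone, before counting activations -- well beyond a single
    # 80 GB GPU, which is the gap the surveyed techniques aim to close.
    print(f"{estimate_training_memory_gb(7_000_000_000):.1f} GB")
```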
Key words: Training techniques; Memory optimization; Model parameters; Model states; Model activations
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
DOI: 10.1631/FITEE.2300710
CLC number: TP389.1
On-line Access: 2025-04-03
Received: 2023-10-17
Revision Accepted: 2024-03-31
Crosschecked: 2025-04-07