
On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08


Frontiers of Information Technology & Electronic Engineering 

Accepted manuscript available online (unedited version)


Training large-scale models with limited GPU memory: a survey


Author(s):  Yu TANG, Linbo QIAO, Lujia YIN, Peng LIANG, Ao SHEN, Zhilin YANG, Lizhi ZHANG, Dongsheng LI

Affiliation(s):  National University of Defense Technology, Changsha 410073, China

Corresponding email(s):  qiao.linbo@nudt.edu.cn, dsli@nudt.edu.cn

Key Words:  Training techniques; Memory optimization; Model parameters; Model states; Model activations



Yu TANG, Linbo QIAO, Lujia YIN, Peng LIANG, Ao SHEN, Zhilin YANG, Lizhi ZHANG, Dongsheng LI. Training large-scale models with limited GPU memory: a survey[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2300710

@article{TangFITEE2300710,
author="Yu TANG and Linbo QIAO and Lujia YIN and Peng LIANG and Ao SHEN and Zhilin YANG and Lizhi ZHANG and Dongsheng LI",
title="Training large-scale models with limited GPU memory: a survey",
journal="Frontiers of Information Technology & Electronic Engineering",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2300710"
}



Abstract: 
Large-scale models have gained significant attention within a wide range of fields, such as computer vision and natural language processing, due to their effectiveness across various applications. However, a notable hurdle in training these large-scale models is the limited memory capacity of GPUs. In this paper, we present a comprehensive survey focused on training large-scale models with limited GPU memory. We begin by examining the factors that contribute to GPU memory consumption during training, namely model parameters, model states, and model activations. Following this analysis, we present an in-depth overview of the relevant research work that addresses each of these aspects. Finally, we conclude with an outlook on the future of memory optimization for training large-scale models, emphasizing the need for continued research and innovation in this area. This survey serves as a valuable resource for researchers and practitioners seeking to understand the challenges and advancements in training large-scale models with limited GPU memory.
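
To make the memory decomposition concrete, the following is a minimal back-of-the-envelope sketch (not taken from the survey) that adds up the three consumers the abstract names: model parameters, model states, and model activations. The per-parameter byte counts assume mixed-precision training with the Adam optimizer, a common setting; the grouping of gradients under model states, the function name, and the example figures are illustrative assumptions.

# Rough per-GPU memory estimate for mixed-precision Adam training (illustrative only).
def estimate_training_memory_gb(num_params, activation_bytes):
    """Approximate per-GPU memory breakdown in GB."""
    GB = 1024 ** 3
    parameters = 2 * num_params       # fp16 weights: 2 bytes per parameter
    # "Model states" is assumed here to cover gradients and optimizer states:
    # fp16 gradients (2 B) plus fp32 master weights, Adam momentum, and variance (12 B).
    model_states = (2 + 12) * num_params
    activations = activation_bytes    # depends on batch size and sequence length
    total = parameters + model_states + activations
    return {name: value / GB for name, value in
            [("parameters", parameters), ("model states", model_states),
             ("activations", activations), ("total", total)]}

# Example: a 1.5-billion-parameter model with roughly 20 GB of activations
# already needs more than 40 GB of GPU memory before any optimization is applied.
print(estimate_training_memory_gb(1.5e9, 20 * 1024 ** 3))

Under these assumptions the fixed cost of parameters and model states alone is about 16 bytes per parameter, which is why the techniques surveyed here target each of the three components separately.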
