On-line Access: 2024-04-24
Received: 2023-10-17
Revision Accepted: 2024-03-31
Yu TANG, Linbo QIAO, Lujia YIN, Peng LIANG, Ao SHEN, Zhilin YANG, Lizhi ZHANG, Dongsheng LI. Training large-scale models with limited GPU memory: a survey[J]. Frontiers of Information Technology & Electronic Engineering, 2024, -1(-1): .
@article{title="Training large-scale models with limited GPU memory: a survey",
author="Yu TANG, Linbo QIAO, Lujia YIN, Peng LIANG, Ao SHEN, Zhilin YANG, Lizhi ZHANG, Dongsheng LI",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="-1",
number="-1",
pages="",
year="1998",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2300710"
}
%0 Journal Article
%T Training large-scale models with limited GPU memory: a survey
%A Yu TANG
%A Linbo QIAO
%A Lujia YIN
%A Peng LIANG
%A Ao SHEN
%A Zhilin YANG
%A Lizhi ZHANG
%A Dongsheng LI
%J Frontiers of Information Technology & Electronic Engineering
%V -1
%N -1
%P
%@ 2095-9184
%D 2024
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2300710
TY - JOUR
T1 - Training large-scale models with limited GPU memory: a survey
A1 - Yu TANG
A1 - Linbo QIAO
A1 - Lujia YIN
A1 - Peng LIANG
A1 - Ao SHEN
A1 - Zhilin YANG
A1 - Lizhi ZHANG
A1 - Dongsheng LI
JO - Frontiers of Information Technology & Electronic Engineering
VL - -1
IS - -1
SP -
EP -
SN - 2095-9184
Y1 - 2024
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2300710
ER -
Abstract: Large-scale models have gained significant attention within a wide range of fields, such as computer vision and natural language processing, due to their effectiveness across various applications. However, a notable hurdle in training these large-scale models is the limited memory capacity of GPUs. In this paper, we present a comprehensive survey focused on training large-scale models with limited GPU memory. The exploration commences by scrutinizing the factors that contribute to the consumption of GPU memory during the training process, namely model parameters, model states, and model activations. Following this analysis, we present an in-depth overview of the relevant research work that addresses these aspects individually. Finally, the paper concludes by presenting an outlook on the future of memory optimization in training large-scale language models, emphasizing the necessity for continued research and innovation in this area. This survey serves as a valuable resource for researchers and practitioners keen on comprehending the challenges and advancements in training large-scale language models with limited GPU memory.
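The abstract's breakdown of training-time GPU memory into model parameters, model states, and model activations can be made concrete with a rough estimator. The sketch below is illustrative only and is not taken from the paper; the byte counts (fp16 weights and gradients, fp32 Adam master weights and moments, a coarse per-layer activation formula without checkpointing) are common mixed-precision conventions assumed for this example.

```python
# Illustrative sketch (not from the paper): a back-of-the-envelope estimate of
# training-time GPU memory for a transformer-style model, split into the three
# factors named in the survey: model parameters, model states, and activations.
# All sizing rules below are assumptions for illustration.

def training_memory_gib(num_params: int,
                        batch_size: int,
                        seq_len: int,
                        hidden_size: int,
                        num_layers: int) -> dict:
    """Return an approximate per-GPU memory breakdown in GiB."""
    gib = 1024 ** 3

    # Model parameters: fp16 copy used in the forward/backward pass (2 bytes each).
    parameters = 2 * num_params

    # Model states: fp16 gradients (2 bytes) plus Adam's fp32 master weights,
    # momentum, and variance (4 + 4 + 4 bytes), as in standard mixed-precision training.
    model_states = (2 + 4 + 4 + 4) * num_params

    # Model activations: a coarse estimate of ~16 * batch * seq * hidden bytes per
    # transformer layer when no activation checkpointing is used (assumption).
    activations = 16 * batch_size * seq_len * hidden_size * num_layers

    return {
        "parameters_GiB": parameters / gib,
        "model_states_GiB": model_states / gib,
        "activations_GiB": activations / gib,
        "total_GiB": (parameters + model_states + activations) / gib,
    }


if __name__ == "__main__":
    # Example: a GPT-2-XL-scale model (~1.5e9 parameters).
    breakdown = training_memory_gib(num_params=1_500_000_000,
                                    batch_size=8,
                                    seq_len=1024,
                                    hidden_size=1600,
                                    num_layers=48)
    for name, value in breakdown.items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions, the model states dominate the parameters by roughly a factor of seven, and activations grow with batch size, sequence length, and depth, which is why the survey treats the three factors separately.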