On-line Access: 2024-08-27

Received: 2023-10-17

Revision Accepted: 2024-05-08

Frontiers of Information Technology & Electronic Engineering 

Accepted manuscript available online (unedited version)


Automatic parallelism strategy generation with minimal memory redundancy


Author(s):  Yanqi SHI, Peng LIANG, Hao ZHENG, Linbo QIAO, Dongsheng LI

Affiliation(s):  National University of Defense Technology, Changsha 410000, China

Corresponding email(s):  yqshi@nudt.edu.cn, peng_leung@nudt.edu.cn, zhengh@nudt.edu.cn, linboqiao@nudt.edu.cn, lds1201@163.com

Key Words:  Deep learning; Automatic parallelism; Minimal memory redundancy



Yanqi SHI, Peng LIANG, Hao ZHENG, Linbo QIAO, Dongsheng LI. Automatic parallelism strategy generation with minimal memory redundancy[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2300684

@article{FITEE2300684,
title="Automatic parallelism strategy generation with minimal memory redundancy",
author="Yanqi SHI and Peng LIANG and Hao ZHENG and Linbo QIAO and Dongsheng LI",
journal="Frontiers of Information Technology & Electronic Engineering",
year="in press",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2300684"
}

%0 Journal Article
%T Automatic parallelism strategy generation with minimal memory redundancy
%A Yanqi SHI
%A Peng LIANG
%A Hao ZHENG
%A Linbo QIAO
%A Dongsheng LI
%J Frontiers of Information Technology & Electronic Engineering
%P
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2300684

TY - JOUR
T1 - Automatic parallelism strategy generation with minimal memory redundancy
A1 - Yanqi SHI
A1 - Peng LIANG
A1 - Hao ZHENG
A1 - Linbo QIAO
A1 - Dongsheng LI
JO - Frontiers of Information Technology & Electronic Engineering
SP -
EP -
SN - 2095-9184
Y1 - in press
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2300684
ER -


Abstract: 
Large-scale deep learning (DL) models are trained in a distributed manner because of memory and computing resource limitations. Few existing strategy generation approaches take memory minimization as their objective. To fill this gap, we propose a novel algorithm that generates optimal parallelism strategies under the constraint of minimal memory redundancy. We introduce a Redundant Memory Cost Model (RMCM) to calculate the memory overhead of each operator under a given parallel strategy. To generate the optimal parallelism strategy, we formulate the strategy search problem as an integer linear programming problem and use an efficient solver to find minimal-memory intra-operator parallelism strategies. Furthermore, the proposed algorithm has been extended and implemented in a multi-dimensional parallel training framework that offers high throughput and minimal memory redundancy. Experimental results demonstrate that our approach achieves memory savings of up to 67% compared with the latest Megatron-LM strategies while maintaining similar throughput. The principal contribution of this work is a memory-efficient algorithm for generating parallelism strategies that reduces memory redundancy in large-scale DL models and requires less memory than existing strategies.
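
To make the idea of choosing one strategy per operator so that memory redundancy is minimal more concrete, the sketch below sets up a toy version of that selection problem. It is not the RMCM or the integer linear programming formulation of the paper, whose details are not given in the abstract. The operator names, weight sizes, the two strategies, and the cost formula are all hypothetical, and a plain exhaustive search stands in for the solver.

```python
from itertools import product

# Hypothetical toy model (not the paper's RMCM): each operator owns one
# weight tensor and either replicates it on every device or shards it
# evenly across devices.
NUM_DEVICES = 4
OPERATORS = {"embed": 4096, "attn": 1024, "mlp": 2048}  # made-up sizes in MiB
STRATEGIES = ("replicate", "shard")


def per_device_memory(size, strategy):
    """Memory one device holds for a weight of `size` under `strategy`."""
    return size if strategy == "replicate" else size / NUM_DEVICES


def redundant_memory(size, strategy):
    """Total memory across all devices minus the single copy that is needed."""
    return per_device_memory(size, strategy) * NUM_DEVICES - size


def search(forced=None):
    """Enumerate every assignment of one strategy per operator and return
    (cost, plan) with the least total redundancy. `forced` pins selected
    operators to a strategy, standing in for extra constraints."""
    forced = forced or {}
    names = list(OPERATORS)
    best = None
    for choice in product(STRATEGIES, repeat=len(names)):
        plan = dict(zip(names, choice))
        if any(plan[op] != s for op, s in forced.items()):
            continue  # assignment violates a pinned strategy
        cost = sum(redundant_memory(OPERATORS[op], plan[op]) for op in names)
        if best is None or cost < best[0]:
            best = (cost, plan)
    return best
```

Calling `search()` returns the lowest-redundancy plan for the toy model, and `search({"attn": "replicate"})` shows how pinning one operator changes the cost. Exhaustive enumeration grows exponentially with the number of operators, which is one reason a real system would hand the same 0/1 selection problem to an integer linear programming solver instead.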

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2024 Journal of Zhejiang University-SCIENCE