JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

2025 Vol.26 No.1 P.109-118

Automatic parallelism strategy generation with minimal memory redundancy

Yanqi SHI, Peng LIANG, Hao ZHENG, Linbo QIAO, Dongsheng LI

National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410000, China

yqshi@nudt.edu.cn, peng_leung@nudt.edu.cn, zhengh@nudt.edu.cn, linboqiao@nudt.edu.cn, lds1201@163.com

Abstract: Large-scale deep learning models are trained in a distributed manner because of memory and computing resource limitations. Few existing strategy generation approaches take memory minimization as the objective. To fill this gap, we propose a novel algorithm that generates optimal parallelism strategies under the constraint of minimal memory redundancy. We propose a redundant memory cost model to calculate the memory overhead of each operator in a given parallel strategy. To generate the optimal parallelism strategy, we formulate the parallelism strategy search problem as an integer linear programming problem and use an efficient solver to find minimal-memory intra-operator parallelism strategies. Furthermore, the proposed algorithm has been extended and implemented in a multi-dimensional parallel training framework and is characterized by high throughput and minimal memory redundancy. Experimental results demonstrate that our approach achieves memory savings of up to 67% compared to the latest Megatron-LM strategies, while the throughput gap between our approach and its counterparts is small.

Key words: Deep learning; Automatic parallelism; Minimal memory redundancy
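
To make the search formulation in the abstract concrete, the following is a minimal sketch of a one-strategy-per-operator selection problem with a memory-minimizing objective. It is not the authors' cost model or solver: the operator names, layouts ("R" for replicated, "S" for sharded), and memory figures are all hypothetical, and exhaustive enumeration stands in for the integer linear programming solver mentioned in the abstract.

```python
from itertools import product

# Hypothetical candidates: (strategy name, input layout, output layout, memory in MB).
# "R" = replicated tensor, "S" = sharded tensor. All numbers are illustrative only.
CANDIDATES = {
    "matmul_1": [("replicate", "R", "R", 400),
                 ("split_col", "R", "S", 100),
                 ("split_row", "S", "R", 100)],
    "gelu":     [("replicate", "R", "R", 200),
                 ("split", "S", "S", 50)],
    "matmul_2": [("replicate", "R", "R", 400),
                 ("split_col", "R", "S", 100),
                 ("split_row", "S", "R", 100)],
}
OPS = ["matmul_1", "gelu", "matmul_2"]  # a simple operator chain


def solve(ops, candidates):
    """Pick exactly one strategy per operator, minimizing total memory.

    Each tuple from product() corresponds to one feasible assignment of the
    one-hot decision variables x[op][s] that an ILP would use. Exhaustive
    search is only viable for tiny graphs; a real ILP solver replaces it.
    """
    best_cost, best_choice = None, None
    for choice in product(*(range(len(candidates[op])) for op in ops)):
        picked = [candidates[op][s] for op, s in zip(ops, choice)]
        # Constraint: the output layout of each operator must match the
        # input layout of the next one (no resharding in this toy model).
        if any(a[2] != b[1] for a, b in zip(picked, picked[1:])):
            continue
        cost = sum(p[3] for p in picked)  # objective: total memory
        if best_cost is None or cost < best_cost:
            best_cost, best_choice = cost, [p[0] for p in picked]
    return best_cost, best_choice


if __name__ == "__main__":
    print(solve(OPS, CANDIDATES))
```

In this toy instance the cheapest feasible assignment shards the first matrix multiplication by columns and the second by rows, so the intermediate activation never has to be replicated.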

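Chinese Summary: Automatic parallelism strategy generation with minimal memory redundancy

Yanqi SHI, Peng LIANG, Hao ZHENG, Linbo QIAO, Dongsheng LI
National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha 410000, China
Abstract: Owing to memory and computing resource limitations, large-scale deep learning models are usually trained in a distributed manner. Few existing strategy generation methods take minimizing memory usage as the objective. To this end, a new algorithm is proposed that generates automatic parallelism strategies with the goal of minimizing memory redundancy. A redundant memory cost model is proposed to calculate the memory overhead of each operator in a given parallelism strategy. To ensure that the optimal parallelism strategy is generated, the parallelism strategy search problem is formulated as an integer linear programming problem, and an efficient solver is used to find intra-operator parallelism strategies with minimal memory usage. The proposed method is implemented in a multi-dimensional parallel training framework. Experimental results show that, compared with the latest Megatron-LM method, it saves up to 67% of memory overhead while the difference in throughput is small.

Key words: Deep learning; Automatic parallelism; Minimal memory redundancy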

DOI: 10.1631/FITEE.2300684

CLC number: TP181

On-line Access: 2025-02-10

Received: 2023-10-10

Revision Accepted: 2023-10-17

Crosschecked: 2025-02-18

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE
