Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2025 Vol.26 No.1 P.109-118
Automatic parallelism strategy generation with minimal memory redundancy
Abstract: Large-scale deep learning models are trained in a distributed fashion due to memory and computing resource limitations. Few existing strategy-generation approaches take memory minimization as the objective. To fill this gap, we propose a novel algorithm that generates optimal parallelism strategies under the constraint of minimal memory redundancy. We first propose a redundant memory cost model to calculate the memory overhead of each operator under a given parallelism strategy. To generate the optimal parallelism strategy, we formulate the strategy search problem as an integer linear programming problem and use an efficient solver to find intra-operator parallelism strategies with minimal memory redundancy. Furthermore, the proposed algorithm has been extended and implemented in a multi-dimensional parallel training framework and is characterized by high throughput and minimal memory redundancy. Experimental results demonstrate that our approach achieves memory savings of up to 67% compared with the latest Megatron-LM strategies, while the throughput gap between our approach and its counterparts remains small.
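The search procedure described in the abstract can be sketched concretely. The following is a minimal illustration, not the paper's implementation: the operator names, the redundant-memory costs, and the choice of the open-source PuLP/CBC solver are all assumptions made for this example. It encodes the selection of exactly one intra-operator parallelism strategy per operator as binary variables and minimizes total redundant memory.

# Minimal sketch of an ILP of the kind described in the abstract.
# Costs and operator names are hypothetical, not the paper's cost model.
import pulp

# Illustrative redundant-memory costs (MB) for each candidate strategy;
# the paper's cost model would derive these per operator.
mem_cost = {
    "matmul_1": {"data_parallel": 512, "tensor_parallel": 128},
    "matmul_2": {"data_parallel": 512, "tensor_parallel": 160},
    "softmax":  {"data_parallel": 64,  "tensor_parallel": 96},
}

prob = pulp.LpProblem("minimal_memory_redundancy", pulp.LpMinimize)

# One binary variable per (operator, strategy) pair.
choose = {
    (op, s): pulp.LpVariable(f"x_{op}_{s}", cat="Binary")
    for op, strategies in mem_cost.items()
    for s in strategies
}

# Objective: total redundant memory across all operators.
prob += pulp.lpSum(mem_cost[op][s] * choose[(op, s)] for (op, s) in choose)

# Constraint: exactly one strategy is selected for each operator.
for op, strategies in mem_cost.items():
    prob += pulp.lpSum(choose[(op, s)] for s in strategies) == 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for (op, s), var in choose.items():
    if var.value() == 1:
        print(f"{op}: {s}")

A full formulation would additionally need constraints coupling adjacent operators (e.g., resharding compatibility and communication cost between their chosen strategies), which this sketch omits for brevity.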
Key words: Deep learning; Automatic parallelism; Minimal memory redundancy
National Key Laboratory of Parallel and Distributed Processing, National University of Defense Technology, Changsha 410000, China
DOI: 10.1631/FITEE.2300684
CLC number: TP181
On-line Access: 2025-02-10
Received: 2023-10-10
Revision Accepted: 2023-10-17
Crosschecked: 2025-02-18