CLC number: TP302
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2022-11-29
Xiaobin HE, Xin CHEN, Heng GUO, Xin LIU, Dexun CHEN, Yuling YANG, Jie GAO, Yunlong FENG, Longde CHEN, Xiaona DIAO, Zuoning CHEN. Scalability and efficiency challenges for the exascale supercomputing system: practice of a parallel supporting environment on the Sunway exascale prototype system[J]. Frontiers of Information Technology & Electronic Engineering, 2023, 24(1): 41-58.
Abstract: With the continuous improvement of supercomputer performance and the integration of artificial intelligence with traditional scientific computing, the scale of applications is gradually increasing from millions to tens of millions of computing cores, which poses great challenges to achieving high scalability and efficiency for parallel applications on super-large-scale systems. Taking the Sunway exascale prototype system as an example, in this paper we first analyze the challenges of high scalability and high efficiency for parallel applications in the exascale era. To overcome these challenges, the optimization technologies used in the parallel supporting environment software on the Sunway exascale prototype system are highlighted, including the parallel operating system, input/output (I/O) optimization technology, ultra-large-scale parallel debugging technology, the 10-million-core parallel algorithm, and the mixed-precision method. The parallel operating system and I/O optimization technology mainly support large-scale system scaling, while the ultra-large-scale parallel debugging technology, the 10-million-core parallel algorithm, and the mixed-precision method mainly enhance the efficiency of large-scale applications. Finally, the contributions to various applications running on the Sunway exascale prototype system are introduced, verifying the effectiveness of the parallel supporting environment design.
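The mixed-precision method named in the abstract generally trades lower floating-point precision for speed while recovering accuracy in higher precision. As a minimal sketch of this idea (not the paper's actual implementation), the classical iterative-refinement pattern in NumPy solves a linear system in single precision and then corrects the solution using residuals computed in double precision; the function name and problem size here are illustrative assumptions:

```python
import numpy as np

def mixed_precision_solve(A, b, iters=3):
    """Solve Ax = b by iterative refinement: cheap float32 solves,
    residuals accumulated in float64 to recover full accuracy."""
    A32 = A.astype(np.float32)
    # Initial solve in single precision (fast, ~1e-7 accuracy).
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                   # residual in float64
        d = np.linalg.solve(A32, r.astype(np.float32))  # correction in float32
        x += d.astype(np.float64)
    return x
```

On hardware where low-precision arithmetic is substantially faster than double precision, the expensive factorization/solve runs in the cheap format, and a few high-precision residual corrections are typically enough to reach near-double-precision accuracy for well-conditioned systems.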