Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Networking and communication challenges for post-exascale systems

Abstract: With significant advances in emerging processor, memory, and networking technologies, exascale systems will become available in the next few years (2020–2022). As exascale systems begin to be deployed and used, there will be continuous demand to run next-generation applications with finer granularity, finer time-steps, and larger data sizes. Based on historical trends, next-generation applications will require post-exascale systems during 2025–2035. In this study, we focus on the networking and communication challenges for post-exascale systems. Firstly, we present an envisioned architecture for post-exascale systems. Secondly, we summarize the challenges from different perspectives: heterogeneous networking technologies, high-performance communication and synchronization protocols, integrated support with accelerators and field-programmable gate arrays, fault-tolerance and quality-of-service support, energy-aware communication schemes and protocols, software-defined networking, and scalable communication protocols with heterogeneous memory and storage. Thirdly, we present the challenges in designing efficient programming model support for high-performance computing, big data, and deep learning on these systems. Finally, we emphasize the critical need for co-designing the runtime with the upper layers on these systems to achieve maximum performance and scalability.

Key words: Networking, Communication, Synchronization, Post-exascale, Programming model, Big data, High-performance computing (HPC), Deep learning, Quality of service (QoS), Accelerator








DOI: 10.1631/FITEE.1800631
CLC number: TP311


On-line Access: 2022-04-22
Received: 2018-10-09
Revision Accepted: 2018-10-15
Crosschecked: 2018-10-15

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE