Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

TEES: topology-aware execution environment service for fast and agile application deployment in HPC

Abstract: High-performance computing (HPC) systems are about to reach a new height: exascale. At this scale, application deployment is becoming an increasingly prominent problem. Container technology solves the problems of encapsulating and migrating applications together with their execution environments. However, container images are large, and deploying an image to a large number of compute nodes is time-consuming. Although the peer-to-peer (P2P) approach achieves higher transmission efficiency, it also imposes a larger network load. Together, these issues lead to high application startup latency. To solve these problems, we propose the topology-aware execution environment service (TEES) for fast and agile application deployment on HPC systems. TEES creates a more lightweight execution environment for users and uses a more efficient topology-aware P2P approach to reduce deployment time. Combined with a split-step transport and launch-in-advance mechanism, TEES reduces application startup latency. On the Tianhe HPC system, TEES deploys and starts a typical application on 17 560 compute nodes within 3 s. Compared with container-based application deployment, deployment is 12 times faster and the network load is reduced by 85%.

Key words: Execution environment; Application deployment; High-performance computing (HPC); Container; Peer-to-peer (P2P); Network topology
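
The abstract does not spell out how the topology-aware P2P scheme works, so the following is only a toy sketch of the general idea, not the TEES implementation. It assumes a simplified two-level network (compute nodes attached to switches), a doubling broadcast in which every node that already holds the image sends to one new node per round, and hypothetical names (`doubling_broadcast`, `topology_aware_broadcast`, `switch_of`). It compares the number of transfers that cross switch boundaries when the peer order ignores the topology with the number when one leader per switch is seeded first and the rest of the fan-out stays local.

```python
from collections import defaultdict


def doubling_broadcast(order, switch_of):
    """Simulate a doubling P2P broadcast along `order` (order[0] holds the image).

    Each round, every current holder sends to one pending node.
    Returns (rounds, cross_switch_transfers).
    """
    holders, pending = [order[0]], list(order[1:])
    rounds = cross = 0
    while pending:
        rounds += 1
        batch, pending = pending[:len(holders)], pending[len(holders):]
        for sender, receiver in zip(holders, batch):
            # Count a transfer as cross-switch when the endpoints differ.
            cross += switch_of[sender] != switch_of[receiver]
        holders.extend(batch)
    return rounds, cross


def topology_aware_broadcast(source, switch_of):
    """Two-phase broadcast: seed one leader per switch, then fan out locally."""
    members = defaultdict(list)
    for node, switch in switch_of.items():
        members[switch].append(node)
    # The source leads its own switch; other switches use their first node.
    leaders = {
        sw: (source if switch_of[source] == sw else nodes[0])
        for sw, nodes in members.items()
    }
    leader_order = [source] + [l for l in leaders.values() if l != source]
    rounds, cross = doubling_broadcast(leader_order, switch_of)
    # Conservative assumption: local phases start after all leaders are seeded.
    local_rounds = 0
    for sw, nodes in members.items():
        local_order = [leaders[sw]] + [n for n in nodes if n != leaders[sw]]
        r, c = doubling_broadcast(local_order, switch_of)
        local_rounds = max(local_rounds, r)
        cross += c
    return rounds + local_rounds, cross


# Hypothetical layout: 24 nodes striped across 3 switches.
switch_of = {node: node % 3 for node in range(24)}
oblivious = doubling_broadcast(list(range(24)), switch_of)
aware = topology_aware_broadcast(0, switch_of)
print("oblivious (rounds, cross-switch):", oblivious)
print("aware     (rounds, cross-switch):", aware)
```

In this toy layout both schemes finish in 5 rounds, but cross-switch transfers drop from 23 to 2. These numbers illustrate only the mechanism; they say nothing about the 12-fold speedup or 85% load reduction reported for the Tianhe system.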

Chinese Summary: TEES: a topology-aware execution environment service for fast and flexible application deployment in high-performance computing

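Mingtian Shao, Kai Lu, Wanqing Chi, Ruibo Wang, Yiqin Dai, Wenzhe Zhang
College of Computer, National University of Defense Technology, Changsha 410073, China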
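Summary: High-performance computing (HPC) is about to reach a new height: exascale. Application deployment is becoming an increasingly prominent problem. Container technology solves the problems of encapsulating and migrating applications and their execution environments. However, container images are too bulky, and deploying them on a large number of compute nodes is very time-consuming. Although the peer-to-peer (P2P) approach brings higher transmission efficiency, it also introduces a larger network load. All of these problems lead to high application startup latency. To solve them, the topology-aware execution environment service (TEES) is proposed for fast and flexible application deployment on HPC systems. TEES creates a more lightweight execution environment for users and uses a more efficient topology-aware P2P method to reduce deployment time. Combined with split-step transport and a launch-in-advance mechanism, TEES reduces application startup latency. On the Tianhe HPC system, TEES deploys and starts a typical application on 17 560 compute nodes within 3 seconds. Compared with container-based application deployment, the speed is 12 times higher and the network load is reduced by 85%.

Key words: Execution environment; Application deployment; High-performance computing (HPC); Container; Peer-to-peer (P2P); Network topology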

DOI: 10.1631/FITEE.2100284

CLC number: TP315

On-line Access: 2022-10-26
Received: 2021-06-16
Revision Accepted: 2022-10-26
Crosschecked: 2021-10-24

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE