Frontiers of Information Technology & Electronic Engineering

ISSN 2095-9184 (print), ISSN 2095-9230 (online)

Resource scheduling techniques in cloud from a view of coordination: a holistic survey

Abstract: The management of resource contention in shared clouds remains an open problem. The evolution and deployment of new application paradigms (e.g., deep learning training and microservices) and custom hardware (e.g., graphics processing units (GPUs) and tensor processing units (TPUs)) have posed new challenges for resource management system design. Current solutions tend to trade cluster efficiency for guaranteed application performance, e.g., through resource over-allocation, which leaves many resources underutilized. Overcoming this dilemma is not easy, because different components across the software stack are involved. Nevertheless, considerable effort has been devoted to seeking effective performance isolation and highly efficient resource scheduling. The goal of this paper is to systematically survey these techniques from the coordination perspective and to identify the trends they indicate. Briefly, four topics are covered. First, isolation mechanisms deployed at different levels (micro-architecture, system, and virtualization levels) are reviewed, including GPU multitasking methods. Second, resource scheduling techniques within an individual machine and at the cluster level are investigated. In particular, GPU scheduling for deep learning applications is described in detail. Third, adaptive resource management, including the latest microservice-related research, is thoroughly explored. Finally, future research directions are discussed in light of state-of-the-art work. We hope that this review will help researchers establish a global view of the landscape of resource management techniques in shared clouds and see technology trends more clearly.

Key words: Coordination; Co-location; Heterogeneous computing; Microservice; Resource scheduling techniques

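Chinese Summary (translated): On cloud resource scheduling techniques from a coordination perspective: a survey

Yuzhao Wang1, Junqing Yu1, Zhibin Yu2
1School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
2Research Center for Heterogeneous Intelligent Computing Architecture and Systems, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Abstract: Managing resource contention in public clouds remains an unresolved problem. The development and deployment of new application frameworks (e.g., deep learning and microservices) and specialized hardware (e.g., GPUs and TPUs) pose new challenges for the design of resource management systems. Existing solutions often sacrifice cluster efficiency to guarantee application performance, e.g., the low utilization caused by resource over-allocation. Because different modules of the software stack are involved, breaking out of this dilemma is not easy. Nevertheless, industry and academia have carried out extensive research in search of efficient performance isolation and resource scheduling. This paper gives a comprehensive overview of related work from the coordination perspective and reveals the technology trends within it. In brief, it covers four topics: resource isolation mechanisms at different levels (micro-architecture, system, and virtualization levels), including GPU multitasking; resource scheduling techniques at the machine and cluster levels, including GPU scheduling for deep learning applications; adaptive resource management techniques, including the latest microservice-related research; and, finally, future research directions. We hope this paper helps researchers grasp the overall picture of resource management techniques in public clouds and better follow their development trends.

Key words: Coordination; Co-location; Heterogeneous computing; Microservice; Resource scheduling techniques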


DOI: 10.1631/FITEE.2100298

CLC number: TP39


On-line Access: 2023-01-21

Received: 2021-06-24

Revision Accepted: 2023-01-21

Crosschecked: 2021-11-02

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE