Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2018 Vol.19 No.10 P.1224-1229

Exploring high-performance processor architecture beyond the exascale

Author(s): Xiang-hui Xie, Xun Jia
Affiliation(s): 1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214125, China
Corresponding email(s): xie.xianghui@meac-skl.cn, jia.xun@meac-skl.cn
Key Words: High-performance computing, Beyond the exascale, Processor architecture, Application-customized hardware, Distributed computational resources


Xiang-hui Xie, Xun Jia. Exploring high-performance processor architecture beyond the exascale[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(10): 1224-1229.

@article{Xie2018,
title="Exploring high-performance processor architecture beyond the exascale",
author="Xiang-hui Xie and Xun Jia",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="19",
number="10",
pages="1224-1229",
year="2018",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.1800424"
}

%0 Journal Article
%T Exploring high-performance processor architecture beyond the exascale
%A Xiang-hui Xie
%A Xun Jia
%J Frontiers of Information Technology & Electronic Engineering
%V 19
%N 10
%P 1224-1229
%@ 2095-9184
%D 2018
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1800424

TY - JOUR
T1 - Exploring high-performance processor architecture beyond the exascale
A1 - Xiang-hui Xie
A1 - Xun Jia
JO - Frontiers of Information Technology & Electronic Engineering
VL - 19
IS - 10
SP - 1224
EP - 1229
SN - 2095-9184
Y1 - 2018
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1800424
ER -


Abstract: The ever-increasing need for high performance in scientific computation and engineering applications will push high-performance computing beyond the exascale. As an integral part of a supercomputing system, high-performance processors and their architecture designs are crucial in improving system performance. In this paper, three architecture design goals for high-performance processors beyond the exascale are introduced: effective performance scaling, efficient resource utilization, and adaptation to diverse applications. Then a high-performance many-core processor architecture with scalar processing and application-specific acceleration (Massa) is proposed, which aims to achieve these three goals by combining distributed computational resources with application-customized hardware. Finally, some future research directions regarding the Massa architecture are discussed.

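Exploring high-performance processor architecture in the post-exascale era

Summary: The growing demand for high performance in scientific computing and engineering applications will push high-performance computing into the post-exascale era. As the core component of a supercomputing system, the high-performance processor and its architecture design are crucial to improving system performance. The paper first introduces three design goals for high-performance processor architecture in the post-exascale era: effective performance scaling, efficient resource utilization, and adaptation to diverse applications. It then proposes the Massa processor architecture, in which a scalar many-core master chip is connected to application-acceleration slave chips; by combining distributed computational resources with application-customized hardware, the architecture is intended to meet these design goals. Finally, several issues that future research on the Massa architecture should focus on are discussed.

Keywords: High-performance computing; Post-exascale; Processor architecture; Application-customized hardware; Distributed computational resources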
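
The organization described in the abstract, general scalar cores combined with application-customized acceleration hardware, can be pictured with a small toy model. The sketch below is purely illustrative and is not taken from the paper: the names `Accelerator`, `MassaNode`, `dispatch`, and the kernel kinds are hypothetical, and the routing rule (use a matching accelerator if one is attached, otherwise fall back to the scalar cores) is an assumption about how such a design might be used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Accelerator:
    """Hypothetical application-customized unit that handles one kernel kind."""
    kind: str
    run: Callable[[List[float]], float]


@dataclass
class MassaNode:
    """Toy model: scalar cores plus a table of attached accelerators."""
    accelerators: Dict[str, Accelerator] = field(default_factory=dict)

    def attach(self, acc: Accelerator) -> None:
        # Register an accelerator under the kernel kind it supports.
        self.accelerators[acc.kind] = acc

    def dispatch(
        self,
        kind: str,
        data: List[float],
        scalar_fallback: Callable[[List[float]], float],
    ) -> Tuple[str, float]:
        """Run a kernel and report which unit executed it."""
        acc = self.accelerators.get(kind)
        if acc is not None:
            return ("accelerator:" + kind, acc.run(data))
        # No customized hardware for this kernel: use the scalar cores.
        return ("scalar", scalar_fallback(data))


def scalar_sum(data: List[float]) -> float:
    """Plain scalar loop standing in for execution on general-purpose cores."""
    total = 0.0
    for x in data:
        total += x
    return total


node = MassaNode()
node.attach(Accelerator(kind="reduce", run=lambda d: float(sum(d))))
```

Calling `node.dispatch("reduce", [1.0, 2.0, 3.0], scalar_sum)` routes the work to the attached accelerator, while a kernel kind with no matching accelerator runs through `scalar_sum` instead. The point of the sketch is only the separation of concerns: the set of accelerators can change per application without touching the scalar path.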


