JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering 2015 Vol.16 No.12 P.1018-1033

Schedule refinement for homogeneous multi-core processors in the presence of manufacturing-caused heterogeneity

Author(s): Zhi-xiang Chen, Zhao-lin Li, Shan Cao, Fang Wang, Jie Zhou
Affiliation(s): 1. Department of Automation, Tsinghua University, Beijing 100084, China
Corresponding email(s): chen-zx10@mails.tsinghua.edu.cn
Key Words: Schedule refining, Multi-core processor, Heterogeneity, Representative chip operating point

Zhi-xiang Chen, Zhao-lin Li, Shan Cao, Fang Wang, Jie Zhou. Schedule refinement for homogeneous multi-core processors in the presence of manufacturing-caused heterogeneity[J]. Frontiers of Information Technology & Electronic Engineering, 2015, 16(12): 1018-1033.

@article{Chen2015,
title="Schedule refinement for homogeneous multi-core processors in the presence of manufacturing-caused heterogeneity",
author="Zhi-xiang Chen and Zhao-lin Li and Shan Cao and Fang Wang and Jie Zhou",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="16",
number="12",
pages="1018-1033",
year="2015",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1500035"
}

%0 Journal Article
%T Schedule refinement for homogeneous multi-core processors in the presence of manufacturing-caused heterogeneity
%A Zhi-xiang Chen
%A Zhao-lin Li
%A Shan Cao
%A Fang Wang
%A Jie Zhou
%J Frontiers of Information Technology & Electronic Engineering
%V 16
%N 12
%P 1018-1033
%@ 2095-9184
%D 2015
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1500035

TY  - JOUR
T1  - Schedule refinement for homogeneous multi-core processors in the presence of manufacturing-caused heterogeneity
A1  - Zhi-xiang Chen
A1  - Zhao-lin Li
A1  - Shan Cao
A1  - Fang Wang
A1  - Jie Zhou
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 16
IS  - 12
SP  - 1018
EP  - 1033
SN  - 2095-9184
Y1  - 2015
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1500035
ER  - 

Abstract: Homogeneous multi-core processors have been widely used for computation-intensive embedded applications. However, with the continued downscaling of CMOS technology, within-die variations in the manufacturing process lead to a significant spread in the operating speeds of the cores within a homogeneous multi-core processor. Task scheduling approaches that do not consider this variation-induced heterogeneity can yield overly pessimistic performance. To achieve optimal performance given the actual maximum clock frequencies at which the cores can run, we present a heterogeneity-aware schedule refining (HASR) scheme that fully exploits the heterogeneity of homogeneous multi-core processors in embedded domains. We analyze and show how the actual maximum frequencies of cores can be used to guide scheduling. In the scheme, representative chip operating points are selected, and the corresponding optimal schedules are generated as candidate schedules. When each chip boots, one of the candidate schedules is bound to it according to the actual maximum clock frequencies of its cores, so as to maximize performance. A set of applications is designed to evaluate the proposed scheme. Experimental results show that the proposed scheme improves performance by 22.2% on average compared with a baseline schedule based on worst-case timing analysis. Compared with the conventional task scheduling approach based on the actual maximum clock frequencies, the proposed scheme improves performance by up to 12%.
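The boot-time binding step described above can be sketched as follows. This is a minimal illustration under assumptions not taken from the paper: the application is treated as a software pipeline whose steady-state throughput is set by the most heavily loaded core, each candidate schedule is reduced to a hypothetical per-core load in cycles, and the names `throughput` and `bind_schedule` are illustrative only.

```python
from typing import List, Sequence

# Hypothetical representation: a candidate schedule is reduced to the number of
# cycles each core must execute per iteration of a pipelined application.
CandidateSchedule = List[float]


def throughput(load_cycles: Sequence[float], freqs: Sequence[float]) -> float:
    """Steady-state iterations per unit time, limited by the slowest stage.

    Assumption: pipelined execution, so the period is the largest per-core
    time (cycles / frequency). Units are arbitrary but must be consistent.
    """
    # Idle cores (zero load) never limit throughput.
    periods = [load / f for load, f in zip(load_cycles, freqs) if load > 0]
    if not periods:
        return float("inf")
    return 1.0 / max(periods)


def bind_schedule(candidates: Sequence[CandidateSchedule],
                  measured_freqs: Sequence[float]) -> int:
    """Return the index of the stored candidate that runs fastest on this chip.

    measured_freqs: actual maximum clock frequency of each core, assumed to be
    obtained at boot (e.g. by on-chip delay measurement).
    """
    return max(range(len(candidates)),
               key=lambda i: throughput(candidates[i], measured_freqs))
```

For example, with candidates `[[100, 100, 100, 100], [110, 110, 110, 70]]` and measured frequencies `[1.0, 1.0, 1.0, 0.6]`, the second candidate is chosen because it shifts work away from the slow core; on a chip whose cores all run at `1.0`, the balanced first candidate wins.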

Reviewer Comment: This paper is concerned with task scheduling techniques for optimal throughput on homogeneous multi-core processors, taking into account the intra-/inter-die frequency differences caused by silicon process variation. The paper proposes an HATS scheme, which adapts existing DAG-based scheduling techniques to the actual maximum frequencies of the cores. A few representative chip operating points are first chosen from all possible conditions to reduce memory usage, and these points are then stored in on-chip memory. At run time, one appropriate point is chosen and bound to the cores according to their actual maximum clock frequencies. The paper shows that the HATS scheme can improve the throughput of application benchmarks compared with other scheduling techniques. The study is well motivated, and the authors clearly describe the scheduling challenge posed by different core clock frequencies due to intra-/inter-die silicon process variation. Both the candidate selection and its binding to the chip are well presented (in particular, Algorithms 1, 2, and 3 are very helpful for the reader). The paper also gives a formal problem formulation, which is useful; I enjoyed reading it.
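The reviewer notes that the scheme adapts existing DAG-based scheduling to the actual maximum core frequencies. As a rough illustration of what such an adaptation involves, the sketch below runs a generic greedy earliest-finish-time list scheduler (similar in spirit to HEFT, but without its upward-rank priorities or insertion policy, and not necessarily the heuristic the authors use) in which a task's execution time on a core is its cycle count divided by that core's frequency. Communication costs are ignored, and the function name and input format are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def list_schedule(cycles: Dict[str, float],
                  preds: Dict[str, List[str]],
                  order: List[str],
                  freqs: List[float]) -> Tuple[Dict[str, int], float]:
    """Greedy earliest-finish-time list scheduling of a task DAG.

    cycles: task -> execution cycles (the same on every core, since cores are
            architecturally identical; only their frequencies differ)
    preds:  task -> predecessor tasks (tasks with none may be omitted)
    order:  a topological order of all tasks, used as the priority list
    freqs:  actual maximum clock frequency of each core (must be non-empty)
    Returns the task-to-core assignment and the makespan.
    Communication costs are ignored (simplifying assumption).
    """
    core_free = [0.0] * len(freqs)  # time at which each core becomes idle
    finish: Dict[str, float] = {}
    assign: Dict[str, int] = {}
    for task in order:
        # A task may start only after all of its predecessors have finished.
        ready = max((finish[p] for p in preds.get(task, [])), default=0.0)
        best_core: Optional[int] = None
        best_end = float("inf")
        for core, freq in enumerate(freqs):
            # Faster cores finish the same number of cycles sooner.
            end = max(ready, core_free[core]) + cycles[task] / freq
            if end < best_end:  # ties go to the lower-numbered core
                best_core, best_end = core, end
        assign[task] = best_core
        finish[task] = best_end
        core_free[best_core] = best_end
    return assign, max(finish.values())
```

With four tasks A→{B, C}→D of 100, 100, 100, and 50 cycles on two cores running at 1.0 and 0.8, the scheduler places C on the slower core and reaches a makespan of 275, versus 250 when both cores run at 1.0. Rerunning the scheduler per operating point is one way to obtain the optimal-schedule candidates that the scheme stores.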

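Schedule optimization for homogeneous multi-core processors considering manufacturing variation

Objective: For processor platforms with multiple homogeneous cores, to achieve performance-optimal schedule refinement that accounts for the manufacturing-caused heterogeneity of nanoscale processes.
Innovation: A scheme is proposed that generates multiple candidate schedules offline and binds one of them to the chip online, thereby fully exploiting the variation in the cores' maximum operating frequencies caused by manufacturing differences and achieving high overall performance.
Method: First, considering the performance variation caused by manufacturing differences, an offline-plus-online schedule refinement scheme is proposed. In the offline phase, given the distribution of manufacturing variation and using expected performance as the metric, representative chip operating points are selected and their corresponding optimal schedules are obtained; these serve as candidate schedules and are stored on the chip. Sampling of chip operating points is used to cope with the exponential growth in their number, and, under certain constraints, optimizing the expected performance is transformed into relationships among chip operating points, which reduces the complexity of the overall scheme. In the online phase, when the chip boots, the best-performing schedule is determined from the relationship between the current chip's operating point and the operating points associated with the candidate schedules.
Conclusion: An adaptive schedule refinement strategy is proposed for multi-core processor platforms that exhibit manufacturing variation under nanoscale processes, and it improves performance.

Keywords: Schedule refinement; Multi-core processor; Heterogeneity; Representative chip operating point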
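The offline phase judges candidate sets by their expected performance over the distribution of chip operating points, estimated by sampling. A minimal Monte Carlo sketch of that idea follows. It assumes, purely for illustration, that each core's maximum frequency is an independent Gaussian draw (real within-die variation is spatially correlated), reuses a pipelined throughput model, and uses a hand-picked candidate set; it does not reproduce how the paper chooses representative operating points or its constraint-based reformulation. All function names are hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def throughput(load: Sequence[float], freqs: Sequence[float]) -> float:
    """Pipelined model: iterations per unit time, set by the slowest loaded core."""
    return 1.0 / max(cyc / f for cyc, f in zip(load, freqs) if cyc > 0)


def sample_operating_points(n_cores: int, n_samples: int, mean: float,
                            sigma: float, seed: int = 0) -> List[List[float]]:
    """Draw hypothetical chip operating points (one max frequency per core).

    Assumption: independent Gaussian per core, clipped to stay positive.
    """
    rng = random.Random(seed)
    return [[max(rng.gauss(mean, sigma), 0.1 * mean) for _ in range(n_cores)]
            for _ in range(n_samples)]


def expected_throughput(candidates: Sequence[Sequence[float]],
                        points: Sequence[Sequence[float]]) -> float:
    """Average, over sampled chips, of the best throughput any candidate achieves.

    Each sampled chip is assumed to bind whichever stored candidate is fastest.
    """
    return statistics.mean(max(throughput(c, p) for c in candidates)
                           for p in points)


if __name__ == "__main__":
    points = sample_operating_points(n_cores=4, n_samples=1000, mean=1.0, sigma=0.05)
    balanced = [100.0] * 4
    # One extra candidate per core that shifts work away from that core.
    skewed = [[70.0 if i == k else 110.0 for i in range(4)] for k in range(4)]
    print("balanced only:", expected_throughput([balanced], points))
    print("with skewed candidates:", expected_throughput([balanced] + skewed, points))
```

Because each sampled chip binds its best candidate, adding candidates can only raise this estimate; the practical question is which few candidates are worth storing on chip.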
