CLC number: TP316
On-line Access: 2012-06-05
Received: 2011-07-04
Revision Accepted: 2012-03-27
Crosschecked: 2012-05-04
Cited: 0
Clicked: 7300
Eunsung Kim, Hyeonsang Eom, Heon Y. Yeom. Asymmetry-aware load balancing for parallel applications in single-ISA multi-core systems[J]. Journal of Zhejiang University Science C, 2012, 13(6): 413-427.
@article{title="Asymmetry-aware load balancing for parallel applications in single-ISA multi-core systems",
author="Eunsung Kim, Hyeonsang Eom, Heon Y. Yeom",
journal="Journal of Zhejiang University Science C",
volume="13",
number="6",
pages="413-427",
year="2012",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1100198"
}
%0 Journal Article
%T Asymmetry-aware load balancing for parallel applications in single-ISA multi-core systems
%A Eunsung Kim
%A Hyeonsang Eom
%A Heon Y. Yeom
%J Journal of Zhejiang University SCIENCE C
%V 13
%N 6
%P 413-427
%@ 1869-1951
%D 2012
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C1100198
TY - JOUR
T1 - Asymmetry-aware load balancing for parallel applications in single-ISA multi-core systems
A1 - Eunsung Kim
A1 - Hyeonsang Eom
A1 - Heon Y. Yeom
J0 - Journal of Zhejiang University Science C
VL - 13
IS - 6
SP - 413
EP - 427
%@ 1869-1951
Y1 - 2012
PB - Zhejiang University Press & Springer
ER -
DOI - 10.1631/jzus.C1100198
Abstract: Contemporary operating systems for single-ISA (instruction set architecture) multi-core systems attempt to distribute tasks equally among all the CPUs. This approach works relatively well when there is no difference in CPU capability. However, there are cases in which CPU capability differs from one another. For instance, static capability asymmetry results from the advent of new asymmetric hardware, and dynamic capability asymmetry comes from the operating system (OS) outside noise caused from networking or I/O handling. These asymmetries can make it hard for the OS scheduler to evenly distribute the tasks, resulting in less efficient load balancing. In this paper, we propose a user-level load balancer for parallel applications, called the ‘capability balancer’, which recognizes the difference of CPU capability and makes subtasks share the entire CPU capability fairly. The balancer can coexist with the existing kernel-level load balancer without detrimenting the behavior of the kernel balancer. The capability balancer can fairly distribute CPU capability to tasks with very little overhead. For real workloads like the NAS Parallel Benchmark (NPB), we have accomplished speedups of up to 9.8% and 8.5% in dynamic and static asymmetries, respectively. We have also experienced speedups of 13.3% for dynamic asymmetry and 24.1% for static asymmetry in a competitive environment. The impacts of our task selection policies, FIFO (first in, first out) and cache, were compared. The use of the cache policy led to a speedup of 5.3% in overall execution time and a decrease of 4.7% in the overall cache miss count, compared with the FIFO policy, which is used by default.
[1]Asanovic, K., Bodik, R., Demmel, J., Keaveny, T., Keutzer, K., Kubiatowicz, J., Morgan, N., Patterson, D., Sen, K., Wawrzynek, J., et al., 2009. A view of the parallel computing landscape. Commun. ACM, 52(10):56-67.
[2]Bailey, D.H., Barszcz, E., Barton, J.T., Browning, D.S., Carter, R.L., Dagum, L., Fatoohi, R.A., Frederickson, P.O., Lsinski, T.A., Schreiber, R.S., et al., 1991. The NAS Parallel Benchmarks. Int. J. Supercomput. Appl., 5(3):63-73.
[3]Beckman, P., Iskra, K., Yoshii, K., Coghlan, S., 2006. Operating system issues for petascale systems. ACM SIGOPS Oper. Syst. Rev., 40(2):29-33.
[4]Beckman, P., Iskra, K., Yoshii, K., Coghlan, S., Nataraj, A., 2008. Benchmarking the effects of operating system interference on extreme-scale parallel machines. Cluster Comput., 11(1):3-16.
[5]Boneti, C., Gioiosa, R., Cazorla, F.J., Valero, M., 2008. A Dynamic Scheduler for Balancing HPC Applications. Proc. ACM/IEEE Conf. on Supercomputing, Article No. 41, p.1-12.
[6]Bovet, D.P., Cesati, M., 2005. Understanding the Linux Kernel (3rd Ed.). O′Reilly, p.258-293.
[7]Brodowski, D., 2005. Current Trend in Linux Kernel Power Management, linuxtag 2005. Available from http://www.free-it.de/archiv/talks_2005/paper-11017/paper-11017.pdf [Accessed on Apr. 15, 2012].
[8]De, P., Kothari, R., Mann, V., 2007. Identifying Sources of Operating System Jitter Through Fine-Grained Kernel Instrumentation. Proc. IEEE Int. Conf. on Cluster Computing, p.331-340.
[9]Demers, A., Keshav, S., Shenker, S., 1989. Analysis and simulation of a fair queueing algorithm. ACM SIGCOMM Comput. Commun. Rev., 19(4):1-12.
[10]Ferreira, K., Brightwell, R., Bridges, P., 2008. Characterizing Application Sensitivity to OS Interference Using Kernel-Level Noise Injection. Proc. ACM/IEEE Conf. on Supercomputing, Article No. 19, p.1-12.
[11]Gioiosa, R., Petrini, F., Davis, K., Lebaillif-Delamare, F., 2004. Analysis of System Overhead on Parallel Computers. 4th IEEE Int. Symp. on Signal Processing and Information Technology, p.387-390.
[12]Gioiosa, R., McKee, S.A., Valero, M., 2010. Designing OS for HPC Applications: Scheduling. Proc. IEEE Int. Conf. on Cluster Computing, p.78-87.
[13]Hill, M.D., Marty, M.R., 2008. Amdahl’s law in the multicore era. IEEE Comput., 41(7):33-38.
[14]Hoefler, T., Mehlan, T., Lumsdaine, A., Rehm, W., 2007. Netgauge: a Network Performance Measurement Framework. Proc. High Performance Computing and Communications, p.659-671.
[15]Hofmeyr, S., Iancu, C., Blagojevíc, F., 2010. Load Balancing on Speed. Proc. 15th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, p.147-158.
[16]Jiang, X., Mishra, A., Zhao, L., Iyer, R., Fang, Z., Srinivasan, S., Makineni, S., Brett, P., Das, C.R., 2011. ACCESS: Smart Scheduling for Asymmetric Cache CMPs. Proc. IEEE 17th Int. Symp. on High Performance Computer Architecture, p.527-538.
[17]Koufaty, D., Reddy, D., Hahn, S., 2010. Bias Scheduling in Heterogeneous Multi-core Architectures. Proc. 5th European Conf. on Computer Systems, p.125-138.
[18]Kumar, R., Farkas, K.I., Jouppi, N.P., Ranganathan, P., Tullsen, D.M., 2003. Single-ISA Heterogeneous Multi-core Architectures: the Potential for Processor Power Reduction. Proc. 36th Annual IEEE/ACM Int. Symp. on Micro-architecture, p.81-92.
[19]Kumar, R., Tullsen, D.M., Ranganathan, P., Jouppi, N.P., Farkas, K.I., 2004. Single-ISA Heterogeneous Multi-core Architectures for Multithreaded Workload Performance. Proc. 31st Annual Int. Symp. on Computer Architecture, p.64-75.
[20]Lakshminarayana, N., Rao, S., Kim, H., 2008. Asymmetry Aware Scheduling Algorithms for Asymmetric Multiprocessors. Proc. 4th Annual Workshop on the Interaction Between Operating Systems and Computer Architecture.
[21]Li, T., Baumberger, D., Hahn, S., 2009. Efficient and Scalable Multiprocessor Fair Scheduling Using Distributed Weighted Round-Robin. Proc. 14th ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, p.65-74.
[22]Olsson, R., 2005. pktgen the Linux Packet Generator. Proc. Linux Symp., p.11-24.
[23]Pallipadi, V., Starikovskiy, A., 2006. The Ondemand Governor. Proc. Linux Symp., p.223-238.
[24]Petrini, F., Kerbyson, D.J., Pakin, S., 2003. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8192 Processors of ASCI Q. Proc. ACM/IEEE Conf. on Supercomputing, p.55-71.
[25]Radojkovic, P., Cakarevic, V., Verdu, J., Pajuelo, A., Gioiosa, R., Cazorla, F.J., Nemirovsky, M., Valero, M., 2008. Measuring Operating System Overhead on CMT Processors. Proc. 20th Int. Symp. on Computer Architecture and High Performance Computing, p.133-140.
[26]Saez, J.C., Prieto, M., Fedorova, A., Blagodurov, S., 2010. A Comprehensive Scheduler for Asymmetric Multicore Systems. Proc. 5th European Conf. on Computer Systems, p.139-152.
[27]Salim, J.H., 2001. Beyond Softnet. Proc. 5th Annual Linux Showcase and Conf., p.165-172.
[28]Scogland, T., Balaji, P., Feng, W., Narayanaswamy, G., 2008. Asymmetric Interactions in Symmetric Multi-core Systems: Analysis, Enhancements and Evaluation. Proc. ACM/IEEE Conf. on Supercomputing, Article No. 17, p.1-12.
[29]Tsafrir, D., Etsion, Y., Feitelson, D.G., Kirkpatrick, S., 2005. System Noise, OS Clock Ticks, and Fine-Grained Parallel Applications. Proc. 19th Annual Int. Conf. on Supercomputing, p.303-312.
Open peer comments: Debate/Discuss/Question/Opinion
<1>