CLC number: TP302
Received: 2007-11-26
Revision Accepted: 2008-05-16
Hong-zhou CHEN, Xue-zeng PAN, Ling-di PING, Kui-jun LU, Xiao-ping CHEN. A spatially triggered dissipative resource distribution policy for SMT processors[J]. Journal of Zhejiang University Science A, 2008, 9(8): 1070-1082.
@article{chen2008spatially,
title="A spatially triggered dissipative resource distribution policy for SMT processors",
author="Hong-zhou CHEN and Xue-zeng PAN and Ling-di PING and Kui-jun LU and Xiao-ping CHEN",
journal="Journal of Zhejiang University Science A",
volume="9",
number="8",
pages="1070-1082",
year="2008",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.A0720083"
}
%0 Journal Article
%T A spatially triggered dissipative resource distribution policy for SMT processors
%A Hong-zhou CHEN
%A Xue-zeng PAN
%A Ling-di PING
%A Kui-jun LU
%A Xiao-ping CHEN
%J Journal of Zhejiang University Science A
%V 9
%N 8
%P 1070-1082
%@ 1673-565X
%D 2008
%I Zhejiang University Press & Springer
%R 10.1631/jzus.A0720083
TY - JOUR
T1 - A spatially triggered dissipative resource distribution policy for SMT processors
A1 - Hong-zhou CHEN
A1 - Xue-zeng PAN
A1 - Ling-di PING
A1 - Kui-jun LU
A1 - Xiao-ping CHEN
JO - Journal of Zhejiang University Science A
VL - 9
IS - 8
SP - 1070
EP - 1082
SN - 1673-565X
Y1 - 2008
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.A0720083
ER -
Abstract: Program behavior changes at runtime in a simultaneous multithreading (SMT) environment. How well shared resources are distributed among the threads largely determines the throughput and fairness of an SMT processor. Existing resource distribution methods either rely mainly on the front-end fetch policy or base their distribution decisions on limited information from the pipeline, which makes it difficult for them to capture the varying resource requirements of the threads. This work presents a spatially triggered dissipative resource distribution (SDRD) policy for SMT processors. It has two parts: a self-organization mechanism driven by real-time instructions per cycle (IPC) performance, and an introduction of chaos that aims to control the diversity of trial resource distributions. Together they provide sustained optimization of the resource distribution as program behavior changes. Simulation results show that SDRD with fine-grained diversity control is more effective than SDRD with coarse-grained control. SDRD benefits substantially from its two well-coordinated parts, providing potential fairness gains as well as good throughput gains. The meanings and settings of important SDRD parameters are also discussed.
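The general idea of trying alternative resource distributions and keeping those that raise measured IPC can be illustrated with a minimal sketch. This is not the SDRD algorithm, whose details the abstract does not give; it is a plain greedy trial-and-keep search with a bounded random perturbation standing in for the diversity control. The throughput model `toy_ipc`, the per-thread demand values, the 96-entry shared resource, and every function and parameter name are assumptions made for illustration only.

```python
import random


def toy_ipc(shares):
    """Hypothetical throughput model (an assumption, not from the paper):
    each thread's contribution grows linearly with its share of the
    resource until it reaches an assumed saturation point."""
    demand = [48, 16, 64]  # assumed per-thread saturation points, in entries
    return sum(min(s, d) / d for s, d in zip(shares, demand))


def perturb(shares, diversity, rng):
    """Build a trial distribution by moving at most `diversity` entries
    from one randomly chosen thread to another. A small `diversity`
    gives fine-grained trials, a large one coarse-grained trials."""
    trial = list(shares)
    src, dst = rng.sample(range(len(trial)), 2)
    # Never take a thread below one entry.
    step = min(rng.randint(1, diversity), trial[src] - 1)
    trial[src] -= step
    trial[dst] += step
    return trial


def search(total=96, threads=3, epochs=200, diversity=8, seed=0):
    """Greedy search: start from an equal split and keep a trial
    distribution only if the (toy) IPC it yields is higher."""
    rng = random.Random(seed)
    best = [total // threads] * threads
    best_ipc = toy_ipc(best)
    for _ in range(epochs):
        trial = perturb(best, diversity, rng)
        ipc = toy_ipc(trial)
        if ipc > best_ipc:
            best, best_ipc = trial, ipc
    return best, best_ipc


if __name__ == "__main__":
    shares, ipc = search()
    print(shares, round(ipc, 3))
```

In this sketch the `diversity` argument plays the role of a granularity knob: lowering it makes each trial differ less from the current distribution, which is one simple way to think about fine-grained versus coarse-grained control of trial diversity.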