Journal of Zhejiang University SCIENCE A

ISSN 1673-565X (Print), 1862-1775 (Online), Monthly

New method for high performance multiply-accumulator design

Abstract: This study presents a new 4-stage pipelined, high-performance split multiply-accumulator (MAC) architecture that supports multiple precisions and is developed for media processors. To speed up the design further, a novel partial product compression circuit based on interleaved adders and a modified hybrid partial product reduction tree (PPRT) scheme are proposed. The MAC can perform 1-way 32-bit and 4-way 16-bit signed/unsigned multiply or multiply-accumulate operations, as well as 2-way parallel multiply add (PMADD) operations, at a frequency of 1.25 GHz under worst-case conditions and 1.67 GHz under typical-case conditions. Compared with the MAC in the 32-bit microprocessor without interlocked pipeline stages (MIPS), the proposed design shows a clear advantage in speed, and an improvement of up to 32% in throughput is achieved. The MAC design has been fabricated with Taiwan Semiconductor Manufacturing Company (TSMC) 90-nm CMOS standard cell technology and has passed a functional test.

Key words: Multiply-accumulator (MAC), Pipeline, Compressor, Partial product reduction tree (PPRT), Split structure
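
The split operating modes named in the abstract can be illustrated with a small behavioural model. This is only a sketch of the arithmetic, not the authors' circuit: the packing of four 16-bit lanes into one 64-bit operand word, the pairing of adjacent products in PMADD, and all function names (`pack16`, `split_lanes`, `mac_4x16`, `pmadd_2way`) are assumptions made for illustration, and accumulator width and overflow behaviour are not modelled.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pack16(lanes):
    """Pack four 16-bit lane values into one word, lane 0 in the low bits (assumed layout)."""
    return sum((v & 0xFFFF) << (16 * i) for i, v in enumerate(lanes))


def split_lanes(word, lanes=4, bits=16, signed=True):
    """Split a packed word into `lanes` fields of `bits` bits each."""
    mask = (1 << bits) - 1
    fields = [(word >> (i * bits)) & mask for i in range(lanes)]
    return [to_signed(f, bits) for f in fields] if signed else fields


def mac_4x16(acc, a, b, signed=True):
    """4-way 16-bit multiply-accumulate: acc[i] + a[i] * b[i] for each lane."""
    xs = split_lanes(a, signed=signed)
    ys = split_lanes(b, signed=signed)
    return [c + x * y for c, x, y in zip(acc, xs, ys)]


def pmadd_2way(a, b, signed=True):
    """2-way PMADD (assumed semantics): sum adjacent pairs of 16-bit products."""
    xs = split_lanes(a, signed=signed)
    ys = split_lanes(b, signed=signed)
    p = [x * y for x, y in zip(xs, ys)]
    return [p[0] + p[1], p[2] + p[3]]


a = pack16([1, -2, 3, 4])
b = pack16([5, 6, -7, 8])
print(mac_4x16([0, 0, 0, 0], a, b))  # [5, -12, -21, 32]
print(pmadd_2way(a, b))              # [-7, 11]
```

The `signed` flag only changes how each lane is interpreted before multiplication, which mirrors the signed/unsigned option described in the abstract at a purely functional level.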









DOI:

10.1631/jzus.A0820566

CLC number:

TP332


Received:

2008-07-27

Revision Accepted:

2008-10-28

Crosschecked:

2009-04-27

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE