CLC number: TN402
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2012-08-03
Choon Lih Hoo, Sallehuddin Mohamed Haris, Nik Abdullah Nik Mohamed. A floating point conversion algorithm for mixed precision computations[J]. Journal of Zhejiang University Science C, 2012, 13(9): 711-718.
@article{hoo2012floating,
title="A floating point conversion algorithm for mixed precision computations",
author="Choon Lih Hoo, Sallehuddin Mohamed Haris, Nik Abdullah Nik Mohamed",
journal="Journal of Zhejiang University Science C",
volume="13",
number="9",
pages="711-718",
year="2012",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1200043"
}
%0 Journal Article
%T A floating point conversion algorithm for mixed precision computations
%A Choon Lih Hoo
%A Sallehuddin Mohamed Haris
%A Nik Abdullah Nik Mohamed
%J Journal of Zhejiang University SCIENCE C
%V 13
%N 9
%P 711-718
%@ 1869-1951
%D 2012
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C1200043
TY - JOUR
T1 - A floating point conversion algorithm for mixed precision computations
A1 - Choon Lih Hoo
A1 - Sallehuddin Mohamed Haris
A1 - Nik Abdullah Nik Mohamed
JO - Journal of Zhejiang University Science C
VL - 13
IS - 9
SP - 711
EP - 718
SN - 1869-1951
Y1 - 2012
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.C1200043
ER -
Abstract: The floating point number is the most commonly used real number representation in digital computation owing to its high precision. It is used on computers and in single-chip applications such as DSP chips. Double precision (64-bit) representation allows a wider range of real numbers to be denoted, but single precision (32-bit) operations are more efficient. Recently, there has been increasing interest in mixed precision computations, which exploit the efficiency of single precision arithmetic while working with 64-bit numbers. This calls for the ability to interchange between the two formats. In this paper, an algorithm that converts floating point numbers from the 64-bit to the 32-bit representation is presented. The algorithm was implemented in Verilog and tested on a field programmable gate array (FPGA) using the Quartus II DE2 board and an Agilent 16821A portable logic analyzer. Results indicate that the algorithm performs the conversion reliably and accurately within a constant execution time of 25 ns at a 20 MHz clock frequency, regardless of the number being converted.
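This page does not include the paper's source code. As an illustration of the kind of 64- to 32-bit conversion the abstract describes, the Verilog sketch below copies the sign bit, re-biases the exponent from 1023 to 127, and truncates the 52-bit mantissa to 23 bits. It is a simplified sketch under stated assumptions, not the authors' tested design: the module and signal names are made up, rounding is truncation only, and subnormal outputs are flushed to zero.

```verilog
// Minimal combinational sketch of IEEE 754 double-to-single conversion.
// NOT the authors' design: names are illustrative, rounding is truncation
// (round toward zero), subnormal outputs are flushed to zero, and NaN
// payloads are not fully preserved.
module fp64_to_fp32 (
    input  wire [63:0] d,   // IEEE 754 double precision input
    output reg  [31:0] s    // IEEE 754 single precision output
);
    wire               sign  = d[63];
    wire        [10:0] exp64 = d[62:52];
    wire        [51:0] man64 = d[51:0];
    // Re-biased single precision exponent: E32 = E64 - 1023 + 127 = E64 - 896
    wire signed [12:0] exp32 = $signed({2'b00, exp64}) - 13'sd896;

    always @* begin
        if (exp64 == 11'h7FF)            // infinity or NaN: keep all-ones exponent
            s = {sign, 8'hFF, man64[51:29]};
        else if (exp64 == 11'd0)         // zero or double subnormal: signed zero
            s = {sign, 31'd0};
        else if (exp32 >= 13'sd255)      // exponent overflow: saturate to infinity
            s = {sign, 8'hFF, 23'd0};
        else if (exp32 <= 13'sd0)        // exponent underflow: flush to zero
            s = {sign, 31'd0};
        else                             // normal range: truncate mantissa to 23 bits
            s = {sign, exp32[7:0], man64[51:29]};
    end
endmodule
```

Values whose re-biased exponent falls outside the single precision range of 1 to 254 are saturated to infinity or flushed to zero here; these out-of-range and rounding corner cases are exactly the ones a full conversion circuit must handle and verify.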