Performance Analysis of High Speed Floating Point Arithmetic Unit for DSP Processor
D. Narasingarao#1, V. Balamurugan*2

#1 (VLSI Design), Department of ECE, Sathyabama University, Chennai, India

*2 Assistant Professor, Department of ECE, Sathyabama University, Chennai, India

Abstract- In this paper we present the design of a fast Fourier transform (FFT) unit for digital signal processing (DSP) applications. The FFT is a basic transform used in coding methods to convert a signal from the spatial domain to the frequency domain. A radix-2 FFT requires addition, subtraction, and multiplication operations, which must be accurate and have low latency. Floating-point operations offer a wide dynamic range of representation and higher accuracy, so they are used for these operations. Within the floating-point arithmetic unit, multiplication is the most complex operation, consisting of many sub-operations. Here a high-radix Booth multiplier is used for the mantissa multiplication to reduce this complexity. The fused operations are a two-term dot product and an add-subtract unit. The FFT consists of multiplications, additions, and subtractions of complex-valued data, and it is implemented efficiently with the two fused floating-point operations using the IEEE 754 single-precision standard for high accuracy. Finally, the efficiency of the proposed fused unit is demonstrated through hardware synthesis.

Keywords- Floating-point arithmetic, fused floating-point operations, fast Fourier transform, Radix-2 FFT butterfly.


Digital arithmetic operations are very important in the design of digital processors and application-specific systems [3]. Arithmetic circuits form an important class of circuits in digital systems [2]. With the remarkable progress in very large scale integration (VLSI) circuit technology, many complex circuits, unthinkable yesterday, have become easily realizable today. Algorithms that seemed impossible to implement now have attractive implementation possibilities for the future [12]. This means that not only the conventional computer arithmetic methods, but also the unconventional ones, are worth investigating in new designs [6].

The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However, real numbers are not well suited for general-purpose computation, because their numeric representation as a string of digits expressed in, say, base 10 can be very long or even infinitely long. Examples include π, e, and 1/3. In practice, computers store numbers with finite precision, so the numbers and arithmetic used in scientific computation must meet a few general criteria of accuracy and range.
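The finite-precision point can be made concrete in software. The sketch below (Python, used here purely as an illustration) shows that 1/3, whose binary expansion is infinitely long, is stored in IEEE 754 single precision as a rounded 32-bit approximation:

```python
import struct

def float32_bits(x: float) -> str:
    """Return the IEEE-754 single-precision bit pattern of x."""
    (packed,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{packed:032b}"

# 1/3 = 0.010101..._2 repeats forever, so the 23-bit fraction field
# can only hold a rounded approximation.
third = struct.unpack(">f", struct.pack(">f", 1.0 / 3.0))[0]
print(float32_bits(1.0 / 3.0))  # 1 sign bit, 8 exponent bits, 23 fraction bits
print(third == 1.0 / 3.0)       # False: the stored value is not exactly 1/3
```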

In this work, an arithmetic unit based on the IEEE standard for floating-point numbers has been implemented on an Altera Cyclone III FPGA board. The implemented arithmetic unit has a 64-bit processing datapath that supports addition, subtraction, multiplication, and division on floating-point numbers. Each operation is selected by a particular operation code. Synthesis of the unit for the FPGA board has been done using Quartus II.
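The opcode-selected dispatch can be sketched behaviorally as follows. The opcode encodings below are illustrative, not taken from the actual design; Python floats stand in for the 64-bit datapath, since they are themselves IEEE 754 double-precision values:

```python
# Opcode-selected floating-point operations, mirroring the unit's interface.
# The opcode values are hypothetical placeholders for illustration only.
OPS = {
    0b000: lambda a, b: a + b,  # ADD
    0b001: lambda a, b: a - b,  # SUB
    0b010: lambda a, b: a * b,  # MUL
    0b011: lambda a, b: a / b,  # DIV
}

def fp_alu(opcode: int, a: float, b: float) -> float:
    """Select and apply one floating-point operation by opcode."""
    try:
        return OPS[opcode](a, b)
    except KeyError:
        raise ValueError(f"unsupported opcode {opcode:#05b}")

print(fp_alu(0b010, 1.5, 2.0))  # 3.0
```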

The implemented arithmetic unit is part of a computer system specially designed to carry out operations on floating-point numbers. Some systems (particularly older, microcode-based architectures) can also perform various transcendental functions such as exponential or trigonometric calculations, though in most modern processors these are done with software library routines. In most modern general-purpose computer architectures, one or more FPUs are integrated with the CPU; however, many embedded processors, especially older designs, do not have hardware support for floating-point operations. In the past, some systems implemented floating point via a coprocessor rather than as an integrated unit; in the microcomputer era this was generally a single microchip, while in older systems it could be an entire circuit board or a cabinet. Not all computer architectures have a hardware FPU; in its absence, many FPU functions can be emulated in software [7].


The basic architecture of a radix-r pipeline FFT processor is well known [5]. It consists of an interleaved cascade of two types of elements: radix-r computational elements (identified as CE on the figure) and data-reordering elements (identified as RE on the figure) [6]. The data flow is in one direction along the heavy lines, which convey complex data. The light lines carry the r − 1 complex twiddle factors from the twiddle-factor units (identified as TF on the figure) to the computational elements. The twiddle factors may be computed or may be obtained from a memory. In the data flow, a computational element is used first; then the two types of elements (RE and CE) alternate, ending with a computational element. There are two types of computational elements (also called butterfly units). Decimation-in-time (DIT) computational elements consist of complex multiplication(s) followed by a sum-and-difference network. Decimation-in-frequency (DIF) computational elements consist of a sum-and-difference network followed by complex multiplication(s).
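The two butterfly orderings described above can be sketched behaviorally (a Python model for illustration; complex arithmetic stands in for the hardware's real-valued datapaths):

```python
import cmath

def butterfly_dit(a: complex, b: complex, w: complex):
    """Decimation-in-time butterfly: twiddle multiplication first,
    then the sum-and-difference network."""
    t = b * w
    return a + t, a - t

def butterfly_dif(a: complex, b: complex, w: complex):
    """Decimation-in-frequency butterfly: sum-and-difference network
    first, then the twiddle multiplication on the difference path."""
    return a + b, (a - b) * w

# Twiddle factor W_N^k = exp(-2*pi*j*k/N), here for k = 1, N = 8.
w = cmath.exp(-2j * cmath.pi / 8)
print(butterfly_dif(1 + 0j, 0 + 1j, w))
```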

  1. IEEE 754 Standard for Binary Floating-Point Arithmetic

The IEEE (Institute of Electrical and Electronics Engineers) has produced a standard to define floating-point representation and arithmetic, which came to be known as IEEE 754. The IEEE 754 Standard for Floating-Point Arithmetic is the most widely used standard for floating-point computation and is followed by many hardware (CPU and FPU) and software implementations. Many computer languages allow or require that some or all arithmetic be carried out using IEEE 754 formats and operations. The current version is IEEE 754-2008, published in August 2008; it includes nearly all of the original IEEE 754-1985 and the IEEE Standard for Radix-Independent Floating-Point Arithmetic (IEEE 854-1987) [9].
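The single-precision format defined by the standard packs a sign bit, an 8-bit biased exponent, and a 23-bit fraction into 32 bits. A minimal Python sketch of the field decomposition:

```python
import struct

def decode_single(x: float):
    """Split a number into its IEEE-754 single-precision fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF  # biased by 127
    fraction = bits & 0x7FFFFF      # 23 explicit mantissa bits
    return sign, exponent, fraction

# -1.5 = (-1)^1 * 1.1_2 * 2^0 -> sign=1, biased exponent=127,
# fraction with only its most significant bit set.
print(decode_single(-1.5))  # (1, 127, 4194304)
```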


To illustrate the utility of the Fused DP and Fused AS units for FFT implementation, FFT butterfly units have been designed using both the discrete and the fused units. First, a radix-2 decimation-in-frequency FFT butterfly was designed. All lines carry complex pairs of 32-bit IEEE 754 numbers, and all operations are complex. The complex add, subtract, and multiply operations can be realized with a discrete implementation that uses two real adders to perform the complex add or subtract, and four real multipliers and two real adders to perform the complex multiply. The complete butterfly thus consists of six real adders and four real multipliers, as shown in the figure. In this and the following figure, all lines are 32 bits wide for the IEEE 754 single-precision data. The complex add and subtract can be performed with two fused add-subtract units (marked FAS in the figure), and the complex multiplication can be realized with two fused dot-product units (marked FDP). These results were obtained from a complete layout of the butterfly unit; thus, the areas are greater than the sum of the parts (i.e., four multipliers and six adders for the discrete version, or two Fused DP and two Fused AS units for the fused version) [11].
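The mapping from the complex operations to the fused units can be seen arithmetically. In this plain Python sketch, each function computes what one fused unit produces, without modeling its internal precision:

```python
def complex_mul_as_dot_products(ar, ai, br, bi):
    """A complex multiply expressed as two two-term dot products,
    the operation each fused DP unit evaluates:
    (ar + j*ai)(br + j*bi) = (ar*br - ai*bi) + j(ar*bi + ai*br)."""
    real = ar * br - ai * bi  # two-term dot product with subtraction
    imag = ar * bi + ai * br  # two-term dot product with addition
    return real, imag

def add_sub(x, y):
    """The fused add-subtract unit produces x+y and x-y together."""
    return x + y, x - y

print(complex_mul_as_dot_products(1.0, 2.0, 3.0, 4.0))  # (-5.0, 10.0)
print(add_sub(3.0, 1.0))                                # (4.0, 2.0)
```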

Fig 1 Block diagram of the proposed scaling algorithm

In addition, the Fused DP unit provides slightly more accurate results, because only one rounding operation is performed, compared to the three rounding operations (one at the output of each multiplier and one at the output of the adder) of the discrete implementation.
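This accuracy advantage can be demonstrated numerically. The Python sketch below models the discrete dot product (three single-precision roundings) against a fused one (exact internal product and sum, one final rounding); double precision stands in for the fused unit's wide internal datapath, and the operand values are chosen for illustration so that cancellation exposes the intermediate roundings:

```python
import struct

def to_f32(x: float) -> float:
    """Round a double to the nearest IEEE-754 single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def dot2_discrete(a, b, c, d):
    """Discrete a*b + c*d: three roundings (two multiplies, one add)."""
    return to_f32(to_f32(a * b) + to_f32(c * d))

def dot2_fused(a, b, c, d):
    """Fused a*b + c*d: exact internal arithmetic, one final rounding."""
    return to_f32(a * b + c * d)

# a*b = 2**26 - 1 rounds up to 2**26 in single precision, so the discrete
# version cancels to 0.0 while the fused version keeps the exact -1.0.
a, b, c, d = 8193.0, 8191.0, -8192.0, 8192.0
print(dot2_discrete(a, b, c, d))  # 0.0
print(dot2_fused(a, b, c, d))     # -1.0
```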

Fig 2 Fused implementation of the radix-2 DIF FFT butterfly.


The hardware requirement and cost of the DSP processor are low, since only the radix-2 floating-point add-subtract and dot-product units are needed. The proposed fused FPU, implemented in the radix-2 FFT, achieves a high accuracy level with minimal hardware, as the hardware complexity is greatly reduced compared to other processors. Finally, the verified design was synthesized with the Quartus II EDA tool.

Fig 3. Simulated output

Fig 4 RTL view




Table 1 Efficiency of the radix-2 butterfly: existing method vs. proposed method


[1] B. Gold and T. Bially, “Parallelism in Fast Fourier Transform Hardware,” IEEE Trans. Audio and Electroacoustics, vol. AU-21, no. 1, pp. 5-16, Feb. 1973.

[2] D. Takahashi, “A Radix-16 FFT Algorithm Suitable for Multiply-Add Instruction Based on Goedecker Method,” Proc. Int’l Conf. Multimedia and Expo, vol. 2, pp. II-845-II-848, July 2003.

[3] E. Hokenek, R.K. Montoye, and P.W. Cook, “Second-Generation RISC Floating Point with Multiply-Add Fused,” IEEE J. Solid-State Circuits, vol. 25, no. 5, pp. 1207-1213, Oct. 1990.

[4] I. Flores, The Logic of Computer Arithmetic, Prentice-Hall, Englewood Cliffs, N.J.

[5] G. Todorov, “Basic Design, Implementation and Analysis of a Scalable High-Radix Montgomery Multiplier,” Master’s thesis, Oregon State University, USA, Dec. 2000.

[6] H.H. Saleh, “Fused Floating-Point Arithmetic for DSP,” PhD dissertation, Univ. of Texas, 2008.

[7] H.H. Saleh and E.E. Swartzlander, Jr., “A Floating-Point Fused Add-Subtract Unit,” Proc. IEEE Midwest Symp. Circuits and Systems (MWSCAS), pp. 519-522, 2008.

[8] H.H. Saleh and E.E. Swartzlander, Jr., “A Floating-Point Fused Dot-Product Unit,” Proc. IEEE Int’l Conf. Computer Design (ICCD), pp. 427-431, 2008.

[9] IEEE Standard for Floating-Point Arithmetic, ANSI/IEEE Standard 754-2008, Aug. 2008.

[10] J.H. McClellan and R.J. Purdy, “Applications of Digital Signal Processing to Radar,” Applications of Digital Signal Processing, A.V. Oppenheim, ed., pp. 239-329, Prentice-Hall, 1978.

[11] K. Kalliojärvi and Y. Neuvo, “Distribution of Roundoff Noise in Binary Floating-Point Addition,” Proc. ISCAS ’92, pp. 1796-1799.

[12] M. Schulte and E. Swartzlander, “Hardware Design and Arithmetic Algorithms for a Variable-Precision, Interval Arithmetic Coprocessor,” Proc. IEEE 12th Symp. Computer Arithmetic (ARITH-12), pp. 222-230, 1995.
