Abstract In this paper we present the design of fast Fourier transform (FFT) for digital signal processing (DSP) applications. FFT is a basic transformation for coding method which converts spatial domain to frequency domain of signal. In Rdix2 FFT operation addition, subtraction, multiplication operations are required. These operations must be accurate, less latency. Floating point operations have dynamic range of representation, and more accurate. So the floating point operations are used for the above operations. In this floating point arithmetic unit multiplication is the most complex operation consists of many operations which will be suboperations in addition. Here high radix booth multiplier is used for mantissa multiplication to reduce the complexity. The fused operations are a twoterm dot product and an addsubtract unit. The FFT operations that consist of multiplications, additions, and subtractions of complex valued data. It will be implemented efficiently with the two fused floatingpoint operations with IEEE 754 single precision standard for high accuracy. Finally the efficiency of proposed fused unit will be proved through hardware synthesize.
Keywords Floatingpoint arithmetic, fused floatingpoint operations, fast Fourier transform, Radix2 FFT butterfly.

INTRODUCTION
Digital arithmetic unit operations are very important in the design of digital processors and applicationspecific systems [3]. Arithmetic circuits form an important class of circuits in digital systems [2]. With the remarkable progress in the very large scale integration (VLSI) circuit technology, many complex circuits, unthinkable yesterday have become easily realizable today. Algorithms that seemed impossible to implement now have attractive implementation possibilities for the future [12]. This means that not only the conventional computer arithmetic methods, but also the unconventional ones are worth investigation in new designs. [6]
The notion of real numbers in mathematics is convenient for hand computations and formula manipulations. However, real numbers are not wellsuited for general purpose computation, because their numeric representation as a string of digits expressed in, say, base 10 can be very long or even infinitely long. Examples include π, e, and 1/3. In practice, computers store numbers with finite precision. Numbers and arithmetic used in scientific computation should meet a few general criteria:
In this thesis an arithmetic unit based on IEEE standard for floating point numbers has been implemented on ALTERA cyclone III FPGA Board. The arithmetic unit implemented has a 64bit processing unit which allows various arithmetic operations such as, Addition, Subtraction, Multiplication, Division and, on floating point numbers. Each operation can be selected by a particular operation code. Synthesis of the unit for the FPGA board has been done using QUARTUSII.
Implemented arithmetic unit is a part of a computer system specially designed to carry out operations on floating point numbers. Some systems (particularly older, microcodebased architectures) can also perform various transcendental functions such as exponential or trigonometric calculations, though in most modern processors these are done with software library routines. In most modern general purpose computer architectures, one or more FPUs are integrated with the CPU; however many embedded processors, especially older designs, do not have hardware support for floatingpoint operations. In the past, some systems have implemented floating point via a coprocessor rather than as an integrated unit; in the microcomputer era, this was generally a single microchip, while in older systems it could be an entire circuit board or a cabinet. Not all computer architectures have hardware FPU. In the absence of an FPU, many FPU functions can be emulated. [7].
I. FAST FOURIER TRANSFORM PROCESSOR
The basic architecture of a radixr pipeline FFT processor is well known [5]. The basic structure. It consists of an interleaved cascade of two types of elements. These are radixr computational elements (identified as CE on the figure) and data reordering elements [6] (identified as RE on the figure). The data flow is in one direction along the heavy lines, which convey complex data. The light lines carry r _ 1 complex twiddle factors from the TF units (identified as TF on the figure) to the computational elements. The twiddle factors may be computed or may be obtained from a memory. In the data flow, th computational element is used first, then the two types of elements (RE and CE) alternate ending with a computational element. There are two types of computational elements (also called butterfly units). Decimation In Time (DIT) computational elements consist of complex multiplication(s) followed by a sum and difference network. Decimation in Frequency (DIF) computational elements consist of a sum and difference network followed by complex multiplication(s).

IEEE 754 Standard For Binary FloatingPoint Arithmetic Unit
The IEEE (Institute of Electrical and Electronics Engineers) has produced a Standard to define floatingpoint representation and arithmetic The standard brought out by the IEEE come to be known as IEEE 754. The IEEE 754 Standard for FloatingPoint Arithmetic is the most widelyused standard for floatingpoint computation, and is followed by many hardware (CPU and FPU) and software implementations. Many computer languages allow or require that some or all arithmetic be carried out using IEEE 754 formats and operations. The current version is IEEE 7542008, which was published in August 2008;it includes nearly all of the original IEEE 7541985 (which was published in 1985) and the IEEE Standard for RadixIndependent FloatingPoint Arithmetic unit (IEEE 8541987). [9].

RADIX2 FFT BUTTERFLY
To illustrate the utility of the Fused DP and Fused AS units for FFT implementation, FFT butterfly unit designs using both the discrete and the fused units have been made. First, a radix2 decimation in frequency FFT butterfly was designed. All lines carry complex pairs of 32bit IEEE754 numbers and all operations are complex. The complex add, subtract, and multiply operations can be realized with a discrete implementation that uses two real adders to perform the complex add or subtract and four real multipliers and two real adders to perform the complex multiply. The complete butterfly consists of six real adders and four real multipliers as shown on the figure. In this and the following figure, all lines are 32bits wide for the IEEE754 singleprecision data. The complex add and subtract can be performed with two fused addsubtract units (marked as FAS in the figure) and the complex multiplication can be realized with two fused dot product units (marked as FDP). These results are obtained from having done a complete layout for the butterfly unit. Thus, the areas are greater than the sum of the parts (i.e., four multipliers and six adders for the discrete version or two Fused DP and two Fused AS units for the fused version). [11].
Fig 1 Block diagram of the proposed scaling algorithm
In addition, the Fused DP unit provides slightly more accurate results because only one rounding operation is performed compared to the three rounding operations (one at the output of each multiplier and the other at the output of the adder) of the discrete implementation.
Fig 2 Fused implementation of the radix2 DIF FFT butterfly.

PERFORMANCE RESULTS
Hardware requirement and cost of DSP processor is less as only Radix2 floating point addsubtract dotproduct unit adders. To implement proposed fused FPU in FFT radix 2 codec with high accuracy level, is minimum as hardware complexity is greatly reduced compared to other processors. And finally we synthesized the successfully verified scheme with QUARTUS II EDA tool.
Fig 3. Simulated output
Fig 4 RTL view
TABLE II
Methods

Fmax(MHz)

Existing method

89.29.MHz

Proposed method

154.06MHz

Efficiency of radix2 butterfly
V. CONCLUSION
Hardware requirement and cost of DSP processor is less as only Radix2 floating point addsubtract dotproduct unit adders. To implement proposed fused FPU in FFT radix 2 codec with high accuracy level, is minimum as hardware complexity is greatly reduced compared to other processors. And finally we synthesized the successfully verified scheme with QUARTUS II EDA tool.
REFERENCES
[1] B. Gold and T. Bially, “Parallelism in Fast Fourier Transform Hardware,” IEEE Trans. Audio and Electro acoustics, vol. AU21, no. 1, pp. 516, Feb. 1973.
[2] D. Takahashi, “A Radix16 FFT Algorithm Suitable for MultiplyAdd Instruction Based on Goedecker Method,” Proc. Int’l Conf. Multimedia and Expo, vol. 2, pp. II845II848, July 2003.
[3] E. Hokenek, R.K. Montoye, and P.W. Cook, “SecondGeneration RISC Floating Point with MultiplyAdd Fused,” IEEE J. SolidState Circuits, vol. 25, no. 5, pp. 12071213, Oct. 1990.
[4] Flores, Ivan, the Logic of Computer Arithmetic, Prentice Hall, Inc. Englewood Cliff (N.J.).
[5] G. Todorov, BASIC Design, Implementation and Analysis of a Scalable Highradix Montgomery Multiplier,” Master Thesis, Oregon State University, USA, December 2000.
[6] H.H. Saleh, “Fused FloatingPoint Arithmetic for DSP,” PhD dissertation,
Univ. of Texas, 2008. 288 IEEE TRANSACTIONS ON COMPUTERS, VOL. 61, NO.2, FEBRUARY 2012.
[7] H. Saleh and E.E. Swartzlander, Jr., “A FloatingPoint Fused AddSubtract
Unit,” Proc. IEEE Midwest Symp. Circuits and Systems (MWSCAS), pp. 519
522, 2008.
[8] H.H. Saleh and E.E. Swartzlander, Jr., “A FloatingPoint Fused DotProduct Unit,” Proc. IEEE Int’l Conf. Computer Design (ICCD), pp. 427431, 2008.
[9] IEEE Standard for FloatingPoint Arithmetic, ANSI/IEEE Standard 7542008,Aug.2008.
[10] J.H. McClellan and R.J. Purdy, “Applications of Digital Signal Processing to Radar,” Applications of Digital Signal Processing, A.V. Oppenheim, ed., pp. 239329, PrenticeHall, 1978.
[11] Kari Kalliojarvi and Yrjo Neovo, “Distribution of Round off Noise in Binary Floating  Point Addition”, Proceedings of ISCAS92, pp. 17961799.
[12] M. Schulte, E. Swartzlander, “Hardware Design and ArithmeticAlgorithms for a VariablePrecision, Interval Arithmetic Coprocessor”, Proc. IEEE 12th Symposium on
Computer Arithmetic (ARITH12), pp 222230, 1995