Low-Power FPGA Design Using Memoization-Based Approximate Computing




Field-programmable gate arrays (FPGAs) are increasingly used as the computing platform for fast and energy-efficient execution of recognition, mining, and search applications. Approximate computing is one promising method for achieving energy efficiency. In contrast to most prior work on approximate computing, which targets approximate processors and arithmetic blocks, this paper presents an approximate computing methodology for FPGA-based design. It studies memoization as a method for approximation on FPGAs and analyzes the architectural and design parameters that should be considered. The proposed design flow leverages high-level synthesis (HLS) to enable memoization-based micro-architecture generation, thus also facilitating C-to-register-transfer-level synthesis. Compared with the previous approaches of bit-width truncation and approximate multipliers, memoization-based approximate computation on FPGA achieves a significant dynamic power saving (around 20%) with very small area overhead (< 5%) and better peak signal-to-noise ratio (PSNR) values for the studied image processing benchmarks. The logic size, area, and power consumption of the proposed architecture are analyzed using Xilinx ISE 14.2.

Enhancement of the project:

Increase the length of the input data.

Existing System:

Approximate computing has been proposed as an alternative to exact computing for power reduction in embedded computing systems. It assumes that the applications under investigation can tolerate approximate results, and hence exact computation becomes unnecessary. Example applications include data mining, search, analytics, and media processing (audio and video), which are collectively referred to as the class of recognition, mining, and search (RMS) applications. Approximation can help to reduce the computation effort, resulting in lower power consumption; in fact, energy efficiency is one of the main driving forces behind approximate computing. The reported power saving is normally within 5%–40%, depending on the application, the approximation technique, and the acceptable error tolerance. Approximate computing has been studied at different levels, including processors, language design, and ASIC-style computational blocks, such as imprecise adders. However, there have been only a few previous works that investigate approximate computing on the field-programmable gate array (FPGA) as the computing substrate, although FPGAs are widely used to accelerate RMS applications.
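As a point of reference for the comparison above, bit-width truncation, one of the prior approximation techniques this work is compared against, can be sketched in C. The function names, the 16-bit operand width, and the zeroing-the-low-bits rule are illustrative assumptions, not details taken from the paper:

```c
#include <stdint.h>

/* Sketch of bit-width truncation: the k least significant bits of each
 * operand are zeroed before multiplying, which shrinks the multiplier
 * logic (and its switching activity) at the cost of a bounded error. */
static uint16_t truncate_bits(uint16_t x, unsigned k)
{
    return (uint16_t)(x & ~((1u << k) - 1u));
}

uint32_t approx_mul(uint16_t a, uint16_t b, unsigned k)
{
    return (uint32_t)truncate_bits(a, k) * truncate_bits(b, k);
}
```

With k = 0 the multiplication is exact; each additional truncated bit trades accuracy for a smaller, lower-power datapath.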

We design our memoization-based architecture generation flow by adding a post-processing step to existing HLS flows. Fig. 1 shows an overview of the memoization-based architecture generation flow. It shows snippets of code from a C-language source file, in which a function named edge_detect has been identified for HLS. A configuration file supplies other important parameters, such as the similarity measure, the threshold, and so on.

Fig. 1. Memoization-based architecture generation flow.

HLS is a design methodology that converts an abstract description of an algorithm in a high-level language, such as C, into a digital micro-architecture.
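For illustration, the kind of C function an HLS tool consumes might look like the sketch below. The paper's edge_detect function is not reproduced in this document, so this simple gradient-magnitude kernel, with the hypothetical name edge_strength and the common |Gx| + |Gy| magnitude approximation, merely stands in for it:

```c
#include <stdlib.h>

/* Hypothetical C kernel of the kind an HLS tool converts into a
 * micro-architecture: computes an edge strength for one pixel from
 * its four neighbors using the |Gx| + |Gy| gradient approximation. */
int edge_strength(int left, int right, int up, int down)
{
    int gx = right - left;     /* horizontal gradient */
    int gy = down - up;        /* vertical gradient   */
    return abs(gx) + abs(gy);  /* magnitude approximation */
}
```

An HLS tool would map the subtractions, absolute values, and addition onto datapath resources and schedule them over clock cycles automatically.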


Proposed System:

We assume an iterative design flow for memoization-based approximate computing. The details of this iterative design flow are shown in Fig. 2. Here, P1 and P2 refer to the power values obtained without and with memoization, respectively; R1 and R2 refer to the computed values obtained without and with memoization, respectively; P and T are the power and result-accuracy thresholds, respectively. Fig. 2 shows not only the memoization architecture generation flow but also the power and accuracy considerations that must be taken into account. The red block shows that an application or task described in C/C++ is synthesized using an HLS tool. The blue block shows the memoization architecture generator, which generates the RTL wrapper module that wraps the HLS-synthesized block with memoization-related circuit blocks. As a result of this wrapping, the RTL design of the memoized architecture is generated (purple block), i.e., the top-level module that contains the RTL wrapper and the HLS-synthesized block. After placement and routing on the target FPGA using a vendor-specific placement and routing tool (such as Xilinx ISE), simulation-based dynamic power analysis of both the HLS-synthesized design and the memoized architecture is performed separately to evaluate the potential power saving (green block). The power analysis is performed using the data set corresponding to the application (strong or weak). The percentage difference between P1 and P2 should be greater than the user-defined threshold P for the power saving to justify the area overhead introduced by the wrapper.
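The accept/reject decision at the end of this iterative flow can be sketched as follows. The exact acceptance rule is an assumption on our part: the text only states that the saving between P1 and P2 must exceed the threshold P and that the deviation between R1 and R2 must stay within the accuracy threshold T. The function name and the scalar error argument are illustrative:

```c
/* Sketch of the decision step in Fig. 2.  p1/p2 are the measured
 * dynamic power without/with memoization, err is the observed error
 * between the exact result R1 and the memoized result R2, P is the
 * minimum acceptable power saving in percent, and T is the maximum
 * tolerable error.  Returns 1 to accept the memoized design. */
int accept_memoized_design(double p1, double p2, double err,
                           double P, double T)
{
    double saving_pct = 100.0 * (p1 - p2) / p1;
    return saving_pct > P && err <= T;
}
```

If the design is rejected, the flow iterates, for example with a different similarity threshold, until both conditions hold.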

Fig. 2. Proposed iterative design flow for memoization-based approximate computing.

Memoization Architecture Generator:

The memoization architecture generator contains a memoization wrapper generator that generates the architectural blocks needed for memoization. The wrapper generator architecture is shown in Fig. 3.

Fig. 3. Architecture of the memoization wrapper generator.


We will introduce the detailed architecture for static memoization and dynamic memoization. We will also discuss the choice of memory resource.

Static Memoization Architecture:

Fig. 4 shows the architecture for static memoization. An RTL wrapper wraps an HLS-synthesized block that implements the computational task. The similarity measure between an input vector and a reference vector is calculated in the similarity-measure calculation block inside the RTL wrapper.
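The behavior of the static memoization path can be sketched in C. Here the reference vectors, their precomputed outputs, the sum-of-absolute-differences similarity measure, and all names and values are illustrative assumptions; only the reuse-if-similar mechanism comes from the text:

```c
#include <stdlib.h>

#define VEC_LEN  4
#define NUM_REFS 2

/* Design-time (static) memo contents: reference input vectors and
 * their precomputed outputs, fixed before synthesis. */
static const int ref_vec[NUM_REFS][VEC_LEN] = {
    {0, 0, 0, 0}, {100, 100, 100, 100}
};
static const int ref_out[NUM_REFS] = {0, 255};

/* Similarity measure: sum of absolute differences (an assumption;
 * the paper treats the measure as a configurable parameter). */
static int sad(const int *a, const int *b)
{
    int d = 0;
    for (int i = 0; i < VEC_LEN; i++)
        d += abs(a[i] - b[i]);
    return d;
}

/* Returns a memoized output if some reference vector is within
 * `threshold`; returns -1 to signal that the HLS-synthesized block
 * must compute the result exactly. */
int static_memo_lookup(const int *in, int threshold)
{
    for (int r = 0; r < NUM_REFS; r++)
        if (sad(in, ref_vec[r]) <= threshold)
            return ref_out[r];
    return -1;
}
```

In hardware, the loop over references corresponds to comparisons against entries held in the FPGA memory resource, and the miss path forwards the input to the wrapped block.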

Dynamic Memoization Architecture:

The architecture for dynamic memoization is shown in Fig. 5. The RTL wrapper in this case is more complex, because dynamic memoization is used for weak-data-set applications, in which the condition for memoization, the output values, and so on are all determined at run time. Both write and read operations need to be performed on the memory resource. The main difference from the static memoization architecture is highlighted in the red block in Fig. 5.

Fig. 5. Dynamically memoized architecture.
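The dynamic case can be sketched the same way. Unlike static memoization, entries are written into the memory resource at run time, so the wrapper performs both reads (lookups) and writes (inserts). The direct-mapped table, the modulo hash, and all names below are illustrative assumptions standing in for the FPGA memory resource:

```c
#define TABLE_SIZE 16

/* Run-time memo table: one entry per hash bucket.  Static storage is
 * zero-initialized, so all entries start out invalid. */
struct memo_entry { int valid; int key; int value; };
static struct memo_entry table_[TABLE_SIZE];

static unsigned hash_key(int key)
{
    return (unsigned)key % TABLE_SIZE;
}

/* Read path: returns 1 and fills *out on a hit, 0 on a miss
 * (in which case the wrapped HLS block must compute the result). */
int dyn_memo_read(int key, int *out)
{
    struct memo_entry *e = &table_[hash_key(key)];
    if (e->valid && e->key == key) { *out = e->value; return 1; }
    return 0;
}

/* Write path: after the wrapped block computes, the result is stored
 * so that later occurrences of the same input can reuse it. */
void dyn_memo_write(int key, int value)
{
    struct memo_entry *e = &table_[hash_key(key)];
    e->valid = 1; e->key = key; e->value = value;
}
```

The extra write path and the run-time validity tracking are exactly what makes the dynamic wrapper more complex than the static one.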


Advantages:

  • Saves dynamic power dissipation

  • Reduces area

Software implementation:

  • Modelsim

  • Xilinx ISE

