A high-Throughput Hardware Design of a One-Dimensional spiht algorithm

Download 15.48 Kb.
Size15.48 Kb.

c:\users\naresh.ymts\desktop\verilog material data\takeoff_logo.png

A High-Throughput Hardware Design of a One-Dimensional SPIHT Algorithm


Video display systems include frame memory, which stores video data for display. To reduce system cost, video data are often compressed for storage in frame memory. A desirable characteristic for display memory compression is support for the raster-scan processing order and the fixed target compression ratio. Set partitioning in hierarchical trees (SPIHT) is an efficient two-dimensional compression algorithm that guarantees a fixed target compression ratio, but its one-dimensional (1D) variation has received little attention, even though its 1D nature supports the raster-scan processing order. This paper proposes a novel hardware design for 1D SPIHT. The algorithm is modified to exploit parallelism for effective hardware implementation. For the encoder, dependences that prohibit parallel execution are resolved and a pipelined schedule is proposed. For the parallel execution of the decoder, the algorithm is modified to enable estimation of the bitstream length of each pass prior to decoding. This modification allows parallel and pipelined decoding operations, leading to a high-throughput design for both encoder and decoder. Although the modifications slightly decrease compression efficiency, additional optimizations are proposed to improve such efficiency. As a result, the peak signal-to-noise ratio drop is reduced from 1.40 dB to 0.44 dB. The throughputs of the proposed encoder and decoder are 7.04 Gbps and 7.63 Gbps, respectively, and their respective gate counts are 37.2K and 54.1K. The proposed architecture of this paper analysis the logic size, area and power consumption using Xilinx 14.2.

Enhancement of the project:

Existing System:

To introduce SPIHT, the data structure and the computation flow of the algorithm are explained. The discussion of the results of previous studies shows the challenges involved in hardware implementation of the SPIHT algorithm. The terminologies introduced in this section are used in the remaining sections of this paper. The SPIHT algorithm accepts discrete wavelet transform (DWT) coefficients as input. The DWT coefficients are organized as a binary tree structure on which significance tests are performed. In this binary tree, two children of coefficient ���� are defined as the two coefficients ��2�� and ��2��+1. In turn, the children of ��2�� are ��4�� and ��4��+1. Therefore, ��4��, ��4��+1, ��4��+2, and ��4��+3 are also descendants of coefficient ����. Fig. 1 shows the binary tree structure used in SPIHT, for which the block size is 1×16. Note that DWT coefficients are the input of SPIHT and that its tree structure depends on the decomposition level of the DWT operation.

Figs. 1 (a) and (b) show the structure for three levels, while Figs. 1 (c) and (d) show the structure for two levels. Figs. 1 (a) and (b) show the same coefficients, while the tree structure is shown explicitly in Fig. 1 (b). In both Figs. 1 (a) and (b), the coefficients ��0 and ��1 are the 0th and the 1st low-pass band coefficients, respectively, of level three. Coefficient ��2 is the 0th high-pass band coefficient of level three. The correspondences associated with the other coefficients are also presented in Figs. 1 (a) and (b). Half of the coefficients in the low-pass band with a high index form the root(s) of all of the binary trees. For example, in Figs. 1 (a) and (b), ��1 is the only root of all of the binary trees, as indicated by the arrows in the figure. In Figs. 1 (c) and (d), ��2 and ��3 form the roots of the two binary trees. This paper focuses on monochrome image compression so that each color component is compressed with independent binary trees when three color components are being compressed.

Fig. 1. Linear coefficient array and the corresponding binary tree structure. (a) and (b) DWT decomposition level three (c) and (d) DWT decomposition level two. A binary tree is generated independently for each color component.


  • Less throughput and Efficiency is less

Proposed System:


In the proposed design, an earlier 1D SPIHT algorithm is modified to exploit parallelism effectively so as to implement the algorithm in hardware. To this end, dependences in SPIHT are analyzed and a parallel and pipelined scheduling of operations is proposed.

Fig. 3. Pass reorganization by BPS algorithm

Dependence Analysis and Pipeline Schedule

The subsection also describes a modified processing order for the SPIHT algorithm. The modification improves SPIHT hardware utilization by avoiding dependences.

Fig. 3. Dependence analysis and pipeline schedule. (a) dependences among passes (b) an example of which modified bit-planes are processed in parallel

The proposed algorithm processes three passes, RP, SP, and FRP as shown in Fig. 2. There are two dependences that prohibit individual passes from being processed simultaneously (see Fig. 3 (a)). The first dependence exists between the SP and the FRP. This dependence can be avoided by delaying the execution of FRP by one cycle after the SP is executed. The second dependence exists between the SP of the higher pass band and that of the lower pass band. To avoid this dependence, the processing order needs to be changed such that the SP phase of the lower pass band is delayed by one cycle after the SP phase of the higher pass band.

Fig. 3 (b) shows a proposed processing order that delays processing at the lower levels. In that figure, the horizontal axis presents the 16 DWT coefficients at three decomposition levels. The vertical axis presents the bit-planes from the 0th bit-plane to the 8th bit-plane. The darkness in each region indicates the bit-planes that are processed simultaneously.

Relocation of Sorting Bits

The pipeline schedule proposed in Section III.A allows a fast execution of the SPIHT algorithm without violating dependences. This section presents an encoding scheme that improves compression efficiency without slowing processing speed.


In that design, the main issues of dependence are resolved. This section proposes a SPIHT algorithm design for decoder hardware implementation. In addition to dependence avoidance, calculation of the starting location for every pass is a challenge to be addressed in the decoder design.

Prior Calculation of Starting Location

An important requirement in decoder design is to determine the starting position of each pass in the bitstream prior to the start of the pass. This is necessary because each pass needs to access the corresponding bitstream for decoding. The description that follows shows that, because it is impossible to obtain the starting position of each pass, it is necessary to modify the bitstream format to allow pre-calculation of the bitstream length of every pass.


The hardware implementation of the proposed encoder and decoder is shown in figure. The organization of the proposed hardware implementation is described in detail, and its performance and cost are compared with previous hardware implementations.

Fig. 4. Block diagram of the proposed 1D SPIHT hardware. (a) encoder (b) decoder

Fig. 4 (a) shows the hardware organization of the proposed 1D SPIHT encoder. The input data for the encoder are DWT coefficients. These are transferred to the Transpose Unit, which reorganizes the data structure of the DWT coefficients for bitplane by bit-plane processing in the Encoder Core Unit. The proposed architecture processes a single bit-plane per cycle. To this end, the Encoder Core Unit includes 20 processing units: 8 for RP, 6 for FRP, and 6 for SP. The Dummy Bit Address Buffer stores the address of the dummy bits generated by the Encoder Core Unit and the Packetizer Unit. The Dummy Bit Address Buffer size is 160 bits. The List Buffer stores the encoding status among Null, LIP, LSP, LISa, and LISb. The Packetizer Unit combines the bitstreams generated by multiple passes in the predefined format. In addition, it counts the number of dummy bits and uses them to store the bitstream that is coded after reaching the TBL.

The 1D SPIHT decoder organization is shown in Fig. 4 (b). The input bitstream is stored in the Bitstream Buffer, which stores the bitstream of the block to be decoded. The Parser Unit identifies the parts of the bitstream to be processed in individual passes and stores them in the Sub-Bitstream Buffer. The Code Length Prediction Unit and the Address Generation Unit calculate the address of each pass for the bitstream to decode and send this information to the Parser Unit. The Decoder Core Unit processes 20 units. However, as the sign bits that are decoded in the RP and FRP are delayed by a single cycle, 34 units are processed simultaneously. The List Buffer size in the decoder is 384 bits, which is twice as large as that in the encoder because sign bit decoding is delayed by a single cycle. The bits in the dummy bit position are temporarily stored in the Dummy Bit Buffer and are used later when the decoded bitstream length reaches the TBL.


Software implementation:

  • Modelsim

  • Xilinx ISE

Further Details Contact: A Vinay 9030333433, 08772261612, 9014123891 #301, 303 & 304, 3rd Floor, AVR Buildings, Opp to SV Music College, Balaji Colony, Tirupati - 515702 Email: info@takeoffprojects.com | www.takeoffprojects.com

Download 15.48 Kb.

Share with your friends:

The database is protected by copyright ©ininet.org 2023
send message

    Main page