A HighThroughput Hardware Design of a OneDimensional SPIHT Algorithm
Abstract:
Video display systems include frame memory, which stores video data for display. To reduce system cost, video data are often compressed for storage in frame memory. A desirable characteristic for display memory compression is support for the rasterscan processing order and the fixed target compression ratio. Set partitioning in hierarchical trees (SPIHT) is an efficient twodimensional compression algorithm that guarantees a fixed target compression ratio, but its onedimensional (1D) variation has received little attention, even though its 1D nature supports the rasterscan processing order. This paper proposes a novel hardware design for 1D SPIHT. The algorithm is modified to exploit parallelism for effective hardware implementation. For the encoder, dependences that prohibit parallel execution are resolved and a pipelined schedule is proposed. For the parallel execution of the decoder, the algorithm is modified to enable estimation of the bitstream length of each pass prior to decoding. This modification allows parallel and pipelined decoding operations, leading to a highthroughput design for both encoder and decoder. Although the modifications slightly decrease compression efficiency, additional optimizations are proposed to improve such efficiency. As a result, the peak signaltonoise ratio drop is reduced from 1.40 dB to 0.44 dB. The throughputs of the proposed encoder and decoder are 7.04 Gbps and 7.63 Gbps, respectively, and their respective gate counts are 37.2K and 54.1K. The proposed architecture of this paper analysis the logic size, area and power consumption using Xilinx 14.2.
Enhancement of the project:
Existing System:
To introduce SPIHT, the data structure and the computation flow of the algorithm are explained. The discussion of the results of previous studies shows the challenges involved in hardware implementation of the SPIHT algorithm. The terminologies introduced in this section are used in the remaining sections of this paper. The SPIHT algorithm accepts discrete wavelet transform (DWT) coefficients as input. The DWT coefficients are organized as a binary tree structure on which significance tests are performed. In this binary tree, two children of coefficient ���� are defined as the two coefficients ��_{2}_{��} and ��_{2}_{��}_{+1}. In turn, the children of ��_{2}_{��} are ��_{4}_{��} and ��_{4}_{��}_{+1}. Therefore, ��_{4}_{��}, ��_{4}_{��}_{+1}, ��_{4}_{��}_{+2}, and ��_{4}_{��}_{+3} are also descendants of coefficient ��_{��}. Fig. 1 shows the binary tree structure used in SPIHT, for which the block size is 1×16. Note that DWT coefficients are the input of SPIHT and that its tree structure depends on the decomposition level of the DWT operation.
Figs. 1 (a) and (b) show the structure for three levels, while Figs. 1 (c) and (d) show the structure for two levels. Figs. 1 (a) and (b) show the same coefficients, while the tree structure is shown explicitly in Fig. 1 (b). In both Figs. 1 (a) and (b), the coefficients ��_{0} and ��_{1} are the 0th and the 1st lowpass band coefficients, respectively, of level three. Coefficient ��_{2} is the 0th highpass band coefficient of level three. The correspondences associated with the other coefficients are also presented in Figs. 1 (a) and (b). Half of the coefficients in the lowpass band with a high index form the root(s) of all of the binary trees. For example, in Figs. 1 (a) and (b), ��_{1} is the only root of all of the binary trees, as indicated by the arrows in the figure. In Figs. 1 (c) and (d), ��_{2} and ��_{3} form the roots of the two binary trees. This paper focuses on monochrome image compression so that each color component is compressed with independent binary trees when three color components are being compressed.
Fig. 1. Linear coefficient array and the corresponding binary tree structure. (a) and (b) DWT decomposition level three (c) and (d) DWT decomposition level two. A binary tree is generated independently for each color component.
Disadvantages:

Less throughput and Efficiency is less
Proposed System:
SPIHT ENCODER:
In the proposed design, an earlier 1D SPIHT algorithm is modified to exploit parallelism effectively so as to implement the algorithm in hardware. To this end, dependences in SPIHT are analyzed and a parallel and pipelined scheduling of operations is proposed.
Fig. 3. Pass reorganization by BPS algorithm
Dependence Analysis and Pipeline Schedule
The subsection also describes a modified processing order for the SPIHT algorithm. The modification improves SPIHT hardware utilization by avoiding dependences.
Fig. 3. Dependence analysis and pipeline schedule. (a) dependences among passes (b) an example of which modified bitplanes are processed in parallel
The proposed algorithm processes three passes, RP, SP, and FRP as shown in Fig. 2. There are two dependences that prohibit individual passes from being processed simultaneously (see Fig. 3 (a)). The first dependence exists between the SP and the FRP. This dependence can be avoided by delaying the execution of FRP by one cycle after the SP is executed. The second dependence exists between the SP of the higher pass band and that of the lower pass band. To avoid this dependence, the processing order needs to be changed such that the SP phase of the lower pass band is delayed by one cycle after the SP phase of the higher pass band.
Fig. 3 (b) shows a proposed processing order that delays processing at the lower levels. In that figure, the horizontal axis presents the 16 DWT coefficients at three decomposition levels. The vertical axis presents the bitplanes from the 0th bitplane to the 8th bitplane. The darkness in each region indicates the bitplanes that are processed simultaneously.
Relocation of Sorting Bits
The pipeline schedule proposed in Section III.A allows a fast execution of the SPIHT algorithm without violating dependences. This section presents an encoding scheme that improves compression efficiency without slowing processing speed.
SPIHT DECODER
In that design, the main issues of dependence are resolved. This section proposes a SPIHT algorithm design for decoder hardware implementation. In addition to dependence avoidance, calculation of the starting location for every pass is a challenge to be addressed in the decoder design.
Prior Calculation of Starting Location
An important requirement in decoder design is to determine the starting position of each pass in the bitstream prior to the start of the pass. This is necessary because each pass needs to access the corresponding bitstream for decoding. The description that follows shows that, because it is impossible to obtain the starting position of each pass, it is necessary to modify the bitstream format to allow precalculation of the bitstream length of every pass.
HARDWARE IMPLEMENTATION
The hardware implementation of the proposed encoder and decoder is shown in figure. The organization of the proposed hardware implementation is described in detail, and its performance and cost are compared with previous hardware implementations.
Fig. 4. Block diagram of the proposed 1D SPIHT hardware. (a) encoder (b) decoder
Fig. 4 (a) shows the hardware organization of the proposed 1D SPIHT encoder. The input data for the encoder are DWT coefficients. These are transferred to the Transpose Unit, which reorganizes the data structure of the DWT coefficients for bitplane by bitplane processing in the Encoder Core Unit. The proposed architecture processes a single bitplane per cycle. To this end, the Encoder Core Unit includes 20 processing units: 8 for RP, 6 for FRP, and 6 for SP. The Dummy Bit Address Buffer stores the address of the dummy bits generated by the Encoder Core Unit and the Packetizer Unit. The Dummy Bit Address Buffer size is 160 bits. The List Buffer stores the encoding status among Null, LIP, LSP, LISa, and LISb. The Packetizer Unit combines the bitstreams generated by multiple passes in the predefined format. In addition, it counts the number of dummy bits and uses them to store the bitstream that is coded after reaching the TBL.
The 1D SPIHT decoder organization is shown in Fig. 4 (b). The input bitstream is stored in the Bitstream Buffer, which stores the bitstream of the block to be decoded. The Parser Unit identifies the parts of the bitstream to be processed in individual passes and stores them in the SubBitstream Buffer. The Code Length Prediction Unit and the Address Generation Unit calculate the address of each pass for the bitstream to decode and send this information to the Parser Unit. The Decoder Core Unit processes 20 units. However, as the sign bits that are decoded in the RP and FRP are delayed by a single cycle, 34 units are processed simultaneously. The List Buffer size in the decoder is 384 bits, which is twice as large as that in the encoder because sign bit decoding is delayed by a single cycle. The bits in the dummy bit position are temporarily stored in the Dummy Bit Buffer and are used later when the decoded bitstream length reaches the TBL.
Advantages:
Software implementation:
Further Details Contact: A Vinay 9030333433, 08772261612, 9014123891 #301, 303 & 304, 3rd Floor, AVR Buildings, Opp to SV Music College, Balaji Colony, Tirupati  515702 Email: info@takeoffprojects.com  www.takeoffprojects.com
Share with your friends: 