Design Techniques and Challenges: EBCOT Tier-1 Algorithm in JPEG 2000

EBCOT Tier-1 Algorithm in JPEG 2000

As shown in Figure 81.1, the EBCOT tier-1 algorithm [1] is divided into two phases: the bit-plane coding (BC) phase and the adaptive arithmetic encoding (AE) phase. The BC phase uses the neighbor information of the current bit to construct context information, i.e., a context (CX) and a decision (D). Following BC, AE uses CX to estimate the probability of occurrence of D. With this probability, AE adaptively encodes the decision (D) bit by bit to achieve high coding efficiency.

Bit-plane Coding

The DWT coefficients in a code-block memory are stored as binary numbers. Each binary number can be seen as a sequence of bits with position indices from n down to 1, where n is the number of bits in the number. All the bits at the same position compose a bit-plane. For example, all the least significant bits (LSBs) of the DWT coefficients in a code-block compose a bit-plane because they are all at position 1. Each bit-plane is divided into stripes, which are contiguous groups of four rows of a bit-plane; i.e., a bit-plane of size N × N has N/4 stripes, so one column of a stripe holds four bits. Bit-planes, stripes, columns, and bits are associated with a coding order (pass). There are three passes in the bit-plane coding phase: the significance pass (pass 1), the magnitude refinement pass (pass 2), and the clean-up pass (pass 3). Each pass scans a bit-plane in order: bit by bit, column by column, and stripe by stripe. A code-block is then scanned bit-plane by bit-plane. Figure 81.2 provides a top view (Figure 81.2[a]) and a side view (Figure 81.2[b]) of a code-block and the scanning order of the passes.
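The stripe-oriented scan order described above can be sketched in a few lines of Python. This is an illustrative toy, not an implementation of the standard: the function and variable names are ours, and the 8 × 8 block of magnitudes is arbitrary.

```python
def scan_bitplane(block, plane):
    """Yield (row, col, bit) for one bit-plane in the stated scan order:
    stripe by stripe (groups of 4 rows), column by column within a
    stripe, and top-to-bottom within each 4-bit stripe column."""
    rows, cols = len(block), len(block[0])
    for stripe_top in range(0, rows, 4):            # stripe by stripe
        for col in range(cols):                     # column by column
            for row in range(stripe_top, min(stripe_top + 4, rows)):
                yield row, col, (abs(block[row][col]) >> plane) & 1

# toy 8x8 "code-block" of coefficient magnitudes
block = [[r * 8 + c for c in range(8)] for r in range(8)]
order = list(scan_bitplane(block, 0))               # LSB plane (position 1)
```

Here `plane` 0 selects the LSB bit-plane; scanning a whole code-block would simply repeat this generator from the most significant plane downward.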


A DWT coefficient is associated with a significance state (SigState). Before the three passes start, all the SigStates of a code-block are initialized as insignificant. During pass scanning, the SigState of a wavelet coefficient changes from insignificant to significant when the most significant bit (MSB) of the coefficient is coded. So all the bits of a DWT coefficient coded before its MSB are considered insignificant in all three passes, and all the bits coded after its MSB are considered significant. As shown in Figure 81.3, each bit has 8 neighbors, i.e., d1–d4, v1, v2, h1, and h2, where d, h, and v stand for diagonal, horizontal, and vertical neighbor bits, respectively. A bit is called insignificant if its SigState is insignificant, and significant otherwise. The SigState of a bit and of its 8 neighbors determines in which pass the bit is coded: if a bit is insignificant and any neighbor is significant, the bit is coded in pass 1; if a bit is significant, it is coded in pass 2; all remaining bits are coded in pass 3. During the scanning of the three passes, any bit is coded in exactly one of the three passes (pass 1, pass 2, or pass 3). Besides, the SigState of a bit and its 8 neighbors is used to calculate both CX and D through primitive encoders. The primitive encoders use the neighbor contributions (NC) (D, H, V), calculated as D = d1 + d2 + d3 + d4, H = h1 + h2, V = v1 + v2.
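The neighbor-contribution sums and the pass-selection rule above can be captured in a short sketch. This is a simplified illustration (names are ours; off-block neighbors are treated as insignificant, and the context-formation tables of the standard are not reproduced):

```python
def neighbor_contributions(sig, r, c):
    """Sum the significance states of the 8 neighbors of bit (r, c).
    Neighbors outside the code-block count as insignificant (0)."""
    rows, cols = len(sig), len(sig[0])

    def s(i, j):
        return sig[i][j] if 0 <= i < rows and 0 <= j < cols else 0

    H = s(r, c - 1) + s(r, c + 1)                       # h1 + h2
    V = s(r - 1, c) + s(r + 1, c)                       # v1 + v2
    D = (s(r - 1, c - 1) + s(r - 1, c + 1) +
         s(r + 1, c - 1) + s(r + 1, c + 1))             # d1 + d2 + d3 + d4
    return D, H, V

def pass_of(sig, r, c):
    """Which pass codes bit (r, c) in the current bit-plane."""
    D, H, V = neighbor_contributions(sig, r, c)
    if sig[r][c]:
        return 2        # already significant -> magnitude refinement pass
    if D + H + V > 0:
        return 1        # insignificant with a significant neighbor
    return 3            # clean-up pass
```

For example, with a single significant bit at (0, 1), the bit directly below it falls in pass 1, the significant bit itself in pass 2, and an isolated corner bit in pass 3.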

AE Algorithm

The BC phase provides the AE phase with the context information (CX and D). CX represents the significance information for one bit and its neighbors. Depending on the bit’s location in a bit-plane, the bit and its neighbors may have various SigStates, so 19 types of CX are used to indicate the different combinations of these SigStates. D is a 1-bit symbol (0 or 1) that can be either the current bit value or a binary result from run-length coding. CX is used to adaptively tune the probability at which a certain D (0 or 1) occurs under the CX-indexed neighbor significance conditions. Using this probability, D is encoded into the codeword. The coding process goes as follows. Initially, the current probability interval is initialized to 1. Then, the current probability interval is partitioned into two subintervals: the more probable subinterval (MPS) and the less probable subinterval (LPS). One of the subintervals is selected to initialize the current probability interval for the next iteration, and the codeword is modified so that it points to the base of the selected probability subinterval; in other words, the codeword is represented by the base of the selected subinterval. Since the coding process involves addition of probability subintervals (binary fractions) rather than concatenation of binary decisions, the more probable binary decisions can often be coded at a cost of much less than one bit per decision. The accuracy of the estimated probability of occurrence of D directly affects AE coding efficiency. To adaptively estimate the probability at which D takes place, a finite-state machine (FSM) is defined in the JPEG 2000 standard. The FSM defines an estimated probability for each state, the transition from one state to the next, and how to alternate the current MPS status. Even though only one FSM is defined, each type of CX has its own FSM instance: only CXs of the same type share one FSM instance and can affect that instance’s state. So, in total, 19 FSM instances are needed for all CXs.

The AE algorithm described above is shown in more detail in Figure 81.4, where A is the current interval size, C the codeword, Qe the estimated probability, MPS the current MPS status, and ^ means the XOR operation. The probability (Qe) and MPS status are looked up in the FSM by CX indexing. Qe and the MPS status are used to update A and C. After A and C are updated, the renormalization procedure is executed, i.e., both A and C are left-shifted until the MSB of A is in the leftmost position. During renormalization, the bits shifted out are stored in a buffer, and the number of bit-shifting operations is counted down by a counter. Whenever the counter reaches 0, the counter restores its initial value, the byte-out procedure is called, and a byte is output with bit-stuffing techniques. Then the next CX is fetched and processed using the same procedure.
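The interval update and renormalization flow can be sketched as below. This is a hedged, simplified model, not the standard's MQ-coder: the probability table holds only three illustrative placeholder entries (the real FSM has 47 states), and byte-out with bit stuffing is reduced to collecting shifted-out bytes; the class and table names are ours.

```python
# Illustrative, truncated probability table: (Qe, NMPS, NLPS, SWITCH).
# These placeholder entries exist only to exercise the update logic.
QE_TABLE = [
    (0x5601, 1, 1, 1),
    (0x3401, 2, 0, 0),
    (0x1801, 2, 1, 0),
]

class MQEncoderSketch:
    """Interval update + renormalization in the style described above.
    Byte-out with bit stuffing is simplified to appending raw bytes."""

    def __init__(self, n_contexts=19):
        self.A = 0x8000                  # current interval size ("1")
        self.C = 0                       # codeword register
        self.CT = 12                     # shifts remaining before byte-out
        self.bytes_out = []
        # one FSM instance [state index, MPS sense] per context type
        self.ctx = [[0, 0] for _ in range(n_contexts)]

    def encode(self, cx, d):
        state, mps = self.ctx[cx]
        qe, nmps, nlps, switch = QE_TABLE[state]
        self.A -= qe
        if d == mps:                     # code an MPS
            if self.A & 0x8000:
                self.C += qe             # no renormalization needed
                return
            if self.A < qe:              # conditional exchange
                self.A = qe
            else:
                self.C += qe
            self.ctx[cx][0] = nmps
        else:                            # code an LPS
            if self.A < qe:              # conditional exchange
                self.C += qe
            else:
                self.A = qe
            if switch:
                self.ctx[cx][1] = 1 - mps
            self.ctx[cx][0] = nlps
        while not (self.A & 0x8000):     # renormalize A and C
            self.A <<= 1
            self.C <<= 1
            self.CT -= 1
            if self.CT == 0:             # simplified byte-out (no stuffing)
                self.bytes_out.append((self.C >> 19) & 0xFF)
                self.C &= 0x7FFFF
                self.CT = 8
```

After every `encode` call the MSB of A is back in the leftmost position, which is exactly the invariant the renormalization step maintains.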

AE Termination

Since A and C accumulate a history of all the coded bits, AE is a strictly serial process. Hence, AE termination is used to remove the dependency between bit-streams in favor of an error-resilient bit-stream. By using different AE terminations, EBCOT tier-1 coding can be processed in either serial pass mode (SeriPM) or parallel pass mode (ParaPM). When coding in SeriPM, the three passes scan a bit-plane serially; after all the bit-planes of a code-block are scanned, AE terminates the bit-stream. When coding in ParaPM, the three passes scan a code-block in a manner similar to the serial mode, but AE terminates the bit-stream at the end of each pass to reduce the dependency among passes. To remove the dependency of the current stripe’s coding on the next stripe, stripe-causal mode is adopted, i.e., the NC from the next stripe are always considered to be 0. This termination pattern and stripe-causal mode remove the dependency between passes completely. As a result, ParaPM allows the three passes to scan a bit-plane simultaneously, since the AE probability intervals of the three passes do not depend on each other. The parallel mode is more error-resilient, although its performance is slightly poorer than that of the serial mode (the average degradation in PSNR is only 0.19 dB) [10,11].
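Stripe-causal masking can be illustrated with a small helper. This is a sketch under our own naming conventions: it only shows how a neighbor's significance is forced to 0 when that neighbor lies in the next stripe.

```python
def causal_sig(sig, r, c, stripe_top):
    """Significance of neighbor (r, c) as seen while coding the stripe
    that starts at row stripe_top. In stripe-causal mode, any neighbor
    in the next stripe (row >= stripe_top + 4) contributes 0, which
    removes the dependency of the current stripe on the next one.
    Off-block neighbors also count as insignificant."""
    rows, cols = len(sig), len(sig[0])
    if not (0 <= r < rows and 0 <= c < cols):
        return 0
    if r >= stripe_top + 4:        # next stripe: forced insignificant
        return 0
    return sig[r][c]
```

A significant bit in row 4 is thus invisible to the stripe covering rows 0–3, but visible once coding moves to its own stripe.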

ParaPM versus SeriPM

In ParaPM, the three passes scan a bit-plane in parallel; in SeriPM, they scan it serially. Figure 81.5 shows pseudocode for the two coding orders and illustrates the difference between ParaPM and SeriPM.
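The difference in coding order and termination can be sketched as follows. This is a toy model (all names are ours): the stand-in `MQ` class only logs events, so the sketch shows where terminations fall in each mode, not actual coding.

```python
class MQ:
    """Stand-in for the arithmetic encoder; records events only."""
    def __init__(self, log):
        self.log = log

    def code_pass(self, plane, p):
        self.log.append(f"plane{plane}-pass{p}")

    def terminate(self):
        self.log.append("terminate")

def seri_pm(n_planes):
    """Serial pass mode: one coder state for the whole code-block,
    terminated once at the end."""
    log = []
    ae = MQ(log)
    for plane in reversed(range(n_planes)):     # MSB plane first
        for p in (1, 2, 3):
            ae.code_pass(plane, p)
    ae.terminate()                              # single termination
    return log

def para_pm(n_planes):
    """Parallel pass mode: the coder terminates after every pass, so
    passes share no coder state and could run concurrently."""
    log = []
    for plane in reversed(range(n_planes)):
        for p in (1, 2, 3):
            ae = MQ(log)                        # fresh coder per pass
            ae.code_pass(plane, p)
            ae.terminate()                      # termination per pass
    return log
```

For a 2-plane block, both modes visit the same passes in the same order, but SeriPM emits one termination while ParaPM emits one per pass (six in total), which is what confines an error to a single pass.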


ParaPM provides more parallelism, since the three passes can be coded simultaneously instead of serially. ParaPM is also more error-resilient, which is an important characteristic in wireless applications. In ParaPM, AE terminates each pass on each bit-plane, while in SeriPM AE terminates only when a code-block has finished coding. Therefore, once errors occur in a coding stream, they can spread only within a smaller space (one pass) in ParaPM instead of a whole code-block in SeriPM. However, all these advantages of ParaPM come at the cost of degraded PSNR performance.

To evaluate this degradation in PSNR performance, the differences in PSNR versus bit-rate between ParaPM and SeriPM are shown in Figure 81.6 for integer mode and various block sizes. The results show that the bigger the block size, the better the performance for both ParaPM and SeriPM.


Also, the performance difference between them becomes smaller as the block size increases. For a 16 × 16 code-block, the performance gap between them is only about 1.0 dB, and for 64 × 64 it is only 0.2 dB. So the performance of ParaPM is only slightly poorer than that of SeriPM.
