VLSI Architectures for Forward Error-Control Decoders: High-Speed MAP Decoder Architectures


In this section, we present techniques for improving the throughput of the recursive data-flow graphs present in the MAP decoder, in particular in the ACS kernel. First, we review the existing techniques of parallel processing [51,52] and the look-ahead transform [36,53]. Then, we present the block-interleaved pipelining (BIP) technique [54–56].

In general, pipelining or parallel processing is difficult for a recursive data-path. However, if the data is processed in blocks and the processing satisfies the following two properties: (1) computations between blocks are independent and (2) the computation within a block is recursive, then a block-parallel processing architecture can be achieved. Further, if a block can be segmented into computationally independent subblocks, parallel processing can be applied at the subblock level. This leads to the high-throughput MAP decoder architectures presented in Refs. [51,52,54–56].

(1) Parallel Processing. Consider the recursive architecture in Figure 82.16(a). The architecture in this figure cannot be easily pipelined or parallelized owing to the presence of the feedback loop. However, if a data block of length B is processed independently of other blocks and the computations within a block can be segmented into computationally independent subblocks, then one can parallelize the architecture as shown in Figure 82.16(b), where the parallelization factor M = 2 and a block X is divided into M = 2 subblocks, X1 and X2. The critical path is unaffected, and the throughput increases by a factor of M at the expense of a factor-of-M increase in hardware complexity.
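The block-parallel idea can be sketched in software (the actual design is hardware, and this ACS-style recursion is a made-up stand-in for the real state-metric update): splitting a block into M computationally independent subblocks lets M identical copies of the recursion unit run concurrently.

```python
def recurse_block(pairs, init=0.0):
    """Serial recursion over one (sub)block; models a single recursion
    unit. The update y[k] = max(y[k-1] + a[k], b[k]) is a hypothetical
    add-compare-select-flavored step used purely for illustration."""
    y = init
    out = []
    for a, b in pairs:
        y = max(y + a, b)   # recursive step: output feeds back as state
        out.append(y)
    return out

def parallel_blocks(pairs, M=2):
    """M-way parallel processing: the block is split into M independent
    subblocks, each handled by its own copy of the recursion unit.
    Throughput rises by a factor of M, and so does hardware cost."""
    sub = len(pairs) // M
    # each subblock starts from its own independent initial state
    return [recurse_block(pairs[i * sub:(i + 1) * sub]) for i in range(M)]
```

The key assumption, as in the text, is that the subblocks are computationally independent, so no state crosses a subblock boundary.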

(2) Look-ahead transform. Another transform for achieving high throughput in recursive data-flow graphs is look-ahead computation [53]. Look-ahead increases the number of symbols processed at each time step as shown in Figure 82.17, where two symbols are processed in one clock cycle. If x(n) = [x(2n - 1), x(2n)] and y(n) = [y(2n - 1), y(2n)], then look-ahead results in the output being expressed as y(n) = F(x(n), y(2(n - 1))). Note that F(·) will have a longer critical path delay than the original computation of Figure 82.16(a). However, it has been shown that the function F(·) can be optimized via logic minimization so that an overall increase in throughput is achieved. For example, as mentioned earlier, in the context of Viterbi decoding, it has been shown that a 1.7× increase in throughput is feasible via radix-4 computation [36].
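A minimal software sketch of two-step look-ahead (again with a hypothetical single-symbol recursion f standing in for the actual ACS update): the composed function F consumes two symbols per step and, before any logic minimization, produces exactly the same outputs as the original one-symbol-per-step recursion.

```python
def f(x, y_prev):
    # hypothetical single-symbol recursion y(n) = f(x(n), y(n-1));
    # a stand-in for the real state-metric update
    return max(y_prev + x, 0.0)

def serial(xs, y0=0.0):
    """Original recursion: one symbol per step."""
    y, ys = y0, []
    for x in xs:
        y = f(x, y)
        ys.append(y)
    return ys

def F(x_pair, y_prev):
    """Two-symbol look-ahead: f composed with itself, so two symbols
    are consumed per clock. In hardware, this composed logic is the
    candidate for minimization (e.g., radix-4 ACS in Viterbi decoding)."""
    y1 = f(x_pair[0], y_prev)
    y2 = f(x_pair[1], y1)
    return y1, y2

def lookahead(xs, y0=0.0):
    """Recursion using F: the feedback y(2(n-1)) is updated half as often."""
    y, ys = y0, []
    for i in range(0, len(xs), 2):
        y1, y2 = F(xs[i:i + 2], y)
        ys += [y1, y2]
        y = y2
    return ys
```

Functional equivalence of `serial` and `lookahead` is the point: look-ahead trades a longer combinational path through F for half as many recursion steps.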

(3) BIP. The parallel architecture in Figure 82.16(b), where the level of parallelism equals 2, has two identical computation units processing two independent input symbols. Therefore, the hardware complexity increases linearly with the level of parallelism M. A more area-efficient architecture is obtained via the BIP technique proposed in Ref. [58]. First, the data-flow of Figure 82.16(b) is folded [53] onto a single computation unit as shown in Figure 82.18(a), where two independent computations are carried out on a single computational unit. Note that the resulting BIP architecture in this figure is inherently pipelined. Therefore, an application of retiming [53] (see Figure 82.18(b)) reduces the critical path delay by a factor of two relative to the original architecture in Figure 82.16(a). The retimed BIP architecture in Figure 82.18(b) thus achieves high throughput at the cost of only a marginal increase in memory, owing to the pipelining latches, when compared to the architecture in Figure 82.16(a).
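The folded schedule can be sketched as follows (a behavioral Python model, not the hardware itself; `step` is a hypothetical recursion update): a single unit serves two independent blocks in alternating cycles, and because each block's feedback value is consumed only every other cycle, a pipeline register can be retimed into the loop.

```python
def bip_schedule(block1, block2, step, init=0.0):
    """Block-interleaved pipelining sketch: one computation unit is
    time-shared between two independent blocks. Each block's feedback
    is needed only every other cycle, which is what lets retiming move
    a pipeline register into the loop and roughly halve the critical
    path of the single-block recursion."""
    state = [init, init]        # one recursion state per interleaved block
    out = [[], []]
    for s1, s2 in zip(block1, block2):
        for blk, sym in ((0, s1), (1, s2)):   # interleaved schedule
            state[blk] = step(sym, state[blk])
            out[blk].append(state[blk])
    return out
```

Each block still sees a purely serial recursion, so the outputs match two independent serial units; only the scheduling onto hardware changes.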

