VLSI Architectures for Forward Error-Control Decoders: Viterbi Decoder Architectures
Viterbi Decoder Architectures
The Viterbi decoding algorithm determines the ML transmitted information sequence u. Recall that the receiver knows the encoder trellis and the noisy observations y. The Viterbi decoder consists of three types of computations: (1) branch metric computation implemented in a branch metric unit (BMU), (2) state/path metric computation implemented in a path metric unit (PMU), and (3) survivor path computation implemented in the survivor memory unit (SMU). Thus, a generic Viterbi decoder architecture will take the form shown in Figure 82.8.
We now describe each of the computations in more detail.
(1) Branch Metric Computation. We compute the branch metric Bi,j(k) for the branch connecting states si(k) and sj(k + 1) and do so for every branch in the trellis. The branch metric is a measure of the probability of traversing the branch bi,j(k) given that the encoder is in state si(k) and the received value is y(k).
If the demodulator provides hard decisions such that y(k) is binary, then the branch metric is the Hamming distance between y(k) and the code symbol c(k) associated with the branch. If y(k) is real-valued and the channel noise is AWGN, then the ML principle described in Section 82.3.2 results in a branch metric computation identical to the computation of the squared Euclidean distance

Bi,j(k) = Σl [yl(k) - cl(k)]^2

where l indexes the individual components of y(k) and c(k), i.e., the samples and bits of y(k) and c(k), respectively. The BMU has a feed-forward architecture and hence can be pipelined easily; it is therefore usually not in the critical path.
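As a concrete illustration of these two cases, the following is a minimal C sketch of a branch metric computation. The rate-1/2 assumption (two code bits per branch), the antipodal mapping of code bits to ±1 in the soft-decision case, and all function and variable names are illustrative assumptions, not taken from the original text.

```c
/* Minimal branch metric unit (BMU) model; names and the rate-1/2
   assumption (two code bits per branch) are illustrative. */
#include <stdio.h>

#define N_OUT 2  /* code bits per trellis branch (rate-1/2 code assumed) */

/* Hard decisions: branch metric = Hamming distance between received
   bits y[] and the code bits c[] labeling the branch. */
int branch_metric_hard(const int y[N_OUT], const int c[N_OUT])
{
    int dist = 0;
    for (int l = 0; l < N_OUT; l++)
        dist += (y[l] != c[l]);
    return dist;
}

/* Soft decisions over an AWGN channel: branch metric = squared Euclidean
   distance between received samples y[] and antipodal code symbols c[]. */
double branch_metric_soft(const double y[N_OUT], const double c[N_OUT])
{
    double dist = 0.0;
    for (int l = 0; l < N_OUT; l++) {
        double diff = y[l] - c[l];
        dist += diff * diff;
    }
    return dist;
}

int main(void)
{
    int    y_hard[N_OUT] = {1, 0},      c_hard[N_OUT] = {1, 1};
    double y_soft[N_OUT] = {0.8, -0.3}, c_soft[N_OUT] = {1.0, 1.0};

    printf("Hamming branch metric:   %d\n",   branch_metric_hard(y_hard, c_hard));
    printf("Euclidean branch metric: %.2f\n", branch_metric_soft(y_soft, c_soft));
    return 0;
}
```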
(2) State/Path Metric Computation. We compute the state/path metric Si(k) for each state in the trellis. The state metric Si(k) is a measure of the probability of the encoder arriving at state si(k) given the observations {y(0), y(1), …, y(k - 1)}. In computing Si(k), the Viterbi algorithm employs the branch metrics and thereby keeps track of the most likely path leading to state si(k). This path is called the survivor path, and hence the state metric is also called the survivor path metric or simply the path metric. The survivor path needs to be stored in memory.
To understand the state metric computation, consider the two-state trellis shown in Figure 82.9, where the state metrics S0(k) and S1(k) corresponding to states s0(k) and s1(k), respectively, are employed as labels for the trellis nodes. The branches corresponding to u(k) = 0 are dotted and the solid branches correspond to u(k) = 1. The trellis branches are labeled with the corresponding branch metrics a, b, c, and d. The state metrics are then updated as

S0(k) = min[S0(k - 1) + a, S1(k - 1) + c],  S1(k) = min[S0(k - 1) + b, S1(k - 1) + d].    (82.19)

From Eq. (82.19), it is clear that the basic operation in determining the state metrics involves the addition of a branch metric to the previous state metric, followed by a comparison with the other such sums, and finally a selection of the minimum of all candidate path metrics. Therefore, in general, the state metrics are computed as
[Sj(k + 1), dsj(k)] = min[Si(k) + Bi,j(k)]    (82.20)

where the min is taken over all 2^k0 states si(k) at time index k that are connected to state sj(k + 1), and dsj(k), referred to as the decision, is the value held by the earliest bit in the encoder shift-register. Note that the availability of the decision (earliest) bit makes it possible to determine the state at time index k - 1 from which the survivor path was extended to state sj(k). This information is useful in tracing back the survivor path to generate the decoded bits. For each state at time k, indexing the 2^k0 incoming branches with the earliest encoder bits at time index k - 1 enables the decision bit to also indicate the branch of the survivor path.

Eq. (82.20) is referred to as the add-compare-select (ACS) operation and is implemented in an ACS unit (ACSU), which forms a key processing kernel in the PMU. The PMU is usually the throughput-determining block of a Viterbi decoder, and hence we discuss its design in somewhat greater detail. The PMU can be designed as a state-parallel implementation, in which all N state metrics are computed in parallel using N ACSUs, as a serial implementation, or as a hybrid of the two. A direct implementation of an ACSU is shown in Figure 82.10(a), which includes a 2^k0-input CS block. In this figure, Ŝi,j(k) for 0 ≤ i < 2^k0 are the state metrics of the states connected to state j. The 2^k0-input CS block can be implemented as a tree of two-input CS blocks, as shown in Figure 82.10(b) (for k0 = 2). The two-input CS block compares its two inputs starting from the MSB and proceeding down to the LSB; as soon as a pair of input bits differs, the CS block flags a decision and selects the smaller of its inputs as the output. The critical path of the ACSU consists of a carry ripple in the adder (LSB to MSB) followed by a compare ripple (MSB to LSB). This critical path can be shortened by using a redundant carry-save MSB-first computation [35].
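To make the ACS recursion concrete, here is a behavioral C sketch of a single radix-2 ACS step (one next state, two candidate predecessors). The names are illustrative, and the sketch omits the state-metric normalization (e.g., modulo normalization) that a hardware PMU needs in order to bound the metric word length.

```c
/* Behavioral model of one radix-2 add-compare-select (ACS) step for a
   single next state; a state-parallel PMU would replicate this N times. */
typedef struct {
    int metric;    /* selected (surviving) path metric                */
    int decision;  /* decision bit: which of the two predecessors won */
} acs_result;

acs_result acs_radix2(int S0, int B0,   /* predecessor 0: state metric, branch metric */
                      int S1, int B1)   /* predecessor 1: state metric, branch metric */
{
    int sum0 = S0 + B0;                 /* add     */
    int sum1 = S1 + B1;
    acs_result r;
    if (sum0 <= sum1) {                 /* compare */
        r.metric   = sum0;              /* select  */
        r.decision = 0;
    } else {
        r.metric   = sum1;
        r.decision = 1;
    }
    return r;
}
```

The decision bit is written to the survivor memory unit, while the selected metric feeds the next step of the recursion.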
Two algorithmic techniques for increasing throughput are parallel processing and higher-radix processing [36]. In parallel processing, the input sequence is divided into subblocks and more than one subblock is processed at a time. The other option for increasing throughput is to process more than one trellis section at a time; this is termed higher-radix processing. Figure 82.11 shows the concept of merging two trellis sections of the encoder shown in Figure 82.7(a), which leads to radix-4 processing.
The output code symbols on the branches going to the zeroth state are shown in the radix-4 trellis in Figure 82.11. It is clear that the complexity of the BMU doubles in going from conventional radix-2 to radix-4 processing: twice as many branches now exist, and each branch metric computation requires the processing of twice the number of inputs. Further, the number of comparisons in the PMU also doubles. However, the extra additions (combining two radix-2 branch metrics into one radix-4 branch metric) are performed outside the ACS loop, leading to a net speed-up of the PMU, which in turn increases throughput compared with radix-2 processing. The achievable throughput increase for radix-4 processing has been observed to be a factor between 1 and 2.
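The following behavioral C sketch, with illustrative names, shows the corresponding radix-4 ACS step; it assumes the four radix-4 branch metrics B[i] have already been formed outside the recursion as sums of two consecutive radix-2 branch metrics, so the recursive loop handles two trellis steps with one add and a 4-way compare-select per state.

```c
/* Behavioral model of one radix-4 ACS step for a single next state.
   S[i] are the metrics of the four candidate predecessor states two
   trellis steps back; B[i] are the precomputed radix-4 branch metrics. */
typedef struct {
    int metric;
    int decision;  /* two decision bits: which of the 4 predecessors won */
} acs4_result;

acs4_result acs_radix4(const int S[4], const int B[4])
{
    acs4_result r = { S[0] + B[0], 0 };
    for (int i = 1; i < 4; i++) {
        int cand = S[i] + B[i];
        if (cand < r.metric) {
            r.metric   = cand;
            r.decision = i;
        }
    }
    return r;
}
```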
(3) Survivor Path Computation. Each path metric update in the PMU results in the extension of the survivor path to each of the N states sj(k) from one of the N states si(k - 1). These survivor paths need to be stored and processed to determine the decoder output. Typically, as a rule of thumb, the survivor path length or survivor memory depth L is chosen to be four to five times the constraint length K (e.g., L between 28 and 35 for K = 7). Doing so results in the survivor paths of all the states converging to a single path for time indices smaller than k - L with a probability approaching one. This unique path is the maximum likelihood path, and the input labels or the decisions dsi(k - L - l) (l ≥ 1) (see Eq. (82.20)) are the decoded outputs of the Viterbi decoder.
There are two main approaches to implementing the SMU: (1) trace-back [37] and (2) register-exchange [38]. Figure 82.12(a) shows a 4-state trellis with the branch labels indicating the decisions and the states marked as shown. In this figure, the MSB of the state index represents the earliest encoder bit and the LSB represents the most recent encoder bit. It is seen that the branch labels correspond to the MSB of the state from which the branch originates.
The survivor path memory in Figure 82.12(b) is organized in an N-row by L-column format, where each row corresponds to a state and each column corresponds to a time index in the range k to k - L. Thus, at each time index, the vector of decisions from Eq. (82.20) is stored in the corresponding column. In trace-back, the survivor path of an arbitrary state at time k is selected and traced back. The trace-back procedure involves recursively reading the contents of the survivor path memory shown in Figure 82.12(b) from time index k down to k - L. The address, which is essentially the row index of the memory, for time index k - 1 is generated by right-shifting the address at time index k and appending, as the new MSB, the decision bit read from memory at time index k. Note that the address sequence generated by this procedure is identical to the time-reversed state sequence generated by the forward recursion of the ACS unit.
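A minimal C sketch of this trace-back recursion for a 4-state (K = 3) decoder is given below; it follows the state-labeling convention of Figure 82.12(a) (MSB of the state index = earliest encoder bit), and the array layout, names, and survivor depth are illustrative assumptions.

```c
/* Behavioral trace-back model for a 4-state decoder; sizes and names
   are illustrative. decisions[s][t] is the decision bit written by the
   ACSU for state s at time index t (one column per time index). */
#define N_STATES   4
#define STATE_BITS 2
#define L_DEPTH    12   /* survivor memory depth, 4-5 times K */

void trace_back(const int decisions[N_STATES][L_DEPTH],
                int start_state, int decoded[L_DEPTH])
{
    int state = start_state;
    for (int t = L_DEPTH - 1; t >= 0; t--) {
        int d = decisions[state][t];
        /* The decisions read along the survivor path are the decoded bits
           (recovered here in reverse time order). */
        decoded[t] = d;
        /* Previous state: right-shift the current state index and insert
           the decision bit as the new MSB (the earliest encoder bit). */
        state = (state >> 1) | (d << (STATE_BITS - 1));
    }
}
```

In a practical SMU, only the decisions recovered at the old end of the trace (older than k - L) are released as decoder outputs, and the trace-back is overlapped with the writing of new decision vectors.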
Figure 82.12(c) shows a register-exchange architecture for the SMU. In this approach, the survivor paths themselves, and not the vectors of decisions, are stored in an L-bit register (the path register) for every state. At each time index k, the survivor path is updated based on the decision made by the ACSU. The most recent bit in the register for a state si(k) is always equal to dsi(k). The other bits in the path register, however, may need to be replaced entirely with the contents of the path register of another state; this occurs when the survivor path for state si(k) is obtained by extending the survivor path of another state sj(k - 1). Register exchange is fast, but power hungry owing to the extensive data movement needed to update the survivor path registers.
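A behavioral C sketch of one register-exchange update for the same 4-state example is shown below; packing each path register into a 32-bit word and the predecessor indexing are illustrative assumptions that follow the state-labeling convention used above.

```c
/* Behavioral register-exchange update for a 4-state decoder. path[s]
   holds the survivor path of state s packed into a word (LSB = most
   recent bit); decision[s] is the ACSU decision for state s. */
#include <stdint.h>

#define N_STATES   4
#define STATE_BITS 2

void register_exchange_update(uint32_t path[N_STATES],
                              const int decision[N_STATES])
{
    uint32_t next_path[N_STATES];
    for (int s = 0; s < N_STATES; s++) {
        /* Surviving predecessor: right-shift the state index and insert
           the decision bit as the new MSB (the earliest encoder bit). */
        int pred = (s >> 1) | (decision[s] << (STATE_BITS - 1));
        /* Copy that predecessor's entire path register and append the
           new decision bit in the most recent (LSB) position. */
        next_path[s] = (path[pred] << 1) | (uint32_t)decision[s];
    }
    for (int s = 0; s < N_STATES; s++)
        path[s] = next_path[s];
}
```

After L such updates, the oldest bit of any path register, which with probability approaching one is common to all N registers, can be released as the decoded output; the N-fold copying of L-bit registers on every trellis step is the data movement that makes this approach power hungry.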
Hybrid approaches to designing the SMU [39] exist, in which the trace-back and register-exchange approaches are combined. In general, the trace-back approach is suitable for decoders with a large number of states (N ≥ 8) and low data rates (<2 Mb/s), while register exchange works well when the number of states is small and the data rates are high (>10 Mb/s). Though most applications in the past fell neatly into these two categories, many modern applications such as ultra-wideband [40] have data rates in the hundreds of Mb/s along with a large number of states (e.g., N = 64). Thus, the design of high-speed, low-power, large-state Viterbi decoders continues to be an important area of research [41,42].