Computer Arithmetic or VLSI Signal Processing:Fixed-Point Arithmetic Elements

Fixed-Point Arithmetic Elements
Adders

Addition is performed by summing the corresponding bits of two numbers, including the sign bits. Subtraction is performed by summing the corresponding bits of the minuend and the two’s complement of the subtrahend. Overflow is detected in a two’s complement adder by comparing the carry signals into and out of the most significant adder stage (i.e., the stage which computes the sign bit). If the carries differ, an overflow has occurred and the result is invalid. Alternatively, if adding two numbers of like sign, if the sign of the result is different, then overflow has occurred. If adding numbers of unlike sign (or subtracting numbers of like sign) overflow cannot occur.

Full adder. The full adder is the fundamental building block of most arithmetic circuits. Its operation is defined by the truth table shown in Table 80.2.

The sum and the carry outputs are described by the following equations:

Computer Arithmetic for VLSI Signal Processing-0034

where Å denotes the exclusive-OR logic operation, ak, bk, and ck are the inputs to the kth full adder stage, and sk and ck+1 the sum and carry outputs, respectively.

Computer Arithmetic for VLSI Signal Processing-0035

In evaluating the relative complexity of implementations, it is useful to assume a nine gate realization of the full adder, as shown in Figure 80.2. In this implementation, the exclusive-OR logic operation is performed with two AND gates, an OR gate, and an inverter. For this full adder, the delay from either ak or bk to sk is six gate delays, the delay from either ak or bk to ck+1 is five gate delays, the delay from ck to ck+1 is two gate delays and the delay from ck to sk is three gate delays. In some technologies, such as CMOS, inverting gates (e.g., NAND and NOR gates) are more efficient than the noninverting gates that are used here. Circuits with equivalent speed and complexity can be constructed with inverting gates.

Ripple carry adder. A ripple carry adder for n-bit numbers is constructed by concatenating n full adders as shown in Figure 80.3. At the k th-bit position, the k th bits of operands A and B and a carry signal from the (k - 1)th stage are used to form the k th bit of the sum, sk , and the carry, ck+1, to the next adder stage. This is called a ripple carry adder, since the carry signals “ripple” from the lsb position to the most significant bit position. If the ripple carry adder is implemented by concatenating n of the nine gate full adders, as shown in Figure 80.2 and Figure 80.3, an n-bit ripple carry adder requires 2n + 4 gate delays to produce the most significant sum bit (5 gate delays to form c1 in the least significant full adder, 2 gate delays to form ci+1 from ci in each of the n - 2 intermediate full adders and 3 gate delays to form sn-1 from cn-1 in the most significant full adder) and 2n + 3 gate delays to produce the most significant carry output. A total of 9n logic gates are required to implement the n-bit ripple carry adder.

In comparing adders, the delay from data input to the most significant sum output and the complexity (i.e., the gate count) will be used. These will be denoted by DELAY and GATES (subscripted by RCA to indicate ripple carry adder), respectively. In Eq. (80.5) Dgate indicates the delay of a logic gate. Similarly,

Computer Arithmetic for VLSI Signal Processing-0036

in Eq. (80.6), bFA indicates the complexity of a full adder. These simple metrics that ignore the effects of varying fan-in and fan-out are suitable for first-order comparisons. More accurate comparisons require consideration of the types of gates (since, for example, gates with fewer inputs are generally faster and smaller than those with more inputs).

Computer Arithmetic for VLSI Signal Processing-0037

Carry lookahead adder. Another approach is the carry lookahead adder first described by Weinberger and Smith [1]. Here specialized carry logic computes the carries in parallel. The carry lookahead adder uses modified full adders for each bit position and lookahead modules. The modified full adders shown in Figure 80.4 are modified from the full adder of Figure 80.2 in that propagate and generate signals are made available as outputs and no carry output is formed. The propagate and generate signals indicate that an incoming carry would propagate across the adder or that a carry is generated within the adder, respectively. Each lookahead module forms individual carry outputs and also block propagate and block generate outputs that indicate that an incoming carry would propagate across the data block or that a carry is generated within the data block, respectively.

Rewriting Eq. (80.5) using propagate and generate variables, pk = ak + bk and gk = ak bk, respectively:

Computer Arithmetic for VLSI Signal Processing-0038This equation explains the concept of carry generation and carry propagation. At a given stage, a carry is generated irrespective of the carry into the stage if gk is true (i.e., both ak and bk are 1), and a stage propagates a carry from its input to its output if pk is true (i.e., either ak or bk is a 1). Actually, pk = ak Åbk, but as noted in Ref. [2] the alternative pk = ak + bk gives correct results in Eq. (80.7), is easier to realize, and provides redundancy in the calculation of ck+1.

Extending Eq. (80.7) to a second stage,

Computer Arithmetic for VLSI Signal Processing-0039

Eq. (80.8) results from evaluating Eq. (80.7) for the (k + 1)th stage and substituting the value for ck+1 from Eq. (80.7). Carry ck+2 exits from stage k + 1 if: a carry is generated there; or if a carry is generated in stage k and propagates across stage k + 1; or if a carry enters stage k and propagates across both stages k and k + 1.

Extending Eq. (80.7) to a third stage,

Computer Arithmetic for VLSI Signal Processing-0040Although it would be possible to continue this process indefinitely, each additional stage increases the fan-in (i.e., the number of inputs) of the logic gates. Four inputs (as required to implement Eq. (80.9)) is frequently the maximum number of inputs per gate for current technologies. To continue the process, generate and propagate signals are defined over 4-bit blocks (stages k to k+3), gk+3:k and pk+3:k , respectively.

Computer Arithmetic for VLSI Signal Processing-0041Thus, the carry out from a 4-bit-wide block can be computed in only four gate delays (the first to compute pi and gi for i = k, k + 1, k + 2, and k + 3, the second to evaluate pk+3:k using Eq. (80.11), the second and third to evaluate gk+3:k using Eq. (80.10), and the third and fourth to evaluate ck+4 using Eq. (80.12)).clip_image020An n-bit carry lookahead adder requires é(n - 1)/(r - 1)ù lookahead blocks, where r is the “width” of the block and where éXù indicates the smallest integer greater than or equal to X. A 4-bit lookahead block is a direct implementation of Eqs. (7)–(11) requiring 14 logic gates. In general, an r-bit lookahead block requires 1 (3r + r 2 ) logic gates.

Figure 80.5 shows the interconnection of 16 modified full adders and 5 lookahead logic blocks to realize a 16-bit carry lookahead adder. The sequence of events which occurs during an add operation is as follows: (1) apply A, B, and the least significant carry in, c0; (2) the 16 modified full adders computes pi and gi; (3) the first-level lookahead logic blocks compute the 4-bit propagate and generate signals; (4) the second-level lookahead logic block computes c4, c8, and c12; (5) the first-level lookahead logic blocks compute the individual carries, c1-c3, c5-c7, c9-c11 and c13-c15; and (6) each modified full adder computes the sum outputs. This process may be extended to larger adders by subdividing the large adder into 16- bit blocks and using additional levels of carry lookahead (e.g., a 64-bit adder requires three levels).

The delay of carry lookahead adders is evaluated by recognizing that an adder with a single level of carry lookahead (for r-bit words) realized with 8 gate modified full adders as shown in Figure 80.4 has six gate delays and that each additional level of lookahead increases the maximum word size by a factor of r and adds four gate delays. More generally [3, pp. 83–88], the number of lookahead levels for an n-bit adder is élogr (n)ù, where r is the number of bits per lookahead module. Since an r-bit carry lookahead adder has six gate delays and there are four additional gate delays per carry lookahead level after the first,

Computer Arithmetic for VLSI Signal Processing-0042

The complexity of an n-bit carry lookahead adder implemented with r-bit lookahead blocks is n modified full adders (each of which has a complexity of bMFA and é(n - 1)/(r - 1)ù lookahead logic blocks (each of which requires 1 (3r + r 2 ) logic gates with fan-in ranging from 2 to r:

Computer Arithmetic for VLSI Signal Processing-0043

The carry lookahead approach reduces the delay of adders from increasing linearly with the word size (as for ripple carry adders) to increasing as the logarithm of the word size. As with ripple carry adders, the carry lookahead adder complexity grows linearly with the word size. For r = 4, bMFA = 8, and bFA = 9, the gate count of a carry lookahead adder is 40% greater than that of a ripple carry adder.

Carry select adder. Another approach is the carry select adder first described by Bedrij [4]. The carry select adder divides the words to be added into blocks and forms two sums for each block in parallel (one with a carry in of ZERO and the other with a carry in of ONE). As shown for a 16-bit carry select adder on Figure 80.6, the carry out from the previous block controls a multiplexer that selects the appropriate sum. The carry out is computed using Eq. (80.7), since the block propagate signal is the carry out of an adder with a carry input of ONE and the block generate signal is the carry out of an adder with a carry input of ZERO.

If a constant block width of k is used and if ripple carry adders constructed from the nine gate full adder of Figure 80.2 are used, there will be n/k blocks and the delay to generate the sum is 2k + 3 gate

delays to form the carry out of the first block, 2 gate delays for each of the

Computer Arithmetic for VLSI Signal Processing-0044

Computer Arithmetic for VLSI Signal Processing-0045

Slightly better results can be obtained by varying the width of the blocks. In this case the optimum is to make the two least significant blocks of the same size and make each succeeding block one bit larger. For this configuration, the delay for each block’s most significant sum bit will equal the delay to the multiplexer control signal for that block [5, p. A-38].

The complexity of the carry select adder is 2n - k ripple carry adder stages, the intermediate carry

Computer Arithmetic for VLSI Signal Processing-0046

This is somewhat more than twice the complexity of a ripple carry adder.

Comments

Popular posts from this blog

SRAM:Decoder and Word-Line Decoding Circuit [10–13].

ASIC and Custom IC Cell Information Representation:GDS2

Timing Description Languages:SDF