Computer Arithmetic for VLSI Signal Processing: Multipliers
The bit product matrix of an n × n (in this case 5 × 5) multiplier for unsigned operands is shown in Figure 80.7. The two operands, A and B, are shown at the top; below them are n rows (each consisting of n bit products) that comprise the bit product matrix. Finally, the product (2n bits wide) is at the bottom.
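The structure of the unsigned bit product matrix can be sketched in a few lines of Python (an illustrative sketch, not from the source; the function names are invented):

```python
def bit_product_matrix(a, b, n):
    """Row i of the unsigned bit product matrix holds (a_j AND b_i) for
    j = 0..n-1; entry (i, j) carries weight 2**(i + j)."""
    return [[((a >> j) & 1) & ((b >> i) & 1) for j in range(n)]
            for i in range(n)]

def sum_matrix(matrix):
    """Summing every bit product with its weight yields the 2n-bit product."""
    return sum(bit << (i + j)
               for i, row in enumerate(matrix)
               for j, bit in enumerate(row))
```

Summing the weighted dots of the matrix recovers the product, e.g. `sum_matrix(bit_product_matrix(13, 11, 5))` gives 143.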
There are several ways to implement a multiplier. One of the oldest methods is to use an n-bit-wide adder to sum the rows of the bit product matrix in a row-by-row fashion. This can be quite slow, as n − 1 cycles (each long enough to complete an n-bit addition) are required. If a ripple carry adder is used, the time to multiply two n-bit numbers is proportional to n². If a fast adder such as a carry lookahead adder is used, the time is proportional to n log₂(n).
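A behavioral sketch of the row-by-row scheme for unsigned operands (illustrative Python, not a hardware description):

```python
def shift_add_multiply(a, b, n):
    """Multiply two n-bit unsigned integers by summing the bit-product
    matrix one row at a time, as the sequential multiplier would."""
    product = 0
    for i in range(n):            # one cycle per row of the matrix
        if (b >> i) & 1:          # row i is A shifted left by i (else all zeros)
            product += a << i     # the n-bit addition performed this cycle
    return product                # the final product is 2n bits wide
```

For example, `shift_add_multiply(27, 19, 5)` returns 513. Each of the n iterations models one cycle of the n-bit adder, which is why the ripple carry version runs in time proportional to n².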
Booth multiplier. The Booth multiplier [6] and the modified Booth multiplier are attractive for two’s complement multiplication, since they accept two’s complement operands, produce a two’s complement product, and are easy to implement. The sequential Booth multiplier requires n cycles to form the product of a pair of n-bit numbers, where each cycle consists of an n-bit addition and a shift, an n-bit subtraction and a shift, or a shift without any other arithmetic operation. The radix-4 modified Booth multiplier [2] takes half as many cycles as the “standard” Booth multiplier, although the operations performed during a cycle are slightly more complex (since it is necessary to select one of five possible addends, namely ±2B, ±B, or 0, instead of one of three: +B, −B, or 0). Extensions to higher radices that examine more than three bits are possible, but generally not attractive because the addition/subtraction operations involve non-power-of-two multiples of B (such as 3B and 5B), which raises the complexity.
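The radix-4 recoding can be illustrated in Python. The digit rule b[2i−1] + b[2i] − 2·b[2i+1] (with b[−1] = 0) is the standard modified Booth recoding; the function names below are illustrative, not from the source:

```python
def booth_radix4_digits(b, n):
    """Recode an n-bit two's complement multiplier (n even) into radix-4
    Booth digits drawn from {-2, -1, 0, +1, +2}.  Digit i is selected by
    the overlapping bit triple (b[2i+1], b[2i], b[2i-1])."""
    bu = b & (2**n - 1)                     # work with the n-bit pattern
    bit = lambda j: (bu >> j) & 1 if j >= 0 else 0
    return [bit(2*i - 1) + bit(2*i) - 2*bit(2*i + 1) for i in range(n // 2)]

def booth_multiply(a, b, n):
    """n/2 cycles: each adds one of the five addends (+/-2A, +/-A, or 0),
    weighted by the radix-4 digit position."""
    return sum(d * a * 4**i for i, d in enumerate(booth_radix4_digits(b, n)))
```

For example, −5 in 4 bits recodes to the digits [−1, −1] (that is, −1 − 1·4 = −5), so `booth_multiply(7, -5, 4)` returns −35.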
Array multiplier. An alternative approach to multiplication involves the combinational generation of all bit products and their summation with an array of adders. To accommodate two’s complement operands, the bit product matrix of Figure 80.7 is modified as shown by Figure 80.8. The modifications include inverting the bit products that comprise the most significant column and the most significant row of the matrix (but not the single bit product at the most significant column and row of the matrix) and adding two ONEs to the matrix: one in column n + 1 and one in column 2n [7, p. 179].
As shown in Figure 80.8, a product that is 2n bits wide is produced, with two integer bits. This growth in word size is necessary to accommodate the case of (−1)², an artifact of the nonsymmetric nature of the two’s complement number system, which requires an extra integer column for its representation since +1 cannot be represented in the “normal” two’s complement fractional number system. In most digital signal-processing applications, an n-bit two’s complement result is desired. This can be accomplished by removing the n − 1 least significant product bits and the most significant (2n-th) product bit. The n − 1 bits can be removed either by truncation (where the n − 1 least significant columns are not formed) or by rounding (where the n − 1 least significant columns are formed, a ONE is added in column n − 1, and, after the product is formed, the n − 1 least significant bits are removed). In either case, column 2n is not formed, so the case of attempting to compute (−1)² must be prevented at the system level.
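The modified matrix can be checked numerically. The Python sketch below (illustrative, not from the source) builds the matrix exactly as described, with an inverted most significant row and column, an uninverted corner bit product, and ONEs at bit positions n and 2n − 1 (columns n + 1 and 2n in the text's column numbering), and its sum equals the two’s complement product modulo 2^2n:

```python
def twos_comp_product_matrix(a, b, n):
    """Sum the modified bit-product matrix for n-bit two's complement
    operands: invert bit products in the most significant row and column
    (but not at their intersection) and add two constant ONEs."""
    au, bu = a & (2**n - 1), b & (2**n - 1)   # operand bit patterns
    total = 0
    for i in range(n):                         # bit i of A
        for j in range(n):                     # bit j of B
            p = ((au >> i) & 1) & ((bu >> j) & 1)
            if (i == n - 1) != (j == n - 1):   # MS row or MS column, not both
                p ^= 1                         # inverted bit product
            total += p << (i + j)
    total += (1 << n) + (1 << (2*n - 1))       # the two added ONEs
    return total & (2**(2*n) - 1)              # product modulo 2^(2n)
```

Exhaustive checking over all 4-bit operand pairs confirms the construction, including the corner case where both operands are the most negative value.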
Figure 80.9 shows an implementation of the 5-bit × 5-bit multiplier for two’s complement operands of Figure 80.8. It uses a 5 × 5 array of cells to form the bit products and four adders (at the bottom of the array) to complete the evaluation of the product. Only a few types of cells are used in this implementation: gate cells (marked AND or NAND), which use a single gate to form the logical AND or NAND of the a and b inputs to the cell; half adder cells (marked HA), which sum the second input to the cell with the logical AND of the a and b inputs; and full adder cells (marked FA), which sum the second and third inputs to the cell with the logical AND (or NAND, for the a4 row next to the bottom of the array) of the a and b inputs. Standard full adders are used in the bottom row of the multiplier. There is also a special modified half adder, marked MHA, in the bottom row. It forms the carry and sum corresponding to one plus the carry and sum of the two operands that are input to the MHA. Its position at the right end of the bottom row adds the ONE shown at the top of the p5 column of Figure 80.8. The ONE shown in the p9 column of Figure 80.8 is not implemented (it is assumed that either a five-bit single-precision product or a nine-bit double-precision product is desired). One of the adders in the next-to-the-top row is a full adder; if the extra input to this adder is a ONE, a rounded single-precision product is available at outputs p8, p7, p6, p5, and p4.
This implementation of an n × n array multiplier requires 2n − 1 gate cells, n − 2 half adders, 1 modified half adder, and (n − 1)² full adders. Of the half and full adder cells, (n − 1)² have an extra AND or NAND gate. Thus the total complexity is

G_array = (n − 1)² G_FA + (n − 1) G_HA + [(2n − 1) + (n − 1)²] G_AND

where G_FA, G_HA, and G_AND are the complexities of a full adder, a half adder (counting the modified half adder as a half adder), and a single AND or NAND gate, respectively.
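The cell counts are easy to tabulate (illustrative Python; the dictionary keys are invented labels):

```python
def array_multiplier_cells(n):
    """Cell counts for the n x n two's complement array multiplier
    described above."""
    counts = {
        "gate cells (AND/NAND)": 2*n - 1,
        "half adders": n - 2,
        "modified half adders": 1,
        "full adders": (n - 1)**2,
    }
    # (n - 1)**2 of the half and full adder cells carry an extra AND/NAND gate
    counts["extra AND/NAND gates"] = (n - 1)**2
    return counts
```

As a consistency check, for n = 5 the four cell counts total 9 + 3 + 1 + 16 = 29 cells, which matches Figure 80.9: the 5 × 5 cell array plus the four adders in the bottom row.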
The delay of the array multiplier is evaluated by following the pathways from the inputs to the outputs. The longest path starts at the upper left corner, progresses to the lower right corner, and then across the bottom to the lower left corner. If it is assumed that the delay from any adder input (for either half or full adders) to any adder output is D_adder, then the total delay of an n-bit × n-bit array multiplier is approximately

DELAY_array = (2n − 2) D_adder

since the longest path passes through about n − 1 adders on the way down through the array and another n − 1 adders across the bottom row.
Thus, the modified array multiplier is a bit more complex and also faster than the original array multiplier. For a 16-bit ´ 16-bit multiplier the complexity increases by a few percent while the delay drops by 1 or more (depending on the delay of an adder).
Array multipliers are easily laid out in a cellular fashion, making them quite attractive for VLSI implementation, where minimizing the design effort may be more important than maximizing the speed.
Dadda fast multiplier. A method for fast multiplication was developed by Wallace [8] and refined by Dadda [9]. With this method, a three-step process is used to multiply two numbers: (1) the bit products are formed, (2) the bit product matrix is “reduced” to an equivalent two-row matrix, where the sum of the two rows equals the sum of the rows in the initial bit product matrix, and (3) the two numbers are summed with a fast adder to produce the product. Although this may seem to be a complex process, it yields multipliers with delay proportional to the logarithm of the operand word size, which is “faster” than the Booth multiplier, the modified Booth multiplier, or array multipliers, all of which have delays proportional to the word size.
The fast multiplication process is shown for a 5-bit × 5-bit two’s complement Dadda multiplier in Figure 80.11. An input matrix of dots is shown as the initial bit product matrix: each dot represents a bit product, each dot with an overbar represents the logical complement of the corresponding bit product, the 1 indicates a fixed logic ONE, and R indicates a bit that is set to logic ONE when a rounded 5-bit product is desired and to logic ZERO when a double-precision 9-bit product is desired.
The height of the matrices in the reduction process is determined by working back from the final (two row) matrix and limiting the height of each matrix to the largest integer that is no more than 1.5 times the height of its successor. The first several terms of the sequence are 2, 3, 4, 6, 9, 13, 19, 28, 42, 63, 94, … .
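The sequence is easy to generate (illustrative Python, not from the source):

```python
def dadda_heights(max_height):
    """Dadda matrix-height sequence: d1 = 2, d(j+1) = floor(1.5 * dj),
    continued until the initial bit product matrix height is covered."""
    seq = [2]
    while seq[-1] < max_height:
        seq.append(seq[-1] * 3 // 2)   # floor(1.5 * d) with integer arithmetic
    return seq
```

For a 5-row initial matrix, `dadda_heights(5)` gives [2, 3, 4, 6], so the reduction targets (working back down) are 4, then 3, then 2.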
Since the height of the initial bit product matrix in Figure 80.11 is 5, columns having more than four dots (or that will grow to more than four dots owing to carries) are reduced by the use of half adders (each half adder takes in two dots from a column of the bit product matrix and outputs one dot in the same column of the next matrix and one dot in the next more significant column of the next matrix), a modified half adder, and full adders (each full adder takes in three dots from a column of the bit product matrix and outputs one dot in the same column of the next matrix and one dot in the next more significant column of the next matrix), so that no column in the next matrix will have more than four dots.
The outputs of half adders are shown by a “crossed” line in the succeeding matrix, the outputs of the modified half adder are shown by a double-crossed line, and the outputs of full adders are shown by an uncrossed line. In each case, the rightmost dot of the pair that is connected by a line is in the column from which the adder’s inputs were taken in the preceding matrix. In the succeeding steps, reduction to Matrix 2 (with no more than three dots per column) and finally to Matrix 3 (with no more than two dots per column) is performed.
Each matrix is produced from its predecessor in one adder delay. Since the number of matrices is logarithmically related to the number of rows in the initial bit product matrix (which is equal to the number of bits in the words to be multiplied), the delay of the matrix reduction process is proportional to log n. Since the carry propagate adder that sums the final two-row matrix can be implemented as a carry lookahead adder (which also has delay proportional to the logarithm of the word size), the total delay of this Dadda multiplier is proportional to the logarithm of the word size.
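The reduction can be simulated at the column-count level. The sketch below (illustrative Python; it models the unsigned matrix and ignores the constant ONEs of the signed matrix) applies Dadda's minimal-reduction rule and confirms the stage counts: three stages for n = 5, as in Figure 80.11, and six for n = 16:

```python
def reduce_stage(cols, target):
    """One Dadda reduction stage at the column-count level: apply just
    enough full adders (3 dots -> 1 sum + 1 carry) and half adders
    (2 dots -> 1 sum + 1 carry), least significant column first, so that
    no column of the next matrix exceeds `target` dots."""
    nxt = [0] * (len(cols) + 1)
    for j, h in enumerate(cols):
        h += nxt[j]                            # carries arriving from column j - 1
        while h > target:
            h -= 1 if h == target + 1 else 2   # half adder, else full adder
            nxt[j + 1] += 1                    # carry into column j + 1
        nxt[j] = h
    while len(nxt) > 1 and nxt[-1] == 0:
        nxt.pop()
    return nxt

def count_reduction_stages(n):
    """Reduce the unsigned n x n bit product matrix through the Dadda
    height sequence (..., 6, 4, 3, 2) and count the stages used."""
    cols = [min(j + 1, n, 2*n - 1 - j) for j in range(2*n - 1)]
    targets = [2]
    while targets[-1] < n:
        targets.append(targets[-1] * 3 // 2)
    stages = 0
    for t in reversed(targets[:-1]):           # e.g. n = 5: targets 4, 3, 2
        cols = reduce_stage(cols, t)
        stages += 1
    assert max(cols) <= 2                      # final two-row matrix
    return stages
```

The stage count grows only logarithmically with n, matching the delay argument above.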
The exact delay of the Dadda multiplier is evaluated by following the pathways from the inputs to the outputs. The longest path starts at the center column, progresses through the successive reduction matrices (there are approximately log1.44(n) − 2 stages), and finally through the 2n − 2 bit adder. If a carry lookahead adder realized with 4-bit lookahead blocks is used for the carry propagate adder (with delay given by Eq. (80.13)), the total delay of an n-bit by n-bit Dadda multiplier is

DELAY_Dadda = D_G + (log1.44(n) − 2) D_adder + DELAY_CLA(2n − 2)

where D_G is the delay of the gate that forms a bit product and DELAY_CLA(2n − 2) is the delay of the 2n − 2 bit carry lookahead adder from Eq. (80.13).
The Dadda multiplier complexity is determined by evaluating the complexity of its parts: n² gates (2n − 2 are NAND gates, the rest are AND gates) to form the bit product matrix, (n − 2)² full adders, a few (slightly fewer than n) half adders and one modified half adder for the matrix reduction, and a 2n − 2 bit carry propagate adder for the addition of the final two-row matrix:

G_Dadda = n² G_AND + (n − 2)² G_FA + n G_HA + G_CLA(2n − 2)   (80.27)

where the half adder count is approximated as n and G_CLA(2n − 2) denotes the complexity of the 2n − 2 bit carry propagate adder.
If a carry lookahead adder (implemented with 4-bit lookahead blocks) is used for the carry propagate adder and if a full adder is comprised of 9 gates and a half adder (either regular or modified) is comprised of 4 gates, then Eq. (80.27) reduces to

G_Dadda ≈ n² + 9(n − 2)² + 4n + G_CLA(2n − 2) = 10n² − 32n + 36 + G_CLA(2n − 2)

gates, where G_CLA(2n − 2) is the gate count of the 2n − 2 bit carry lookahead adder.
Figure 80.12 shows a 16-bit × 16-bit two’s complement Dadda multiplier. The reduction is concentrated in the center columns in the first stage, but it broadens and becomes more uniform as it progresses through the subsequent stages.
Wallace fast multiplier. The Wallace multiplier [8] is similar to the Dadda multiplier described above, except that a different approach is used to reduce the bit product matrix. In Dadda’s approach the minimum reduction is done to each matrix to cause the succeeding matrix height to conform to the sequence of matrix heights. In the Wallace approach, the maximum reduction is done at each stage. This is illustrated by a 16-bit by 16-bit two’s complement Wallace multiplier shown in Figure 80.13.
It is clear from Figure 80.13 that the Wallace approach does most of its reduction in the first few stages; for example, 113 of the 201 full adders are used in the first two stages. For a given word size, the width of the carry propagate adder of the Wallace multiplier is smaller than that of the Dadda multiplier. The number of reduction stages is the same for Wallace and Dadda multipliers, although the heights of the matrices are not the same at all stages. For most word sizes the delay is also the same, although for some word sizes the Dadda multiplier will require an extra level of lookahead logic for its carry propagate adder, adding four gate delays. The Wallace multiplier uses more full adders than the Dadda multiplier does, but its carry propagate adder is smaller by the same amount, so the combined adder complexity is the same. Although the Wallace multiplier does use quite a few more half adders, its complexity is only slightly greater than that of the Dadda multiplier.
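The claim that the stage counts coincide can be checked with a maximal-reduction (Wallace-style) model in the same column-count abstraction used informally above (illustrative Python for the unsigned matrix; the half adder policy is simplified, so adder counts will not match the figures exactly, but stage counts do):

```python
def wallace_stage(cols):
    """Maximal reduction: each column uses as many full adders as possible
    (3 dots -> 1 sum + 1 carry) plus a half adder on a leftover pair."""
    nxt = [0] * (len(cols) + 1)
    for j, h in enumerate(cols):
        fa, r = divmod(h, 3)
        ha = 1 if r == 2 else 0
        nxt[j] += fa + ha + (1 if r == 1 else 0)   # sums stay in column j
        nxt[j + 1] += fa + ha                      # carries move one column left
    while len(nxt) > 1 and nxt[-1] == 0:
        nxt.pop()
    return nxt

def wallace_stages(n):
    """Stages needed to reduce the unsigned n x n matrix to two rows."""
    cols = [min(j + 1, n, 2*n - 1 - j) for j in range(2*n - 1)]
    stages = 0
    while max(cols) > 2:
        cols = wallace_stage(cols)
        stages += 1
    return stages

def dadda_stages(n):
    """Stage count implied by the Dadda height sequence 2, 3, 4, 6, 9, ..."""
    height, stages = 2, 0
    while height < n:
        height = height * 3 // 2
        stages += 1
    return stages
```

Under this model both strategies take the same number of stages for common word sizes (for example, six stages at n = 16), consistent with the comparison above.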