Multipliers:Sequential Multiplier

Introduction

Many microprocessors and digital signal processors now have fast multipliers in them. There are several types of multipliers with different speeds, areas, and configurations. For the details of multipliers and multiplication methods, see Refs. 3–5, 7–10, and 14.

Here, let us consider multiplication of n-bit positive binary integers (without the sign bit) where a multiplicand X = [xn –1xn –2 … x0] is multiplied by a multiplier Y = [yn –1yn –2 … y0] to derive the product P = [p2n –1p2n–2 … p0], where each of xi , yi , and pi takes a value of 0 or 1.

Sequential Multiplier

A sequential multiplier works in a manner similar to manual multiplication of two decimal numbers, although two binary numbers are multiplied in this case. A multiplicand X = [xn–1xn–2 …x0] is multiplied by each bit of a multiplier Y = [yn–1yn–2 …y0], forming the multiplicand-multiple Z = [zn–1zn–2 …z0], where zi = xiyj for each i = 0, …, n – 1. Then, Z is shifted left by j bit positions and is added, in all digit positions in parallel, to the partial product Pj–1 which has been formed by the previous steps, to generate the partial product Pj . Repeating this step for j = 0 to n – 1, the product P = [p2n–1 p2n–2 …p0] of 2n bits is derived. The only difference of this sequential multiplier from the manual multiplication is the repeated addition of each multiplicand-multiple, instead of one-time addition of all multiplicand-multiples at the end.

This sequential multiplier is realized, as shown in Figure 42.1, which consists of a Multiplicand Register of n-bits for storing multiplicand X, a Shift Register of 2n-bits for storing multiplier Y and partial product Pj –1, a Multiplicand-Multiple Generator (denoted as MM Generator) for generating a multiplicand- multiple yj · X, and a Parallel Adder of n-bits. Initially, X is stored in the Multiplicand Register and Y is stored in the lower half (i.e., the least significant bit positions) of the Shift Register where the upper half of the Shift Register stores 0. This sequential multiplier performs one iteration step described above in each clock cycle. In other words, in each clock cycle, a multiplier bit yj is read from the right-most position of Shift Register. A multiplicand-multiple yj · X is produced by Multiplicand-Multiple Generator, which is X or 0 based on whether yj is 1 or 0, and is fed to Parallel Adder. The upper n-bit of the partial product is read from the upper half of Shift Register and also fed to Parallel Adder. The content of Shift Register

Multipliers-0495

is shifted one position to the right. The (n + 1)-bit output of Parallel Adder including the carry output, which is the upper part of the updated partial product, is stored into the upper (n + 1) positions of Shift Register. After n cycles, Shift Register holds the 2n-bit product, P.

We can use any adder described in Chapter 38. The faster the adder, the shorter the clock cycle and hence the faster the multiplier. When we use a carry save adder as Parallel Adder, we have to modify Shift Register so that its upper half stores two binary numbers (i.e., a partial sum and a partial carry). Besides the carry save adder, we need a carry propagate adder for summing up the final partial sum and carry. We can accelerate sequential multiplication by processing multiple-bits of the multiplier per clock cycle. When k-bits of multiplier Y are processed per cycle, an n-bit multiplication is performed through n/k clock cycles. There are two methods for processing multiple-bits of the multiplier per cycle. One is generating several candidate multiplicand-multiples and then choosing an appropriate one among them.

The other is generating k multiplicand-multiples and summing them up in one cycle. Also, these two methods can be combined.

In the first method, Multiplicand-Multiple Generator generates 2k different multiplicand-multiples, 0, X, 2X, 3X, …, (2k – 1)X. For example, when k = 2, Multiplicand-Multiple Generator generates 2X and 3X, as well as X and 0. Multiplicand-Multiple Generator consists of a look-up table containing the necessary multiples of the multiplicand and a selector for selecting an appropriate multiple. The look-up table need not hold all multiples, because several of them can be generated from others by shifting whenever necessary. For example, when k = 2, only 3X must be pre-computed and be held. Extended Booth’s method [13] is useful for reducing the number of pre-computed multiples. When k-bits are processed per cycle, k-bit Booth’s method is applied, which recodes multiplier Y into radix-2k redundant signed-digit representation with the digit set {–2k–1, –2k–1 + 1, …, 0, …, 2k–1 – 1, 2k–1}, where each digit in radix 2k takes a value among those in this digit set. Y is recoded into Yˆ by considering k + 1 bits of Y, i.e., ykj+k–1, ykj+k–2, …, ykj+1, ykj, ykj –1, ., k + 1 bits among yn–1, yn–2, …, y0 of Y ) per cycle, instead of only a single bit of Y (say, yj ) per cycle, as illustrated in Figure 42.2(a). More specifically, the j-th digit yˆj of the recoded multiplier, where

Multipliers-0496

multiplier Y = [yn–1yn–2 … y0] which has n components. Then we have a multiplicand-multiple yˆj · X.

Since the negation of a multiple can be produced by complementing it bitwise and adding 1 at the least significant position (this 1 is treated as a carry into the least significant position of Parallel Adder), the number of multiples to be held is reduced. For example, when k = 2, the multiplier is recoded to radix- 4 redundant signed-digit representation with the digit set {–2, –1, 0, 1, 2} by means of the 2-bit Booth’s method (i.e., the extended Booth’s method with k = 2) as yˆj = –2y2j+1 + y2j + y2j –1 and all multiples are produced from X by shift and/or complementation. In 2-bit Booth recoding, multiplier Y is partitioned into 2-bit blocks, and then at the j-th block, the recoded multiplier digit yˆj is calculated from the two bits of the block, i.e., y2j+1 and y2j, and the higher bit of the next lower block, i.e., y2j –1, according to the rule shown in Table 42.1. For example, 11110001010 is recoded to 2 0 2 1 1 2, where 1 and 2 denote –1 and –2, respectively, as illustrated in Figure 42.2(b). (In this case, whenever a bit is not available in Y, such as the next lower bit of the least significant bit, it is regarded as 0.) When the extended Booth’s method is employed, Parallel Adder is modified such that negative numbers can be processed.

In the second method for processing multiple-bits of the multiplier per clock cycle, Parallel Adder sums up k + 1 numbers, using k adders. Any adder can be used, but usually carry save adders are used for the sake of speed and cost. Carry save adders can be connected either in series or in tree form. Of course, the latter is faster, but the structure of Parallel Adder becomes more complicated because of somewhat irregular wire connections. By letting k be n, we have a parallel multiplier, processing the whole multiplier-bits in one clock cycle, as will be mentioned in the following section.

Multipliers-0497

Comments

Popular posts from this blog

Square wave oscillators and Op-amp square wave oscillator.

Timing Description Languages:SDF

Adders:Carry Look-Ahead Adder.