Computer Arithmetic for VLSI Signal Processing: Fixed-Point Number Systems

Introduction

Until the 1970s, nearly all signal processing was done with analog circuits. Now it is mostly digital. The change is the result of two technology developments: mixed-signal (analog [A] and digital [D]) circuits, mostly A/D and D/A converters, and VLSI arithmetic elements, mostly adders and multipliers.

This chapter examines the arithmetic elements at the algorithm and logic design levels. Different algorithms may be used to vary the speed of an arithmetic element by an order of magnitude or more while the complexity varies by less than 50%.

Section 80.2 examines number systems for signal processing. The implementation of fixed-point arithmetic elements is examined in Section 80.3. Finally, Section 80.4 briefly considers the implementation of floating-point arithmetic elements.

Regarding notation, capital letters represent digital numbers (i.e., words), whereas subscripted lowercase letters represent bits of the corresponding word. The subscripts range from n - 1 to 0 to indicate the bit position within the word ($x_{n-1}$ is the most significant bit of X, $x_0$ is the least significant bit of X, etc.). The logic designs in this chapter are based on positive logic with AND, OR, and invert operations. Depending on the technology used for implementation, different operations (such as NAND and NOR) may be used, but the basic concepts do not change.

Fixed-Point Number Systems

In digital signal processing, most arithmetic is performed with fixed-point binary numbers that have constant scaling (i.e., the position of the binary point is fixed). The numbers can be interpreted as fractions, integers, or mixed numbers, but fractions are the most commonly used.

Pairs of fixed-point numbers are used to create floating-point numbers, as discussed in Section 80.4. Fixed-point binary numbers are generally represented using the two’s complement number system. This choice has prevailed over the sign magnitude and one’s complement number systems, because the frequently performed operations of addition and subtraction are the easiest to perform on two’s complement numbers. Sign magnitude numbers are more efficient for multiplication and division, but the lower frequency of multiplication and the development of Booth’s efficient two’s complement multiplication algorithm have resulted in the nearly universal selection of the two’s complement number system for most applications. The algorithms presented in this chapter assume the use of two’s complement numbers.
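The claim that addition is easiest in two's complement can be illustrated with a minimal Python sketch. The function names, the default 4-bit word length, and the decision to ignore overflow are assumptions made for illustration; they are not taken from the chapter. Words are held as unsigned integers whose bit patterns are interpreted in each number system.

```python
def add_twos_complement(a: int, b: int, n: int = 4) -> int:
    """Add two n-bit two's complement words.

    A single unsigned add suffices; the carry out of the sign
    position is simply discarded. Overflow is not detected here.
    """
    mask = (1 << n) - 1
    return (a + b) & mask


def add_sign_magnitude(a: int, b: int, n: int = 4) -> int:
    """Add two n-bit sign magnitude words (overflow ignored).

    The signs must be inspected first, and when they differ the
    magnitudes must be compared to decide which one to subtract.
    """
    sign_bit = 1 << (n - 1)
    mag_mask = sign_bit - 1
    sign_a, mag_a = a & sign_bit, a & mag_mask
    sign_b, mag_b = b & sign_bit, b & mag_mask

    if sign_a == sign_b:
        # Same sign: add magnitudes, keep the common sign.
        return sign_a | ((mag_a + mag_b) & mag_mask)
    if mag_a > mag_b:
        return sign_a | (mag_a - mag_b)
    if mag_b > mag_a:
        return sign_b | (mag_b - mag_a)
    return 0  # equal magnitudes, opposite signs: return positive zero
```

For example, with 4-bit fractions, 3/8 + (-5/8) is `0b0011 + 0b1011` in two's complement and yields `0b1110` (-1/4) from one add. The sign magnitude version, `add_sign_magnitude(0b0011, 0b1101)`, reaches the same value (`0b1010`) only after a sign test, a magnitude comparison, and a subtraction.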

Fixed-point number systems represent numbers, for example, A, by n bits: a sign bit and n - 1 data bits. By convention, the most significant bit $a_{n-1}$ is the sign bit, which is 1 for negative numbers and 0 for positive numbers. The n - 1 data bits are $a_{n-2}, a_{n-3}, \ldots, a_1, a_0$. In the following material, fixed-point fractions will be described for both the two’s complement and the sign magnitude number systems.

Two’s Complement. In the two’s complement fractional number system, the value of a number is the sum of n - 1 positive binary fractional bits and a sign bit, which has a weight of -1:

$$A = -a_{n-1} + \sum_{i=0}^{n-2} a_i \, 2^{i-n+1} \tag{80.1}$$
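The two's complement weighting described above (a sign bit of weight -1 plus n - 1 positive fractional bits) can be checked with a short Python sketch. The function name and the choice to pass the word as a bit string, most significant bit first, are assumptions for illustration only.

```python
from fractions import Fraction


def twos_complement_fraction(bits: str) -> Fraction:
    """Return the value of a two's complement fractional word.

    `bits` lists the word MSB first, so bits[0] is the sign bit
    a_(n-1) and bits[-1] is the least significant bit a_0.
    """
    n = len(bits)
    value = -Fraction(int(bits[0]))      # sign bit has weight -1
    for i in range(n - 1):               # data bits a_0 .. a_(n-2)
        a_i = int(bits[n - 1 - i])
        # Weight of a_i is 2**(i - n + 1), i.e. 1 / 2**(n - 1 - i).
        value += a_i * Fraction(1, 2 ** (n - 1 - i))
    return value
```

With 4-bit words, `twos_complement_fraction("0111")` evaluates to 7/8, `"1011"` to -5/8, and `"1000"` to -1, the most negative representable value.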

Table 80.1 compares 4-bit fractional fixed-point numbers in the two number systems. Note that the sign magnitude number system has two zeros and that the two’s complement number system is capable of representing -1. For positive numbers both systems have identical representations.
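A comparison of this kind can be regenerated programmatically. The following Python sketch is an illustration under stated assumptions, not a reproduction of the chapter's table: the function names are hypothetical, and each value is identified by its integer numerator k, so that the fraction is k / 2^(n-1).

```python
from fractions import Fraction
from typing import List, Optional, Tuple


def encode_twos_complement(k: int, n: int = 4) -> Optional[str]:
    """Encode k / 2**(n-1) as an n-bit two's complement word, or None."""
    if not -(1 << (n - 1)) <= k < (1 << (n - 1)):
        return None
    return format(k & ((1 << n) - 1), f"0{n}b")


def encode_sign_magnitude(k: int, n: int = 4) -> Optional[str]:
    """Encode k / 2**(n-1) as an n-bit sign magnitude word, or None.

    Only positive zero is produced; the second zero, a set sign bit
    with zero magnitude, is a pattern with no distinct value.
    """
    if abs(k) >= (1 << (n - 1)):
        return None
    sign = "1" if k < 0 else "0"
    return sign + format(abs(k), f"0{n - 1}b")


def comparison_rows(n: int = 4) -> List[Tuple[Fraction, Optional[str], Optional[str]]]:
    """List (value, two's complement, sign magnitude), largest value first."""
    half = 1 << (n - 1)
    return [
        (Fraction(k, half), encode_twos_complement(k, n), encode_sign_magnitude(k, n))
        for k in range(half - 1, -half - 1, -1)
    ]


if __name__ == "__main__":
    for value, tc, sm in comparison_rows():
        print(f"{str(value):>5}  {tc or '----'}  {sm or '----'}")
```

The rows for non-negative values carry the same bit pattern in both columns, and the final row (-1) has a two's complement encoding but no sign magnitude encoding.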


A significant difference between the two’s complement and sign magnitude number systems is their behavior under truncation. Figure 80.1 shows the effect of truncating high-precision fixed-point fractions X to form 4-bit fractions T(X). Truncation of two’s complement numbers never increases the value of the number (i.e., the truncated numbers have values that are unchanged or shifted toward negative infinity), as can be seen from Eq. (80.1), where any truncated bits have positive weight. This bias can cause an accumulation of errors in computations that sum many truncated numbers, as may occur in signal-processing applications. In the sign magnitude number system, truncated numbers are unchanged or shifted toward zero, so that if approximately half of the numbers to be added are positive and half are negative, the errors will tend to cancel.
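The difference in truncation bias can be observed numerically. The sketch below models truncation arithmetically rather than by manipulating bits: dropping low-order bits of a two's complement word is treated as rounding toward negative infinity, and dropping low-order magnitude bits of a sign magnitude word as rounding toward zero. The function names and the sample set are assumptions chosen for illustration.

```python
import math
from fractions import Fraction
from typing import Callable, Iterable


def truncate_twos_complement(x: Fraction, n: int = 4) -> Fraction:
    """Truncate x to an n-bit two's complement fraction (floor)."""
    step = Fraction(1, 1 << (n - 1))
    return math.floor(x / step) * step


def truncate_sign_magnitude(x: Fraction, n: int = 4) -> Fraction:
    """Truncate x to an n-bit sign magnitude fraction (toward zero)."""
    step = Fraction(1, 1 << (n - 1))
    magnitude = math.floor(abs(x) / step) * step
    return -magnitude if x < 0 else magnitude


def mean_error(truncate: Callable[[Fraction], Fraction],
               samples: Iterable[Fraction]) -> Fraction:
    """Average of T(X) - X over the samples."""
    samples = list(samples)
    return sum((truncate(x) - x for x in samples), Fraction(0)) / len(samples)


if __name__ == "__main__":
    # Symmetric set of 7-bit fractions between -7/8 and 7/8.
    samples = [Fraction(k, 64) for k in range(-56, 57)]
    print("two's complement mean error:", mean_error(truncate_twos_complement, samples))
    print("sign magnitude mean error:  ", mean_error(truncate_sign_magnitude, samples))
```

For instance, -5/16 truncates to -3/8 in two's complement but to -1/4 in sign magnitude. Over the symmetric sample set, the two's complement mean error is negative, while the sign magnitude errors cancel exactly.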
