Computer Arithmetic for VLSI Signal Processing: Fixed-Point Number Systems

By Ahmed Farahat - October 14, 2015

Introduction

Until the 1970s nearly all signal processing was done with analog circuits. Now it is mostly digital. The change is the result of two technology developments: mixed-signal (analog [A] and digital [D]) circuits (mostly A/D and D/A converters) and VLSI arithmetic elements (mostly adders and multipliers).

This chapter examines the arithmetic elements at the algorithm and logic design levels. Different algorithms may be used to vary the speed of an arithmetic element by an order of magnitude or more while the complexity varies by less than 50%.

This chapter examines number systems for signal processing in Section 80.2. The implementation of fixed-point arithmetic elements is examined in Section 80.3. Finally, Section 80.4 briefly considers the implementation of floating-point arithmetic elements.

Regarding notation, capital letters represent digital numbers (i.e., words), whereas subscripted lowercase letters represent bits of the corresponding word. The subscripts range from n - 1 to 0 to indicate the bit position within the word (x_{n-1} is the most significant bit of X, x_0 is the least significant bit of X, etc.). The logic designs in this chapter are based on positive logic with AND, OR, and invert operations. Depending on the technology used for implementation, different operations (such as NAND and NOR) may be used, but the basic concepts do not change.

Fixed-Point Number Systems

In digital signal processing, most arithmetic is performed with fixed-point binary numbers that have constant scaling (i.e., the position of the binary point is fixed). The numbers can be interpreted as fractions, integers, or mixed numbers, but fractions are the most commonly used.
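As a minimal sketch of what "constant scaling" means, the same bit pattern can be read as an integer, a fraction, or a mixed number purely by choosing where the binary point sits. The function name interpret and the parameter fraction_bits are illustrative and do not come from the chapter; the sketch also assumes the two's complement encoding that the chapter adopts for its algorithms.

```python
def interpret(bits: str, fraction_bits: int) -> float:
    """Read a two's complement bit string (MSB first) with a fixed binary point.

    fraction_bits is the number of bits to the right of the binary point
    (hypothetical parameter name, chosen for this illustration).
    """
    n = len(bits)
    raw = int(bits, 2)          # unsigned reading of the word
    if bits[0] == "1":          # sign bit set: subtract 2^n to get the signed integer
        raw -= 2 ** n
    return raw / 2 ** fraction_bits  # constant scaling by 2^-fraction_bits


word = "0110"
print(interpret(word, 0))  # read as an integer
print(interpret(word, 3))  # read as a fraction (sign bit, then 3 fractional bits)
print(interpret(word, 2))  # read as a mixed number
```

Only the scale factor changes between the three readings; the stored bits and the adder hardware that operates on them are the same.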

Pairs of fixed-point numbers are used to create floating-point numbers, as discussed in Section 80.4. Fixed-point binary numbers are generally represented using the two’s complement number system. This choice has prevailed over the sign magnitude and one’s complement number systems, because the frequently performed operations of addition and subtraction are the easiest to perform on two’s complement numbers. Sign magnitude numbers are more efficient for multiplication and division, but the lower frequency of multiplication and the development of Booth’s efficient two’s complement multiplication algorithm have resulted in the nearly universal selection of the two’s complement number system for most applications. The algorithms presented in this chapter assume the use of two’s complement numbers.

Fixed-point number systems represent a number, for example A, by n bits: a sign bit and n - 1 data bits. By convention, the most significant bit a_{n-1} is the sign bit, which is 1 for negative numbers and 0 for positive numbers. The n - 1 data bits are a_{n-2}, a_{n-3}, ..., a_1, a_0. In the following material, fixed-point fractions will be described for both the two’s complement and the sign magnitude number systems.

Two’s Complement. In the two’s complement fractional number system, the value of a number is the sum of n - 1 positive binary fractional bits and a sign bit, which has a weight of -1:

A = -a_{n-1} + \sum_{i=0}^{n-2} a_i 2^{i-n+1}    (80.1)
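The value rule described above (sign bit with weight -1, data bits with positive fractional weights) can be sketched in a few lines. The function name twos_complement_fraction is illustrative, and exact rational arithmetic from Python's fractions module is used only to avoid rounding in the illustration.

```python
from fractions import Fraction


def twos_complement_fraction(bits: str) -> Fraction:
    """Value of an n-bit two's complement fraction given MSB first, e.g. "1010"."""
    value = Fraction(-int(bits[0]))            # sign bit a_{n-1} has weight -1
    for k, bit in enumerate(bits[1:], start=1):
        value += Fraction(int(bit), 2 ** k)    # k-th data bit has weight 2^-k
    return value


for word in ("0111", "0001", "0000", "1111", "1010", "1000"):
    print(word, twos_complement_fraction(word))
```

With n = 4 the largest representable value is 7/8 (word 0111) and the smallest is -1 (word 1000).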

Table 80.1 compares 4-bit fractional fixed-point numbers in the two number systems. Note that the sign magnitude number system has two zeros and that the two’s complement number system is capable of representing -1. For positive numbers both systems have identical representations.
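The comparison can be regenerated with a short script. This is a sketch rather than a reproduction of Table 80.1: it lists the 16 bit patterns in counting order, and it assumes the usual sign magnitude rule (the sign bit negates the magnitude formed by the n - 1 data bits). All function names are illustrative.

```python
from fractions import Fraction


def magnitude(bits: str) -> Fraction:
    """Fraction formed by the n - 1 data bits to the right of the sign bit."""
    return Fraction(int(bits[1:], 2), 2 ** (len(bits) - 1))


def twos_complement_value(bits: str) -> Fraction:
    return magnitude(bits) - int(bits[0])       # sign bit has weight -1


def sign_magnitude_value(bits: str) -> Fraction:
    # Assumed sign magnitude rule: the sign bit only flips the sign.
    return -magnitude(bits) if bits[0] == "1" else magnitude(bits)


def comparison_table(n: int = 4):
    """Rows of (bit pattern, two's complement value, sign magnitude value)."""
    rows = []
    for code in range(2 ** n):
        bits = format(code, f"0{n}b")
        rows.append((bits, twos_complement_value(bits), sign_magnitude_value(bits)))
    return rows


print("bits  two's comp  sign mag")
for bits, tc, sm in comparison_table():
    print(f"{bits}  {str(tc):>10}  {str(sm):>8}")
```

In the output, the patterns 0000 and 1000 both evaluate to zero in the sign magnitude column, while 1000 evaluates to -1 in the two's complement column, and the rows with a sign bit of 0 agree in both columns.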

A significant difference between the two’s complement and sign magnitude number systems is their behavior under truncation. Figure 80.1 shows the effect of truncating high-precision fixed-point fractions X to form 4-bit fractions T(X). Truncation of two’s complement numbers never increases the value of the number (i.e., the truncated numbers have values that are unchanged or shifted toward negative infinity), as can be seen from Eq. (80.1), where any truncated bits have positive weight. This bias can cause an accumulation of errors for computations that involve summing many truncated numbers (which may occur in signal-processing applications). In the sign magnitude number system, truncated numbers are unchanged or shifted toward zero, so that if approximately half of the numbers to be added are positive and half are negative, the errors will tend to cancel.
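The truncation behavior can be checked numerically. The sketch below models truncation of a two's complement fraction as rounding down to the 4-bit grid (dropping positively weighted bits can only lower the value) and truncation of a sign magnitude fraction as rounding the magnitude down. The function names and the choice of 8-bit inputs are assumptions made for illustration.

```python
import math
from fractions import Fraction


def truncate_twos_complement(x: Fraction, n: int = 4) -> Fraction:
    """Drop all bits below weight 2^-(n-1): the value moves toward -infinity."""
    scale = 2 ** (n - 1)
    return Fraction(math.floor(x * scale), scale)


def truncate_sign_magnitude(x: Fraction, n: int = 4) -> Fraction:
    """Drop low-order magnitude bits: the value moves toward zero."""
    scale = 2 ** (n - 1)
    mag = Fraction(math.floor(abs(x) * scale), scale)
    return -mag if x < 0 else mag


# Hypothetical test set: 8-bit fractions k/128, symmetric about zero.
inputs = [Fraction(k, 128) for k in range(-127, 128)]
tc_errors = [truncate_twos_complement(x) - x for x in inputs]
sm_errors = [truncate_sign_magnitude(x) - x for x in inputs]

print("two's complement total error:", sum(tc_errors))
print("sign magnitude total error:  ", sum(sm_errors))
```

For this symmetric input set every two's complement error is zero or negative, so the total is negative, whereas the sign magnitude errors for x and -x cancel pairwise.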
