Multidimensional Logarithmic Number System: Hardware Complexity
Hardware Complexity
To provide complexity results for our MDLNS inner product computational unit (CU), we expand on the inner product processor architecture initially developed for the one-digit 2DLNS [14]. The processor can be used in a systolic array for 1D convolution.
Single-Digit Computational Unit
Figure 84.1 shows the structure of the proposed single-digit CU. Since we do not need to retain the MDLNS representation of the accumulated output, and since the CU is used only in feed-forward architectures, we can use the MDLNS domain for the coefficient multiplication and a binary representation for the accumulated output.
The multiplication is performed by small, parallel adders, one for each pair of data and coefficient base exponents. The adder outputs for the b - 1 odd bases are concatenated into an address for a lookup table (ROM). This table produces an equivalent floating-point (FP) value for the product of the odd bases, each raised to its exponent sum.
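The data path described above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware design itself: the odd bases (3 and 5), the 3-bit unsigned exponent width, the digit layout `(sign, binary_exp, odd_exps)`, and the names `build_rom`, `cu_multiply` and `cu_accumulate` are all assumptions made for the example. A Python float stands in for the FP word stored in the ROM.

```python
import math
from itertools import product

# Assumed parameters (not taken from the source): a 3DLNS with odd bases 3 and 5,
# and unsigned 3-bit odd-base exponents, so each adder output needs 4 bits.
ODD_BASES = (3, 5)
EXP_BITS = 3
SUM_BITS = EXP_BITS + 1


def build_rom():
    """Precompute the product of odd bases raised to every possible exponent sum."""
    rom = [0.0] * (1 << (SUM_BITS * len(ODD_BASES)))
    for sums in product(range(1 << SUM_BITS), repeat=len(ODD_BASES)):
        addr = 0
        value = 1.0
        for base, s in zip(ODD_BASES, sums):
            addr = (addr << SUM_BITS) | s      # concatenate adder outputs
            value *= float(base) ** s
        rom[addr] = value
    return rom


ROM = build_rom()


def cu_multiply(x, y):
    """Multiply two single-digit operands given as (sign, binary_exp, odd_exps)."""
    sx, ax, bx = x
    sy, ay, by = y
    addr = 0
    for ex, ey in zip(bx, by):
        addr = (addr << SUM_BITS) | (ex + ey)  # one small adder per odd base
    # The binary exponents are added separately and applied as a scaling by 2**(ax + ay).
    return sx * sy * math.ldexp(ROM[addr], ax + ay)


def cu_accumulate(acc, x, y):
    """Accumulate the product in an ordinary (non-MDLNS) number, as in the CU."""
    return acc + cu_multiply(x, y)
```

For instance, `cu_multiply((1, 2, (1, 0)), (-1, 1, (2, 1)))` multiplies 4·3 = 12 by -(2·9·5) = -90. With these assumed widths the ROM has 2^8 entries, which matches the 8-bit address quoted for a 3DLNS.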
Because the exponents of each odd base in an MDLNS representation (with at least two digits and two bases) can be very small (fewer than 4 bits), the ROM address is at most 4 × (b - 1) bits wide. For a 3DLNS, this is an 8-bit address. For higher-dimensional LNS, we can also use unity approximants to reduce the output of each odd-base adder to the number of bits of the input exponents (or even fewer, if we are willing to accept the increased mapping error). This reduction process stores a small number of unity approximants that are added in parallel to the output of the odd-base adders; the reduced input to the ROM is selected from these parallel results. The ROM input address is then (b - 1) bits narrower.
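The address-width arithmetic in this paragraph can be checked with a small helper. The function name `rom_address_bits` and its parameters are hypothetical. It encodes only the two figures stated in the text: at most 4 bits per odd-base adder output, and a saving of one bit per odd base when unity approximants are used. It does not model the approximant selection itself.

```python
def rom_address_bits(num_bases, sum_bits=4, use_unity_approximants=False):
    """Return the ROM address width for an MDLNS with `num_bases` bases (one of them 2)."""
    odd_bases = num_bases - 1
    bits = sum_bits * odd_bases
    if use_unity_approximants:
        bits -= odd_bases  # each odd-base adder output shrinks by one bit
    return bits


# A 3DLNS (b = 3) needs an 8-bit address, or 6 bits with the reduction.
plain = rom_address_bits(3)
reduced = rom_address_bits(3, use_unity_approximants=True)
print(plain, 2 ** plain)      # address bits, ROM entries
print(reduced, 2 ** reduced)
```

Under these assumptions the reduction shrinks the 3DLNS table from 256 to 64 entries, and the saving grows with the number of odd bases.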
n-digit Computational Unit
The n-digit computational unit is a simple parallel extension of the one-digit unit. Each of the units computes the binary output for one of the digit combinations. As an example, consider multiplying an accumulating sequence, y, by a coefficient, x, to give z = xy, where x and y are each represented by n digits. Every digit of x must be combined with every digit of y, so there are n² such units in an n-digit MDLNS.
The parallel outputs are summed at the end of the systolic arrays using an adder tree.
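The channel structure can be illustrated with a short sketch. It assumes a 2DLNS with second base 3 and the same hypothetical digit layout `(sign, binary_exp, odd_exps)`. The names `digit_product`, `adder_tree` and `inner_product` are invented for the example, and the per-digit product is computed directly rather than through a ROM. Each of the n² channels keeps its own accumulator, and the adder tree is applied only once at the end.

```python
import math
from itertools import product

ODD_BASES = (3,)  # assumed: 2DLNS with bases 2 and 3


def digit_product(dx, dy):
    """Product of two single digits, each given as (sign, binary_exp, odd_exps)."""
    sx, ax, bx = dx
    sy, ay, by = dy
    odd = 1.0
    for base, ex, ey in zip(ODD_BASES, bx, by):
        odd *= float(base) ** (ex + ey)
    return sx * sy * math.ldexp(odd, ax + ay)


def adder_tree(values):
    """Sum values pairwise, level by level, as a binary adder tree would."""
    level = list(values)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next level
        level = nxt
    return level[0] if level else 0.0


def inner_product(xs, ys, n):
    """Inner product of n-digit operands using n*n independent accumulating channels."""
    channels = [0.0] * (n * n)
    for x_digits, y_digits in zip(xs, ys):
        for k, (dx, dy) in enumerate(product(x_digits, y_digits)):
            channels[k] += digit_product(dx, dy)
    return adder_tree(channels), len(channels)


# Two-digit example: (8 + 3) * (6 - 1) + (2 + 1) * (3 + 1)
xs = [[(1, 3, (0,)), (1, 0, (1,))], [(1, 1, (0,)), (1, 0, (0,))]]
ys = [[(1, 1, (1,)), (-1, 0, (0,))], [(1, 0, (1,)), (1, 0, (0,))]]
total, num_channels = inner_product(xs, ys, n=2)
```

With n = 2 the sketch uses four channels, which is the minimum channel count for a multi-digit representation.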
The main advantage of using more than one digit for the input data and the filter coefficients is that extremely accurate representations can be obtained with very small nonbinary exponents. The price is that the number of computational channels required increases to at least 4, since n² ≥ 4 for n ≥ 2.