Architecture and Design Flow Optimizations for Power-Aware FPGAs: Techniques Requiring Changes at Multiple Levels.
Techniques Requiring Changes at Multiple Levels
So far we have seen techniques that each affect mainly one aspect of FPGA design. In this section, we look at techniques that span multiple aspects.
Using Sleep Transistors
Sleep transistors are low-leakage transistors used to cut off the supply to an unused or idle circuit block. They are widely used in deep-submicron ASICs to cut off the supply to unused portions of a design, and thus make the leakage power in those blocks close to zero. This is an effective way to reduce leakage in FPGAs too, because a large number of transistors in an FPGA remain unused for most user designs. Since the specific CLBs that can be switched off depend on the implemented design, sleep transistor insertion is trickier than in ASICs. In general, very fine-grained insertion of sleep transistors incurs a very large area overhead. Therefore, sleep transistors are usually inserted at the block level instead of the gate level. In an FPGA, for example, controlling every single LUT would be prohibitive in terms of area, so we would rather control an entire CLB or a group of CLBs with the same sleep transistor.
Normally, the place-and-route tool scatters the mapped design over the FPGA array to optimize for speed (see Figure 20.8(a)). This makes it very difficult to get large leakage savings without resorting to very fine-grained leakage control. One way to increase the power savings of coarse-grained leakage control is to restrict the placement of the design to a minimum number of regions (region-constrained placement, RCP), where a region is defined as a group of CLBs that share the same sleep transistor (see Figure 20.8(b)). As shown in Figure 20.9, with this technique it is possible to use very large region sizes (as large as 16 × 16) without sacrificing much in leakage savings [27]. Note that this change in placement usually degrades the timing performance of the design. A slower design leaks for longer per operation, so it is important to look at the leakage energy (as shown in Figure 20.9) and not the leakage power.
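The effect of region-constrained placement can be illustrated with a minimal sketch. It is not the placement algorithm of [27]; it only counts how many regions a given placement touches, assuming a square CLB array, square regions, equal leakage per active region, and zero leakage in gated regions. The function names and the example placements are hypothetical.

```python
from typing import Iterable, Set, Tuple

Coord = Tuple[int, int]  # (row, column) of a CLB in the array


def active_regions(used_clbs: Iterable[Coord], region_size: int) -> Set[Coord]:
    """Regions (by region row/column) that hold at least one used CLB."""
    return {(r // region_size, c // region_size) for r, c in used_clbs}


def gated_fraction(used_clbs: Iterable[Coord], array_size: int, region_size: int) -> float:
    """Fraction of regions whose shared sleep transistor can be switched off."""
    per_side = -(-array_size // region_size)  # ceiling division
    return 1.0 - len(active_regions(used_clbs, region_size)) / per_side ** 2


def relative_leakage_energy(n_active: int, delay: float) -> float:
    """Leakage energy in arbitrary units: active regions times critical-path delay.

    Assumes every active region leaks the same power and gated regions leak none.
    """
    return n_active * delay


# Hypothetical placements of 16 used CLBs on a 16 x 16 array with 4 x 4 regions.
scattered = [(r, c) for r in range(0, 16, 4) for c in range(0, 16, 4)]  # one CLB per region
packed = [(r, c) for r in range(4) for c in range(4)]                   # all in one region

print(gated_fraction(scattered, 16, 4))  # no region can be gated
print(gated_fraction(packed, 16, 4))     # 15 of 16 regions can be gated
```

Multiplying the active-region count by the critical-path delay reflects the point made above: a packed placement that is somewhat slower can still come out ahead in leakage energy, but the comparison has to include the delay.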
Dual-Supply Techniques
Most user designs have only a few combinational paths that are timing-critical and need to run at maximum performance. The other, noncritical paths can run at reduced performance to save power while the overall performance of the design is maintained. One way to save power in noncritical paths is to reduce the supply voltage to those paths. This is more challenging in FPGAs because the critical paths depend on the user design that is implemented on the FPGA, which prevents FPGA manufacturers from fixing a lower supply for portions of the device. There are two ways to circumvent this problem. The first technique fixes the supply voltages for individual CLBs at the time of fabrication of the FPGA, thus creating a mixed pool of fast and slow CLBs [28]. The complexity in this case lies in the CAD tool, since it has to make sure that the timing-critical blocks in a user design are mapped to the fast CLBs. Furthermore, there may not be enough fast CLBs in the FPGA to obtain optimal performance for some designs. These difficulties hinder the use of this technique for power reduction.
The second technique uses a CLB-level configurable supply voltage (see Figure 20.10) to lower the supply for a noncritical block [28–31]. Using the circuit shown in Figure 20.10, it is possible to set the supply voltage of every CLB to either high (VDDH) or low (VDDL). The value of the supply is controlled by two SRAM cells. Although this incurs a large area overhead (since the supply transistors must be sized large for performance), it reduces power by more than 50% compared to a single-supply FPGA. An additional advantage of this technique is that both supply transistors can be switched off to cut off the supply completely for unused CLBs, and therefore save leakage power. This method needs CAD support to decide which CLBs can be run on a lower supply. Another complication is the need for level converters whenever a low-supply block drives a high-supply block. A level converter scales a VDDL value to VDDH. Without level conversion, the PMOS in the VDDH buffer will not be completely switched off, leading to a large static current. However, level converters add area, power, and delay overheads to the implemented design. We can reduce these overheads by minimizing the number of level converters. For example, Gayasen et al. [30] experimented with level converters only at CLB pins.
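The CAD decision described above can be sketched in a few lines. This is a deliberately simplified illustration, not the algorithm of [28–31]: it assumes each CLB has an independent timing slack, that running a CLB at VDDL costs a fixed extra delay, and it ignores both the slack shared between blocks on the same path and the delay of the level converters themselves. All names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Net = Tuple[str, str]  # (driver CLB, sink CLB)


def assign_supplies(slack: Dict[str, float], low_vdd_penalty: float) -> Dict[str, str]:
    """Give VDDL to every CLB whose slack covers the assumed VDDL delay penalty."""
    return {clb: ("VDDL" if s >= low_vdd_penalty else "VDDH") for clb, s in slack.items()}


def level_converters_needed(nets: List[Net], supply: Dict[str, str]) -> List[Net]:
    """Nets where a VDDL block drives a VDDH block, so a level converter is required."""
    return [(d, s) for d, s in nets if supply[d] == "VDDL" and supply[s] == "VDDH"]


# Hypothetical slacks (ns) and connectivity for four CLBs.
slack = {"A": 0.0, "B": 1.5, "C": 0.2, "D": 2.0}
nets = [("A", "B"), ("B", "C"), ("D", "B"), ("D", "A")]

supply = assign_supplies(slack, low_vdd_penalty=1.0)
print(supply)                                 # A and C stay on VDDH; B and D move to VDDL
print(level_converters_needed(nets, supply))  # only the low-to-high nets
```

A high-supply block driving a low-supply block needs no converter, which is why only two of the four nets in this example are flagged. A real flow would trade the power saved by each VDDL assignment against the converters it introduces.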
To reduce the area overhead of a dual-supply FPGA, Anderson and Najm [32] proposed a routing switch that generates two voltage levels from a single supply by using the threshold drop across an NMOS transistor (MNX in Figure 20.11). This eliminates the area penalty associated with routing two power grids. In Figure 20.11, when MPX is switched off but MNX is kept on, the routing switch operates in the low-supply mode. When both MNX and MPX are switched off, the routing switch goes into sleep mode. The states of MPX and MNX are controlled using configuration SRAMs.
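The operating modes of this switch can be summarized as a small truth table. The sketch below covers the two modes stated above and assumes, as a labeled assumption, that switching MPX on connects the switch to the full supply. The mode names, the function names, and the example voltages are hypothetical.

```python
from enum import Enum
from typing import Optional


class SwitchMode(Enum):
    HIGH_SPEED = "high-speed"  # assumption: MPX on gives the full supply
    LOW_POWER = "low-power"    # MPX off, MNX on: supply reduced by a threshold drop
    SLEEP = "sleep"            # both off: supply cut off


def switch_mode(mpx_on: bool, mnx_on: bool) -> SwitchMode:
    """Map the two SRAM-controlled transistor states to an operating mode."""
    if mpx_on:
        return SwitchMode.HIGH_SPEED
    return SwitchMode.LOW_POWER if mnx_on else SwitchMode.SLEEP


def virtual_supply(vdd: float, vth: float, mode: SwitchMode) -> Optional[float]:
    """Idealized supply seen by the switch; None means it is cut off."""
    if mode is SwitchMode.HIGH_SPEED:
        return vdd
    if mode is SwitchMode.LOW_POWER:
        return vdd - vth  # threshold drop across the NMOS transistor MNX
    return None


# Hypothetical values: 1.0 V supply, 0.3 V NMOS threshold.
for mpx, mnx in [(True, True), (False, True), (False, False)]:
    mode = switch_mode(mpx, mnx)
    print(mpx, mnx, mode.value, virtual_supply(1.0, 0.3, mode))
```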
Using Input Dependence of Mux Leakage
The dual-supply technique described above incurs an area penalty on the order of 50% [33], which means it should be used only for applications that can tolerate some increase in cost. Since most applications cannot, we need to explore ways to reduce power with a smaller area penalty. By limiting the VDD configurability to logic blocks, Lin et al. [18] reduced the area overhead to 17%. We have already seen in Section 20.5 that optimizing the CAD flow for low power reduced the consumption by 22.6%. We have also seen that inverting the logic implemented in some LUTs can reduce leakage in the used portion of the design. This is possible because mux leakage depends strongly on the values of its inputs.
Now we look at another technique, which requires small changes in the FPGA circuits to allow the unused inputs of routing muxes to be set to desired values and hence reduce leakage. Figure 20.12 shows the modified circuits for the routing muxes [34]. Figure 20.12(a) shows an obvious way to control the inputs of a mux. It uses two configuration bits at every input, which select between “0” and “1.” When neither is selected, the user signal is used as the input. Although this circuit provides the maximum flexibility, its large area overhead makes it impractical for use in a real FPGA. This area penalty can be reduced by observing that since most of the inputs of the multiplexers are driven by other muxes, it is sufficient to set the mux outputs to desired values. Figure 20.12(b) shows an area-efficient implementation of such a method, where the reset mechanism ensures that all the undriven signals are pulled up to a logic state “1.” Figure 20.12(d) shows a similar circuit where the mux output is set to “0” instead of “1.” Finally, Figure 20.12(c) shows a circuit that sets the mux output to either “0” or “1,” depending on which gives the least leakage. With this capability to set unused segments to desired values, the CAD tool can decide on the optimal configurations of muxes and unused LUTs to reduce leakage.
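The CAD decision in the last sentence can be illustrated with a minimal sketch. It is not the method of [34]. It assumes the circuit of Figure 20.12(c), so that each unused segment can be held at either value, and it assumes a hypothetical table giving the leakage each downstream mux input contributes when it sees a “0” or a “1.” The names and leakage numbers are made up for illustration, and segments are treated independently of each other.

```python
from typing import Dict, List, Tuple

# For each unused segment: one (leak_if_0, leak_if_1) pair per mux input it feeds.
FanoutLeakage = Dict[str, List[Tuple[float, float]]]


def best_segment_values(fanout_leakage: FanoutLeakage) -> Dict[str, int]:
    """Pick, per unused segment, the logic value with the lower total fanout leakage."""
    choice = {}
    for segment, sinks in fanout_leakage.items():
        total_if_0 = sum(leak0 for leak0, _ in sinks)
        total_if_1 = sum(leak1 for _, leak1 in sinks)
        choice[segment] = 0 if total_if_0 <= total_if_1 else 1
    return choice


# Hypothetical leakage values in arbitrary units.
table = {
    "seg_a": [(3.0, 1.0), (2.0, 1.5)],  # totals: 5.0 at "0", 2.5 at "1"
    "seg_b": [(1.0, 4.0)],              # totals: 1.0 at "0", 4.0 at "1"
}
print(best_segment_values(table))
```

With the circuits of Figure 20.12(b) or (d), the value is fixed at “1” or “0” and no such choice exists; the freedom to choose is what the extra circuitry in (c) buys.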