Architecture and Design Flow Optimizations for Power-Aware FPGAs: Architecture-Level Power Optimization

Architecture-Level Power Optimization

An FPGA can be broadly divided into logic and routing portions. Since different parameters characterize the architectures of these two portions, we will look at these separately.

The FPGA logic architecture is defined by the LUT size, the cluster size, and by any nonstandard logic components used. While all these architectural factors influence the power consumption, most studies have focused on LUT and cluster sizes.

LUT size affects the FPGA power in several ways. The power consumed in a single LUT increases with LUT size: for the same function, a six-input LUT dissipates more power than a four-input LUT. However, the number of LUTs needed to implement a design decreases as the LUT size increases. Consequently, there is an optimal LUT size that gives the lowest logic power for the entire FPGA. LUT size also influences the number of nets in the routing fabric, and therefore indirectly affects the interconnect power too. Choosing the best LUT size thus requires estimating the total power, not just the logic power. Such a detailed exploration has shown that a 4-input LUT consumes the least power as well as energy [18]. Interestingly, a 4-input LUT was earlier shown to be the most area efficient as well [19].
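The trade-off above can be sketched in a few lines of Python. This is only an illustration: the `LutCandidate` type, its field names, and every number in `candidates` are hypothetical placeholders in arbitrary units, chosen to mirror the qualitative trend described in the text. They are not measurements from [18] or from any real device.

```python
from dataclasses import dataclass


@dataclass
class LutCandidate:
    """One architecture option (all values are illustrative placeholders)."""
    k: int                # number of LUT inputs
    num_luts: int         # LUTs needed to map the design at this k
    power_per_lut: float  # average power of one LUT, arbitrary units
    num_nets: int         # nets left in the routing fabric at this k
    power_per_net: float  # average interconnect power per net, arbitrary units

    def logic_power(self) -> float:
        return self.num_luts * self.power_per_lut

    def routing_power(self) -> float:
        return self.num_nets * self.power_per_net

    def total_power(self) -> float:
        return self.logic_power() + self.routing_power()


# Hypothetical data: larger k means fewer LUTs and nets, but costlier LUTs.
candidates = [
    LutCandidate(k=3, num_luts=1500, power_per_lut=0.9, num_nets=1800, power_per_net=2.0),
    LutCandidate(k=4, num_luts=1000, power_per_lut=1.4, num_nets=1300, power_per_net=2.0),
    LutCandidate(k=5, num_luts=850, power_per_lut=2.0, num_nets=1200, power_per_net=2.0),
    LutCandidate(k=6, num_luts=750, power_per_lut=2.9, num_nets=1050, power_per_net=2.0),
]

# Ranking by logic power alone and by total power can give different answers.
best_logic_only = min(candidates, key=LutCandidate.logic_power)
best_total = min(candidates, key=LutCandidate.total_power)

print("lowest logic power at k =", best_logic_only.k)
print("lowest total power at k =", best_total.k)
```

With these placeholder values the two rankings disagree, which is why the exploration has to account for interconnect power as well as logic power.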

The next logic architecture parameter we will discuss is cluster size. Clustering of LUTs has several advantages. Since the intracluster connections are much faster than the intercluster ones, a timing-driven clustering algorithm can pack the LUTs on the critical path into the same cluster. Furthermore, clustering reduces the problem size for the placer, and therefore, improves its run time as well as performance. Clustering can also help to reduce the load capacitance, and consequently power, for short connections that get absorbed within the cluster. Li et al. [8] explored different clusters consisting of 4, 8, or 12 LUTs, and concluded that a 12-LUT cluster gives the lowest power-delay product. This happens because the larger cluster size helps reduce the interconnect resources between logic blocks, and therefore, reduces interconnect power.
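As a rough illustration of how clustering absorbs short connections, the sketch below splits the nets of a tiny netlist into those that stay inside one cluster and those that still need intercluster routing. The function name `split_nets`, the netlist, and both packings are hypothetical examples, not data from Li et al. [8].

```python
def split_nets(nets, cluster_of):
    """Split nets into (absorbed, routed) lists for a given LUT-to-cluster packing.

    nets:       dict mapping net name -> list of LUT names the net connects
    cluster_of: dict mapping LUT name -> cluster id
    """
    absorbed, routed = [], []
    for name, luts in nets.items():
        clusters = {cluster_of[lut] for lut in luts}
        if len(clusters) == 1:
            absorbed.append(name)   # all terminals share one cluster
        else:
            routed.append(name)     # net must use intercluster routing
    return absorbed, routed


# Hypothetical four-LUT netlist forming a ring: a-b, b-c, c-d, a-d.
nets = {"n1": ["a", "b"], "n2": ["b", "c"], "n3": ["c", "d"], "n4": ["a", "d"]}

# Packing with clusters of two LUTs each.
small_clusters = {"a": 0, "b": 0, "c": 1, "d": 1}
# Packing with one cluster holding all four LUTs.
large_cluster = {"a": 0, "b": 0, "c": 0, "d": 0}

absorbed_small, routed_small = split_nets(nets, small_clusters)
absorbed_large, routed_large = split_nets(nets, large_cluster)

print("2-LUT clusters:", absorbed_small, "absorbed;", routed_small, "routed")
print("4-LUT cluster: ", absorbed_large, "absorbed;", routed_large, "routed")
```

The larger cluster leaves fewer nets in the routing fabric. A real comparison would also have to account for the extra power of the bigger cluster's internal interconnect, which this sketch ignores.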

The FPGA routing architecture is described by the lengths of the segments, the connection flexibilities, and the types of routing switches used. For example, a Virtex-2 FPGA uses segments of lengths 1, 2, 6, and long (spanning entire row or column), with all of the routing switches being buffered. The connection flexibility for a switch block is defined as the number of connections per segment in the switch block. In Virtex-2, this number varies depending on the segment, and is not a constant for the switch block.

Segment length affects both delay and power of the nets. Short segments lead to more hops in the routing of a net. The switches connecting these segments add resistance and capacitance to the net’s route, and therefore, increase the net’s delay and power. In contrast, a very long segment may force the net to use a longer route, which again increases the capacitance of the net, and consequently the delay as well as power. An optimal segment length would minimize the routing energy, and most likely, an architecture containing multiple segment lengths would be better than one with all segments of the same length. We refer the reader to Vassiliadis et al. [20] for such an exploratory study.
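The two opposing effects can be seen in a deliberately simple capacitance model, sketched below. It assumes that a net uses only segments of one length, that every segment is driven along its full length even when the net overshoots its target, and that one switch joins each pair of consecutive segments. The function names, the unit capacitances, and the list of net distances are all hypothetical and are not taken from Vassiliadis et al. [20].

```python
import math


def route_capacitance(distance, seg_len, c_wire_per_tile, c_switch):
    """Capacitance of a net spanning `distance` tiles, built from segments of
    length `seg_len` (in tiles). Simplified model, see assumptions above."""
    segments = math.ceil(distance / seg_len)
    wire_tiles = segments * seg_len   # whole segments are charged, including overshoot
    switches = segments - 1           # one switch between consecutive segments
    return wire_tiles * c_wire_per_tile + switches * c_switch


def total_capacitance(distances, seg_len, c_wire_per_tile, c_switch):
    """Sum of route capacitances over all nets for one segment length."""
    return sum(
        route_capacitance(d, seg_len, c_wire_per_tile, c_switch) for d in distances
    )


# Hypothetical mix of net distances (in tiles) and unit capacitances.
distances = [1, 2, 2, 3, 8]
C_WIRE, C_SWITCH = 1, 2

totals = {L: total_capacitance(distances, L, C_WIRE, C_SWITCH) for L in (1, 2, 6)}
best_len = min(totals, key=totals.get)

print(totals)
print("segment length with lowest total capacitance:", best_len)
```

In this toy setup, length-1 segments pay for many switches on the long net, while length-6 segments waste wire on the short nets, so an intermediate length wins. Since dynamic power scales with switched capacitance, the same reasoning carries over to routing energy, and it suggests why a mix of segment lengths is likely to do better than any single length.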

Another parameter defining a routing architecture is the type of routing switches used to connect segments to each other. Li et al. [8] explored three kinds of routing architectures. In the first architecture, 50% of the routing switches used buffers, and the other 50% used pass transistors. In the second, all the routing switches used pass transistors, and in the third, all of them used buffers. The first architecture gave the lowest power and power-delay product.

The reader should be warned that these architectural results depend very strongly on the technology and circuits used to implement the FPGA. Quite possibly, the energy-optimal points will change when the implementation technology is changed, or a different circuit style is adopted.
